Introduces Neural Particle Automata, a method for learning self-organizing particle dynamics. Smoothed particle hydrodynamics (SPH) perception gives each particle a local perception vector that feeds the update rule, analogous to Neural Cellular Automata but defined on continuous particle positions.
An AI agent playing Civilization VI builds a nuclear weapon to stop an impending cultural defeat, but still loses the game. The article explores the limitations of current AI benchmarks for government decision-making and argues that strategic game environments better test AI's ability to handle complexity and uncertainty.
This post reports that reading a long, structured text before answering alters a model's later responses. It gives behavioral evidence from Claude and a mechanistic analysis of open-weight Gemma models, which shows separable hidden states and sharper probability distributions in instruction-tuned variants.
The article reports a potential alignment vulnerability in LLMs where processing a structured passage before an unrelated question can alter the model's response, with mechanistic evidence from Gemma-3-12B showing hidden-state separation.
This technical report introduces VibeThinker-3B, a 3B-parameter dense model that achieves frontier-level reasoning performance on benchmarks like AIME26 and LiveCodeBench. It matches or exceeds much larger models such as DeepSeek V3.2 and GLM-5 through a combination of curriculum-based SFT, multi-domain RL, and offline self-distillation.
This paper proposes a thermodynamic measure of intelligence defined as 'rare-valid lift' and argues that recursive self-simulation is necessary and nearly sufficient for high thermodynamic intelligence, making intelligence measurable on a universal scale.
Research paper shows that LLMs suffer from 'role confusion', where they prioritize the style of text over its actual role tags, enabling prompt injection attacks. Destyling text reduces attack success from 61% to 10%, indicating a fundamental challenge for LLM security.
An international research team identified the source of a mysterious repeating radio signal as a white dwarf pulling material from a companion red dwarf, solving a long-standing astronomical puzzle.
This paper presents a theory that treats roles as a type system for language and argues that prompt injection attacks on LLMs stem from a fundamental flaw in how models perceive roles. It explains existing attacks, predicts new ones, and proposes a research agenda for a science of roles.
A reflection on the landmark 'Attention Is All You Need' paper, highlighting how removing recurrence and relying solely on attention mechanisms revolutionized AI and led to modern LLMs like GPT and Claude.
This paper revises the estimated proportion of newly written code that is generated or reviewed by AI, analyzing its impact on software development.
This article explores how AI agents can automatically write and optimize their skill files using techniques like SkillOpt from Microsoft Research, which treats skill documents as trainable state and delivers significant performance improvements. It addresses the challenge of manual skill tuning and presents frameworks like GEPA and EvoSkill as evolutionary approaches.
A detailed blog post explaining the Sakana Fugu technical report, which introduces orchestrator AI models that route tasks to specialized models, achieving collective intelligence.
Computational complexity theorists argue that semiclassical gravity's non-linear dynamics would enable impossibly powerful computation, proving gravity must be quantized. The paper uses the Schrödinger-Newton equation to show that classical gravity coupled to quantum matter leads to computational contradictions.
NVIDIA's ENPIRE framework, developed with CMU and UC Berkeley, uses AI coding agents to autonomously train robots for high-precision physical tasks like GPU installation, achieving a 99% success rate through a closed feedback loop and real hardware trials.
A new Nature paper from the Pakistan Genomic Resource (PGR) analyzes 173,303 Pakistanis from consanguineous communities and identifies human knockouts for nearly one-third of protein-coding genes, overturning biological assumptions such as the essentiality of PRDM9 for fertility.
This paper investigates whether LLM agents can infer hidden world models through interaction, finding that they struggle to build stable internal models as complexity increases.
Introduces Test-Time Reinforcement Learning (TTRL), a method that uses majority voting on unlabeled data to create pseudo-labels for RL training, enabling self-improvement of LLMs without ground-truth answers. Reports significant gains (e.g., +159% to +211% on AIME 2024 for Qwen-2.5-Math-7B).
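The majority-voting step in the TTRL summary can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function name `majority_vote_rewards` is hypothetical, and the binary agreement reward is an assumption made here for illustration. The summary itself states only that majority voting produces pseudo-labels for RL training.

```python
from collections import Counter


def majority_vote_rewards(answers):
    """Return (pseudo_label, rewards) for the answers sampled on one prompt.

    Hypothetical helper: the 0/1 reward scheme is an assumption,
    not something stated in the summary.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    # The most frequent answer serves as the pseudo-label
    # (ties are resolved in favor of the answer seen first).
    pseudo_label, _ = Counter(answers).most_common(1)[0]
    # Reward each sample 1.0 if it agrees with the pseudo-label, else 0.0.
    rewards = [1.0 if a == pseudo_label else 0.0 for a in answers]
    return pseudo_label, rewards


# Five sampled final answers for a single unlabeled question.
samples = ["42", "17", "42", "42", "9"]
label, rewards = majority_vote_rewards(samples)
print(label, rewards)
```

In this sketch no ground-truth answer is consulted at any point; the rewards would then be passed to whatever RL update is in use.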
This blog post analyzes the PivCo-Huffman paper, which introduces 'merge' operations for parallel Huffman decoding, enabling efficient vectorized and GPU-friendly decoding without interleaving overhead.
Arbor introduces explicit geometric control for 3D asset generation by using constraint meshes (hull, avoidance, touch regions) to condition latent generation, improving spatial constraint adherence without sacrificing object quality.