Tag
Introduces Local Branch Routing (LBR), a token-level test-time scaling framework that expands a local lookahead tree and uses a lightweight router to select the best branch. LBR improves reasoning on mathematical benchmarks over chain-of-thought and other baselines.
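A minimal sketch can show why expanding a local lookahead tree before committing a token helps. Everything in it is an illustrative assumption. `toy_lm` stands in for a real model, `width` and `depth` are hypothetical tree parameters, and the router is just cumulative log-probability, not LBR's lightweight router. In the toy, greedy decoding picks "a", but the best two-token branch starts with "b".

```python
import math
from typing import Callable, Dict, List, Tuple

Prefix = Tuple[str, ...]
# Hypothetical interface: a model maps a token prefix to next-token probabilities.
NextTokenFn = Callable[[Prefix], Dict[str, float]]
# Hypothetical router interface: scores a branch from its tokens and cumulative log-prob.
RouterFn = Callable[[Prefix, float], float]


def expand_branches(lm: NextTokenFn, prefix: Prefix, width: int, depth: int) -> List[Tuple[Prefix, float]]:
    """Build a local lookahead tree by keeping the top-`width` tokens at each of `depth` steps.

    Returns every leaf branch together with its cumulative log-probability.
    """
    branches: List[Tuple[Prefix, float]] = [((), 0.0)]
    for _ in range(depth):
        expanded = []
        for tokens, logp in branches:
            dist = lm(prefix + tokens)
            top = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)[:width]
            for tok, p in top:
                expanded.append((tokens + (tok,), logp + math.log(p)))
        branches = expanded
    return branches


def route(branches: List[Tuple[Prefix, float]], router: RouterFn) -> str:
    """Select the branch the router scores highest and commit only its first token."""
    best_tokens, _ = max(branches, key=lambda b: router(b[0], b[1]))
    return best_tokens[0]


def toy_lm(prefix: Prefix) -> Dict[str, float]:
    """Toy model where the greedy first token ("a") leads to weaker continuations."""
    if not prefix:
        return {"a": 0.6, "b": 0.4}
    if prefix[-1] == "a":
        return {"x": 0.5, "y": 0.5}
    return {"z": 0.9, "w": 0.1}


branches = expand_branches(toy_lm, (), width=2, depth=2)
chosen = route(branches, router=lambda tokens, logp: logp)
```

Here branch ("b", "z") has probability 0.36, which beats both "a" branches at 0.30. A learned router would replace the log-probability score with its own judgment of branch quality.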
ExTra introduces exploratory trajectory optimization for language model reinforcement learning, combining novelty rewards and entropy-guided prefix regeneration to improve both single-sample accuracy and inference-time coverage on mathematical reasoning benchmarks.
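One simple way to choose where a prefix should be regenerated is to fork the trajectory at its most uncertain token. The sketch below does that by taking the highest-entropy step. This rule, the use of nats, and the toy trajectory are assumptions for illustration. ExTra's actual selection criterion and its novelty reward are not reproduced here.

```python
import math
from typing import Dict, List, Sequence


def entropy(dist: Dict[str, float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def regeneration_prefix(tokens: Sequence[str], dists: Sequence[Dict[str, float]]) -> List[str]:
    """Keep the trajectory up to, but not including, its highest-entropy step.

    dists[i] is the distribution tokens[i] was sampled from. Resampling from the
    returned prefix explores a new branch at the most uncertain point.
    """
    if len(tokens) != len(dists):
        raise ValueError("need one distribution per token")
    entropies = [entropy(d) for d in dists]
    fork = max(range(len(entropies)), key=entropies.__getitem__)
    return list(tokens[:fork])


tokens = ["Let", "x", "=", "2"]
dists = [
    {"Let": 1.0},
    {"x": 0.5, "y": 0.5},
    {"=": 0.9, ">": 0.1},
    {"2": 0.25, "3": 0.25, "4": 0.25, "5": 0.25},
]
prefix = regeneration_prefix(tokens, dists)
```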
This paper introduces LatentMAS, a training-free framework for multi-agent systems that enables large language model agents to collaborate directly in continuous latent space via shared latent working memory, achieving up to 14.6% higher accuracy and 4x faster inference while reducing token usage by over 70%.
The post discusses research showing that verifier-based test-time compute scaling dominates verifier-free methods and cites practical examples such as Apodex, which gains from a separate verification process. It argues that building independent verifiers is a key path to future AI capability improvements.
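The contrast between verifier-free and verifier-based scaling is easiest to see in its simplest form. Majority voting over samples needs no verifier, while best-of-N asks an independent checker to score each candidate. The task, the samples, and the checker below are made up for illustration. They do not describe how Apodex works.

```python
from collections import Counter
from typing import Callable, List


def majority_vote(answers: List[str]) -> str:
    """Verifier-free aggregation: return the most frequent sampled answer."""
    return Counter(answers).most_common(1)[0][0]


def best_of_n(answers: List[str], verifier: Callable[[str], float]) -> str:
    """Verifier-based selection: return the answer an independent checker scores highest."""
    return max(answers, key=verifier)


def check_square_root_of_225(answer: str) -> float:
    """Toy independent verifier: 1.0 if the answer squares to 225, else 0.0."""
    return 1.0 if int(answer) ** 2 == 225 else 0.0


samples = ["12", "12", "15"]  # hypothetical model samples; the majority is wrong
voted = majority_vote(samples)
selected = best_of_n(samples, check_square_root_of_225)
```

Voting returns the popular wrong answer. The verifier recovers the correct one as long as at least one sample is right.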
OpenAI announced new AI models with improved reasoning, coding, and research capabilities that handle complex tasks more accurately and could affect multiple industries.
This paper introduces blockwise policy-drift gating, a lightweight method that improves on-policy distillation for language models by weighting the loss according to how much the student's probabilities shift between its old and current versions, yielding higher reasoning accuracy on math benchmarks.
The paper introduces the Latent Bridge, a trainable continuous channel that couples a slow reasoning VLM (Qwen3-VL-8B-Thinking) and a fast reactive VLM (MiniCPM-o 4.5) for real-time game agents. Experiments on Atari games and MetaDrive show it matches or outperforms the text-based bridge while avoiding destructive interference when used alone.
This paper studies rational closure for the DL-Lite family of description logics, providing a plug-in architecture for efficient non-monotonic reasoning and conjunctive query answering with minimal computational overhead.
Introduces Neuro-Symbolic Drive, a framework that uses rule-grounded reasoning traces from classical planners to fine-tune a driving VLA (Qwen3.5-4B), achieving significant reductions in average displacement error and miss rate compared to standard CoT reasoning.
Microsoft's NextLat paper proposes a self-supervised training method in which transformers predict their next hidden state in addition to the next token, yielding more compact world models, better planning and reasoning, and up to 3.3x faster generation.
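A toy objective can show the general idea of training on the next hidden state alongside the next token. It adds a latent-prediction term to the usual cross-entropy. The mean-squared-error distance, the weight `lam`, and the tiny vectors are assumptions for illustration. NextLat's actual objective may use a different distance, predictor, or weighting.

```python
import math
from typing import Sequence


def next_token_ce(logits: Sequence[float], target: int) -> float:
    """Cross-entropy of the true next token under a softmax over `logits`."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]


def latent_mse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean squared error between the predicted and the actual next hidden state."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def combined_loss(
    logits: Sequence[float],
    target: int,
    predicted_next_h: Sequence[float],
    actual_next_h: Sequence[float],
    lam: float = 1.0,
) -> float:
    """Next-token loss plus a weighted next-hidden-state prediction term (hypothetical weighting)."""
    return next_token_ce(logits, target) + lam * latent_mse(predicted_next_h, actual_next_h)


loss = combined_loss([0.0, 0.0], 0, [1.0, 2.0], [1.0, 4.0], lam=0.5)
```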
A new paper proposes self-compacting language model agents that can decide when to clean up their own reasoning and tool-call traces, so that mistakes and stale information do not accumulate.
A reflective question on why people still struggle with AI despite its ability to simplify many tasks, inviting perspectives on the psychological and practical barriers to adoption.
This technical report introduces VibeThinker-3B, a 3B parameter dense model that achieves frontier-level reasoning performance on benchmarks like AIME26 and LiveCodeBench, matching or exceeding much larger models such as DeepSeek V3.2 and GLM-5 through a combination of curriculum-based SFT, multi-domain RL, and offline self-distillation.
Daniel Han is hosting a 3-hour advanced workshop at the AI Engineer World's Fair, covering the history of open-source large models, a classification of training stages (pre-training, intermediate training, supervised fine-tuning, post-training, and reinforcement fine-tuning), and the leap in reasoning models. He also introduces his team's open-source contributions to fine-tuning optimization.
This paper investigates whether LLM agents can infer hidden world models through interaction, finding that they struggle to build stable internal models as complexity increases.
ChartWalker introduces a novel framework for cross-chart retrieval-augmented generation (RAG) using hierarchical knowledge graph construction and structure-aware sampling. It releases a challenging benchmark (ChartWalker-Bench) and an agentic baseline (ChartWalker-Agent), revealing significant performance gaps in current RAG paradigms.
A speculative discussion questioning why LLMs are not trained to think in an optimized internal language rather than natural language, and whether that could improve efficiency.
This paper systematically evaluates multimodal Chain-of-Thought reasoning across 12 tasks, finding it selectively effective for reasoning tasks but detrimental for perception tasks, and identifies a 'Look Light, Think Heavy' pattern where visual introspection declines during reasoning.
CombEval is a dynamic benchmark for evaluating combinatorial counting in large language models, using typed specifications to generate problems with solver-verified answers. It tests 11 LLMs under direct and code-augmented settings and finds brittleness on ordered objects, indistinguishable elements, relative constraints, and nested dependencies.
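The idea of solver-verified answers can be illustrated with the simplest possible "solver": brute-force enumeration checked against a closed-form count. The two problems below are made up. They cover indistinguishable elements and a relative-order constraint, two of the failure categories named above. CombEval's typed specification format and its actual solver are not shown.

```python
from collections import Counter
from itertools import permutations
from math import factorial
from typing import Callable, Sequence, Tuple


def arrangements_closed_form(word: str) -> int:
    """Distinct orderings of a multiset: n! divided by k! for each repeated element."""
    total = factorial(len(word))
    for count in Counter(word).values():
        total //= factorial(count)
    return total


def brute_force_count(items: Sequence, constraint: Callable[[Tuple], bool] = lambda p: True) -> int:
    """Enumerate distinct orderings (identical elements collapse) and count those satisfying `constraint`."""
    return sum(1 for p in set(permutations(items)) if constraint(p))


# Indistinguishable elements: BANANA has 6! / (3! * 2!) = 60 distinct orderings.
assert arrangements_closed_form("BANANA") == brute_force_count("BANANA") == 60
# Relative constraint: orderings of 1..4 with 1 before 2 are half of 4! = 24.
assert brute_force_count((1, 2, 3, 4), lambda p: p.index(1) < p.index(2)) == 12
```

Enumeration only scales to small instances. That limit is one reason a benchmark would rely on a dedicated solver for larger problems.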
This paper models multi-agent LLM deliberation as a closed-loop dynamical system where each agent has a hidden internal belief (anchor) that continually pulls its opinion, and shows how this anchor can be recovered from deliberation data alone, explaining phenomena like opinions escaping the convex hull of initial beliefs.
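The anchor mechanism can be illustrated with the classic Friedkin–Johnsen opinion-dynamics update. That model is used here only as a stand-in, and the paper's exact dynamics may differ. Each agent mixes its peers' opinions with a pull toward a hidden anchor. If an anchor lies outside the initial beliefs, opinions can leave their convex hull. The inversion below recovers anchors from one observed round, but it simplifies by assuming the influence matrix `W` and the weight `lam` are known. All numbers are made up.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def step(W: Matrix, x: Sequence[float], anchor: Sequence[float], lam: float) -> List[float]:
    """One deliberation round: each agent averages peers' opinions (weight 1 - lam)
    and is pulled toward its hidden anchor (weight lam)."""
    n = len(x)
    return [(1 - lam) * sum(W[i][j] * x[j] for j in range(n)) + lam * anchor[i] for i in range(n)]


def recover_anchor(W: Matrix, x_prev: Sequence[float], x_next: Sequence[float], lam: float) -> List[float]:
    """Invert one observed round to solve for the anchors, assuming W and lam are known."""
    n = len(x_prev)
    return [(x_next[i] - (1 - lam) * sum(W[i][j] * x_prev[j] for j in range(n))) / lam for i in range(n)]


W = [[0.5, 0.5], [0.5, 0.5]]  # each agent weights itself and its peer equally
x0 = [0.4, 0.6]               # stated initial beliefs
anchors = [0.9, 0.9]          # hidden beliefs, outside the range of x0
x1 = step(W, x0, anchors, lam=0.5)
recovered = recover_anchor(W, x0, x1, lam=0.5)
```

After one round, both opinions move to about 0.7. That value lies outside the interval [0.4, 0.6] spanned by the initial beliefs.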