reasoning

#reasoning

Efficient and Trainable Language Model Test-Time Scaling via Local Branch Routing

arXiv cs.CL ↗ · 3h ago Cached

Introduces Local Branch Routing (LBR), a token-level test-time scaling framework that expands a local lookahead tree and uses a lightweight router to select the best branch. LBR improves reasoning on mathematical benchmarks over chain-of-thought and other baselines.

#reasoning

ExTra: Exploratory Trajectory Optimization for Language Model Reinforcement Learning

arXiv cs.LG ↗ · 3h ago Cached

ExTra introduces exploratory trajectory optimization for language model reinforcement learning, combining novelty rewards and entropy-guided prefix regeneration to improve both single-sample accuracy and inference-time coverage on mathematical reasoning benchmarks.

#reasoning

@FinanceYF5: Paper:

X AI KOLs Following ↗ · 5h ago Cached

This paper introduces LatentMAS, a training-free framework for multi-agent systems that enables large language model agents to collaborate directly in continuous latent space via shared latent working memory, achieving up to 14.6% higher accuracy and 4x faster inference while reducing token usage by over 70%.

#reasoning

The verifier based vs verifier free test time scaling result is older than people act, and it keeps getting confirmed [D]

Reddit r/MachineLearning ↗ · 19h ago

The post discusses the confirmed research finding that verifier-based test-time compute scaling dominates verifier-free methods, with practical examples like Apodex showing gains from separate verification processes. It argues that building independent verifiers is a key path for future AI capability improvements.

#reasoning

🚀 Open AI Unveils More Advanced AI Models Capable of Longer Reasoning and Better Task Execution

Reddit r/artificial ↗ · 19h ago

OpenAI announced new advanced AI models with improved reasoning, coding, and research capabilities, capable of handling complex tasks with better accuracy, potentially impacting multiple industries.

#reasoning

Blockwise Policy-Drift Gating for On-Policy Distillation

arXiv cs.LG ↗ · yesterday Cached

This paper introduces blockwise policy-drift gating, a lightweight method that improves on-policy distillation for language models by weighting the loss according to the shift between the old and current student probabilities, yielding higher reasoning accuracy on math benchmarks.

#reasoning

The Latent Bridge: A Continuous Slow-Fast Channel for Real-Time Game Agents

arXiv cs.AI ↗ · yesterday Cached

The paper introduces the Latent Bridge, a trainable continuous channel that couples a slow reasoning VLM (Qwen3-VL-8B-Thinking) and a fast reactive VLM (MiniCPM-o 4.5) for real-time game agents. Experiments on Atari games and MetaDrive show it matches or outperforms the text-based bridge while avoiding destructive interference when used alone.

#reasoning

Tractable Reasoning and Conjunctive Query Answering for Defeasible DL-Lite under Rational Closure

arXiv cs.AI ↗ · yesterday Cached

This paper studies rational closure for the DL-Lite family of description logics, providing a plug-in architecture for efficient non-monotonic reasoning and conjunctive query answering with minimal computational overhead.

#reasoning

Neuro-Symbolic Drive: Rule-Grounded Faithful Reasoning for Driving VLAs

arXiv cs.AI ↗ · yesterday Cached

Introduces Neuro-Symbolic Drive, a framework that uses rule-grounded reasoning traces from classical planners to fine-tune a driving VLA (Qwen3.5-4B), achieving significant reductions in average displacement error and miss rate compared to standard CoT reasoning.

#reasoning

@rohanpaul_ai: New Microsoft paper argues that transformers generalize better when they learn compact internal states, not just next t…

X AI KOLs Timeline ↗ · yesterday Cached

Microsoft's NextLat paper proposes a self-supervised training method where transformers predict their next hidden state instead of just the next token, leading to more compact world models, better planning and reasoning, and up to 3.3x faster generation.

#reasoning

@tli104: New paper: "Self-Compacting Language Model Agents" LM agents build up long traces of reasoning and tool calls. As the t…

X AI KOLs Timeline ↗ · yesterday Cached

A new paper proposes self-compacting language model agents that decide when to clean up their own traces of reasoning and tool calls, so that mistakes and stale information do not accumulate.

#reasoning

With AI, testing, decision-making, learning, coding, and many other tasks have become much easier. If AI makes so many things easier, then why do people still struggle despite having access to AI?

Reddit r/artificial ↗ · 2d ago

A reflective question on why people still struggle despite having access to AI that simplifies many tasks, inviting perspectives on the psychological and practical barriers involved.

#reasoning

VibeThinker: 3B param model that beats Opus 4.5 on reasoning with novel SFT+GRPO

Hacker News Top ↗ · 2d ago Cached

This technical report introduces VibeThinker-3B, a 3B-parameter dense model that achieves frontier-level reasoning performance on benchmarks like AIME26 and LiveCodeBench, matching or exceeding much larger models such as DeepSeek V3.2 and GLM-5 through a combination of curriculum-based SFT, multi-domain RL, and offline self-distillation.

#reasoning

@danielhanchen: I’m running a 3 hour advanced workshop at AI Engineer World’s Fair! 2026 has greatly changed how one should learn lower…

X AI KOLs Following ↗ · 2d ago Cached

Daniel Han is hosting a 3-hour advanced workshop at the AI Engineer World's Fair, sharing insights on the history of open-source large models, classification of training stages (pre-training, intermediate training, supervised fine-tuning, post-training, reinforcement fine-tuning), and the leap in reasoning models. He also introduced his team's open-source contributions to fine-tuning optimization.

#reasoning

@rohanpaul_ai: Can LLM agents actually discover hidden rules by interacting? The answer is uncomfortable. The more complicated the hid…

X AI KOLs Following ↗ · 3d ago Cached

This paper investigates whether LLM agents can infer hidden world models through interaction, finding that they struggle to build stable internal models as complexity increases.

#reasoning

ChartWalker: Benchmarking the Cross-Chart RAG Task

Hugging Face Daily Papers ↗ · 3d ago Cached

ChartWalker introduces a novel framework for cross-chart retrieval-augmented generation (RAG) using hierarchical knowledge graph construction and structure-aware sampling. It releases a challenging benchmark (ChartWalker-Bench) and an agentic baseline (ChartWalker-Agent), revealing significant performance gaps in current RAG paradigms.

#reasoning

Why can't LLMs be trained to think in an optimized AI language rather than English?

Reddit r/singularity ↗ · 4d ago

A speculative discussion questioning why LLMs are not trained to think in an optimized internal language rather than natural language, and whether that could improve efficiency.

#reasoning

Look Light, Think Heavy: What Multimodal Chain-of-Thought Reasoning Can and Cannot Do

Hugging Face Daily Papers ↗ · 4d ago Cached

This paper systematically evaluates multimodal Chain-of-Thought reasoning across 12 tasks, finding it selectively effective for reasoning tasks but detrimental for perception tasks, and identifies a 'Look Light, Think Heavy' pattern where visual introspection declines during reasoning.

#reasoning

CombEval: A Framework for Evaluating Combinatorial Counting in Large Language Models

arXiv cs.AI ↗ · 5d ago Cached

CombEval is a dynamic benchmark for evaluating combinatorial counting in large language models, using typed specifications to generate problems with solver-verified answers. It tests 11 LLMs under direct and code-augmented settings and finds brittleness on ordered objects, indistinguishable elements, relative constraints, and nested dependencies.

#reasoning

Hidden Anchors in Multi-Agent LLM Deliberation

arXiv cs.AI ↗ · 5d ago Cached

This paper models multi-agent LLM deliberation as a closed-loop dynamical system where each agent has a hidden internal belief (anchor) that continually pulls its opinion, and shows how this anchor can be recovered from deliberation data alone, explaining phenomena like opinions escaping the convex hull of initial beliefs.
