llm-reasoning

Beyond Trajectory Imitation: Strategy-Guided Policy Optimization for LLM Reasoning

arXiv cs.AI · yesterday

Introduces Strategy-Guided Policy Optimization (SGPO) for LLM reasoning, which replaces trajectory imitation with strategy distillation, improving generalization on math benchmarks.

@Gracker_Gao: AI Papers: Strong AI Doesn't Write Code by Writing Code Two recent arXiv papers reveal a counterintuitive finding: when encountering an unfamiliar programming language, GPT-5.4 and Claude Opus 4.6 don't directly write code in the target language—instead, they write a Python program to generate the target code, then debug it locally. This "meta-…

X AI KOLs Timeline · 2d ago

Two recent arXiv papers found that GPT-5.4 and Claude Opus 4.6 use a metaprogramming strategy when handling unfamiliar programming languages: rather than writing target-language code directly, they generate it with a Python program and debug it locally. This strategy is key to distinguishing top-tier agents from average ones, and strategy sophistication matters more than model parameter scale.

ReNIO: Reweighting Negative Trajectory Importance for LLM On-Policy Distillation

Hugging Face Daily Papers · 3d ago

ReNIO enhances on-policy distillation for LLMs by reweighting negative trajectories based on token-level probability ratios, improving reasoning performance in mathematical and code generation tasks.

@rao2z: "When an LLM outputs a step-by-step plan, it creates a powerful illusion that you are watching a machine reason its way…

X AI KOLs Following · 4d ago

A position paper by Subbarao Kambhampati and researchers at Arizona State University argues that chain-of-thought reasoning in LLMs creates an illusion of reasoning, and that the industry needs to move beyond costly token generation to alternative reasoning mechanisms.

Beyond Entropy: Learning from Token-Level Distributional Deviations for LLM Reasoning

arXiv cs.AI · 5d ago

Introduces the Independent Combinatorial Tokens (ICT) framework, which uses Jensen-Shannon divergence between token logit distributions to identify critical branching points, preventing entropy collapse and explosion in RLVR for LLM reasoning. Achieves up to a 14.9% pass@4 improvement on Qwen models.
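
For readers unfamiliar with the metric named in this summary, here is a minimal sketch of Jensen-Shannon divergence computed between two token logit vectors. It shows only the standard definition of the divergence. Which pairs of distributions ICT compares, and how it turns the score into a branching-point signal, are not stated in the summary, so the function names (`softmax`, `kl_divergence`, `js_divergence`) and the toy logits are illustrative assumptions, not the paper's implementation.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(logits_a, logits_b):
    """Jensen-Shannon divergence between the softmax of two logit vectors.

    Symmetric, and bounded between 0 and ln(2) when measured in nats.
    """
    p, q = softmax(logits_a), softmax(logits_b)
    mix = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # midpoint distribution
    return 0.5 * kl_divergence(p, mix) + 0.5 * kl_divergence(q, mix)


# Hypothetical logits over a 3-token vocabulary (illustrative values only).
agree = js_divergence([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
disagree = js_divergence([10.0, 0.0, 0.0], [0.0, 10.0, 0.0])
```

Identical logit vectors give a divergence of zero, while vectors that put their mass on different tokens approach the upper bound of ln(2). A large value at a given position is the kind of signal that could mark a token where candidate continuations diverge.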

Learning from Your Own Mistakes: Constructing Learnable Micro-Reflective Trajectories for Self-Distillation

Hugging Face Daily Papers · 2026-06-17

This paper proposes Trajectory-Augmented Policy Optimization (TAPO), which constructs micro-reflective correction trajectories using the model's own correct and incorrect rollouts to improve reasoning in large language models, outperforming standard self-distillation methods on math benchmarks.

REVES: REvision and VErification--Augmented Training for Test-Time Scaling

Hugging Face Daily Papers · 2026-06-17

Proposes REVES, a two-stage iterative framework that alternates between data augmentation and policy optimization to improve LLM reasoning by leveraging intermediate correction steps, achieving superior performance on coding benchmarks and constraint satisfaction problems.

CoRA: Confidence-Rationale Alignment for Reliable Chain-of-Thought Reasoning

arXiv cs.CL · 2026-06-16

This paper introduces CoRA, a GRPO-based reinforcement learning framework that aligns LLM confidence with generated rationales to improve the reliability of chain-of-thought reasoning, reducing misalignment error by up to 26.51% across multiple benchmarks.

Numbers Already Carry Their Own Embeddings

arXiv cs.LG · 2026-06-15

Introduces Adelic operation-preserved embeddings (AOE), a training-free representation that encodes a number by combining its real value with its p-adic expansions, preserving additive and multiplicative structure. Achieves perfect accuracy on the Weaving Pattern benchmark.
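
As background for the p-adic expansions mentioned in this summary, the following is a minimal sketch for non-negative integers only, where the p-adic expansion coincides with the base-p digits read least significant first. It also checks the general fact that the first k digits of a sum or product depend only on the first k digits of the operands. The helper names (`p_adic_digits`, `from_digits`) are hypothetical, and how AOE actually builds its embedding from the real value and these expansions is not described in the summary.

```python
def p_adic_digits(n, p, num_digits):
    """First num_digits p-adic digits of a non-negative integer n.

    Digits are returned least significant first, so the list
    [d0, d1, d2, ...] represents d0 + d1*p + d2*p**2 + ...
    """
    if n < 0 or p < 2 or num_digits < 0:
        raise ValueError("expected n >= 0, p >= 2, num_digits >= 0")
    digits = []
    for _ in range(num_digits):
        n, remainder = divmod(n, p)
        digits.append(remainder)
    return digits


def from_digits(digits, p):
    """Rebuild the integer (mod p**len(digits)) from its p-adic digits."""
    return sum(d * p**i for i, d in enumerate(digits))


# Illustrative values: truncating to k digits is the same as reducing
# modulo p**k, and that reduction commutes with addition and multiplication.
p, k = 3, 4
a, b = 47, 95
a_trunc = from_digits(p_adic_digits(a, p, k), p)
b_trunc = from_digits(p_adic_digits(b, p, k), p)
sum_digits = p_adic_digits(a + b, p, k)
prod_digits = p_adic_digits(a * b, p, k)
```

Here `p_adic_digits(a_trunc + b_trunc, p, k)` equals `sum_digits`, and likewise for the product. This is the sense in which a truncated p-adic expansion keeps additive and multiplicative structure.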

Mental-R1: Aligning LLM Reasoning for Mental Health Assessment

arXiv cs.AI · 2026-06-12

Proposes Cognitive Relative Policy Optimization (CRPO), a reinforcement learning framework for aligning LLM reasoning in mental health assessment, achieving an average improvement of 10.4 percentage points in weighted F1-score over existing baselines.

MARS: Margin-Adversarial Risk-controlled Stopping for Parallel LLM Test-time Scaling

arXiv cs.AI · 2026-06-12

This paper introduces MARS, a stopping rule for parallel LLM test-time scaling that probes partial traces to stop early without sacrificing accuracy, saving 25–47% of tokens across reasoning models on competition math benchmarks.

Mind the Perspective: Let's Reason Recursively for Theory of Mind

arXiv cs.AI · 2026-06-11

Introduces RecToM, an inference-time framework that models nested beliefs via recursive perspective construction for Theory of Mind reasoning in LLMs, achieving state-of-the-art performance on multiple benchmarks.

Sample Where You Struggle: Sharpening Base Model Reasoning via Entropy-Guided Power Sampling

arXiv cs.LG · 2026-06-10

This paper introduces Entropy-Guided Power Sampling (EGPS), a training-free and verifier-free sampler that improves the efficiency of power sampling for enhancing base language model reasoning. EGPS achieves up to 12.6x speedup over standard Metropolis-Hastings sampling while reaching best or tied-best accuracy on benchmarks like MATH500, HumanEval, and GPQA.

Early-Token Confidence Predicts Reasoning Quality in Multi-Agent LLM Debate

arXiv cs.CL · 2026-06-10

This paper investigates whether early-token confidence signals from LLM decoding can predict reasoning quality in multi-agent debate systems, finding that confidence in the first few generated tokens is the strongest predictor of rubric-based essay scores.

TRACE: A Unified Rollout Budget Allocation Framework for Efficient Agentic Reinforcement Learning

Hugging Face Daily Papers · 2026-06-09

TRACE is a unified rollout budget allocation framework that enhances reward contrast in multi-turn agentic reinforcement learning by dynamically distributing resources across tree-structured rollouts based on prefix-level informativeness. It improves efficiency and accuracy on agentic benchmarks like Multi-Hop QA.

From Correctness to Utility: Gain-Based Prefix Evaluation for LLM Reasoning

arXiv cs.CL · 2026-06-08

This paper introduces the Prefix Utility Model (PUM), which evaluates LLM reasoning prefixes based on their utility (improvement in solve rate) rather than local correctness. PUM shows strong performance in mathematical reasoning tasks across selection, search, and reinforcement learning.

ThinkBooster: A Unified Framework for Seamless Test-Time Scaling of LLM Reasoning

arXiv cs.CL · 2026-06-08

ThinkBooster is a unified framework for test-time compute scaling of LLM reasoning, providing a modular Python library, a performance-efficiency benchmark, an OpenAI-compatible proxy service, and a visual debugger. Empirical results on math and coding tasks demonstrate practical gains with quality-cost trade-offs.

Are Large Language Models Suitable for Graph Computation? Progress and Prospects

arXiv cs.CL · 2026-06-08

This survey reviews the use of large language models for graph computation, categorizing existing approaches into two paradigms: LLMs as executors and LLMs as planners. It finds LLMs promising for simple tasks but unreliable for large-scale exact computations, and suggests future directions.

AI agents fail at the auth step more than at the reasoning step. anyone else seeing this?

Reddit r/artificial · 2026-06-05

AI agents often fail due to authentication hurdles like email verification, OTP timeouts, and captchas, not due to reasoning errors, highlighting infrastructure challenges in production.

The strange thing about LLM reasoning research: we're now trying to remove the chain-of-thought traces

Reddit r/artificial · 2026-06-05

The article discusses a shift in LLM reasoning research from making reasoning explicit via chain-of-thought to exploring latent reasoning that doesn't require language traces, questioning whether visibility is necessary for effective reasoning.
