math-benchmarks

#math-benchmarks

Know When to Stop: Segment-Level Credit Assignment for Reducing Overthinking

arXiv cs.CL · yesterday

This paper introduces DASH, a method that uses intermediate answer commitments within reasoning traces to assign segment-level credit, reducing overthinking behaviors and improving accuracy on competition-level math benchmarks.

#math-benchmarks

Learning from Your Own Mistakes: Constructing Learnable Micro-Reflective Trajectories for Self-Distillation

Hugging Face Daily Papers · 2026-06-17

This paper proposes Trajectory-Augmented Policy Optimization (TAPO), which constructs micro-reflective correction trajectories using the model's own correct and incorrect rollouts to improve reasoning in large language models, outperforming standard self-distillation methods on math benchmarks.

#math-benchmarks

How Much Thinking is Enough? Quantifying and Understanding Redundancy in LLM Reasoning

arXiv cs.AI · 2026-05-26

This paper formalizes reasoning redundancy in LLMs as the fraction of trailing steps that can be truncated without affecting correctness, quantifying 61-93% redundancy across frontier models and proving that redundancy is a structural consequence of length-agnostic outcome rewards.

