Tag
CAT is a framework that uses model self-certainty signals to adjust reasoning length autonomously based on problem difficulty, reducing overthinking and improving inference efficiency for large reasoning models.
This paper introduces DASH, a method that uses intermediate answer commitments within reasoning traces to assign segment-level credit, reducing overthinking behaviors and improving accuracy on competition-level math benchmarks.
NEX-N2-mini, a fine-tuned version of Qwen3.5-MoE, reportedly fixes the overthinking issues seen in Qwen3.5 and Qwen3.6 models.
This paper introduces Dynamic Rollout Editing (DRE), a training-time intervention to reduce overthinking in GRPO-style reinforcement learning for reasoning models. DRE edits successful trajectories by preserving the solution-reachable prefix and preferring verified shorter edits, weakening the preference for unnecessary thinking.
ASAG uses attention entropy to detect when reasoning is unproductive, stopping early to improve accuracy and reduce token generation. Experiments on Qwen3-8B show a 4.4% accuracy gain and over 40% fewer generated tokens.
This paper introduces DyCon, a training-free framework that uses step-level embeddings to model evolving task difficulty and dynamically control reasoning depth in large reasoning models, reducing overthinking and improving efficiency without sacrificing accuracy.
This paper introduces a prefix-level trajectory evaluation protocol to distinguish harmful overthinking from verbose but harmless overthinking in large reasoning models, showing that continued reasoning after reaching the correct answer can destabilize performance. The authors find that early stopping improves accuracy by up to 21% on multimodal benchmarks, and identify logical drift and visual reinterpretation as key causes of correctness deviations.
This paper reveals that aggressive post-training quantization of reasoning models leads to increased overthinking errors, where models reach correct intermediate answers but fail to finalize them. A simple logit penalty on overthinking markers reduces chain-of-thought length by 12-23% while improving accuracy, especially for quantized models.
SLAT is a segment-level adaptive trimming framework for chain-of-thought reasoning that suppresses redundant segments, reducing reasoning length by 50% while maintaining accuracy.
A personal project led to an ACL 2026 paper introducing TIME, a method that trains Qwen3 models to engage in short, context-triggered thinking rather than excessive reasoning. The work uses QLoRA and a four-phase curriculum, and all data and code are released as open source.
This paper introduces the VPG-EA framework, which uses variational inference and posterior guidance to improve the reasoning efficiency of large language models by addressing the 'overthinking' phenomenon in chain-of-thought generation.
This paper introduces Implicit Compression Regularization (ICR), a method to address LLM overthinking during RL post-training by guiding models toward concise yet accurate reasoning trajectories.
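Two of the mechanisms named in the entries above, an attention-entropy signal and a logit penalty on overthinking markers, are simple enough to sketch. The following is only an illustration under stated assumptions, not the implementation from either paper: the function names, the plain-list representation of attention weights and logits, and the marker token ids are all hypothetical, and the summaries do not say how the entropy is thresholded or which tokens count as markers.

```python
import math


def attention_entropy(weights):
    """Shannon entropy (in nats) of one attention distribution.

    `weights` is a list of non-negative attention weights; it is
    normalized here, so it does not need to sum to 1 already.
    """
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log(p) for p in probs)


def penalize_markers(logits, marker_ids, penalty):
    """Return a copy of `logits` with `penalty` subtracted at marker ids.

    `marker_ids` is a hypothetical set of token ids for phrases that
    tend to restart reasoning; the input list is left unmodified.
    """
    out = list(logits)
    for token_id in marker_ids:
        out[token_id] -= penalty
    return out


if __name__ == "__main__":
    # Uniform attention has maximal entropy; peaked attention has none.
    print(attention_entropy([0.25, 0.25, 0.25, 0.25]))
    print(attention_entropy([1.0, 0.0, 0.0, 0.0]))
    # Token id 1 stands in for a hypothetical overthinking marker.
    print(penalize_markers([1.0, 2.0, 3.0], marker_ids=[1], penalty=0.5))
```

In a real decoder the entropy would be computed over a model's attention heads and the penalty applied to the next-token logits before sampling; both steps depend on details the summaries above do not give.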