Stop When Further Reasoning Won't Help: Attention-State Adaptive Generation in Reasoning Models
Summary
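This paper proposes ASAG (Attention-State Adaptive Generation), a training-free, plug-and-play method that infers a large reasoning model's reasoning state from its attention distributions and adaptively stops generation. It reports consistent improvements across nine benchmarks with DeepSeek-R1-Distill and Qwen3 models; on Qwen3-8B, average accuracy rises by 3.2% while the number of generated tokens falls by nearly 40%.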
# Stop When Further Reasoning Won't Help: Attention-State Adaptive Generation in Reasoning Models

Source: [https://arxiv.org/abs/2606.15070](https://arxiv.org/abs/2606.15070) [View PDF](https://arxiv.org/pdf/2606.15070)

> Abstract: By incorporating test-time compute scaling, large reasoning models (LRMs) can solve complex problems through explicit chain-of-thought (CoT) reasoning processes. However, they often suffer from overthinking, resulting in redundant token outputs and degraded accuracy. Current methods to mitigate this issue remain limited: training-based approaches require substantial computational resources, while training-free methods rely on well-crafted prompts or unreliable confidence signals. In this work, we investigate early stopping from the perspective of attention distributions and propose a simple method, ASAG, which infers the model's reasoning state and adaptively adjusts the generation strategy. The proposed framework is training-free and plug-and-play, enabling seamless integration into existing LRMs. Extensive experiments on nine benchmarks demonstrate consistent improvements across mainstream LRMs with varying parameter scales, including the DeepSeek-R1-Distill and Qwen3 series. Specifically, ASAG improves average accuracy by 3.2% while reducing the number of generated tokens by nearly 40% across all reasoning tasks on Qwen3-8B.

## Submission history

From: Jiakai Li [[view email](https://arxiv.org/show-email/65b659c6/2606.15070)]

**[v1]** Sat, 13 Jun 2026 02:58:29 UTC (1,220 KB)
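The abstract does not spell out how an attention distribution becomes a stopping signal. As a rough illustration only, the sketch below computes the Shannon entropy of a single attention row and applies a simple stopping rule. Everything in it is an assumption rather than the paper's algorithm: the function names `attention_entropy` and `should_stop`, the `threshold` and `patience` parameters, and the choice to treat sustained high entropy as the stop condition are all hypothetical.

```python
import math
from typing import Sequence


def attention_entropy(weights: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one attention row.

    The weights are renormalized first, so they need not sum to 1 exactly.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("attention weights must have positive total mass")
    probs = [w / total for w in weights]
    # Zero-probability entries contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in probs if p > 0)


def should_stop(entropies: Sequence[float], threshold: float, patience: int) -> bool:
    """Hypothetical rule: stop once the last `patience` entropies all exceed `threshold`.

    Both the direction of the comparison and the parameters are assumptions
    made for illustration; the paper may use a different criterion.
    """
    if patience < 1:
        raise ValueError("patience must be at least 1")
    if len(entropies) < patience:
        return False
    return all(h > threshold for h in entropies[-patience:])


if __name__ == "__main__":
    # Toy attention rows for three consecutive decoding steps.
    rows = [
        [0.97, 0.01, 0.01, 0.01],  # sharply peaked attention
        [0.25, 0.25, 0.25, 0.25],  # fully diffuse attention
        [0.30, 0.30, 0.20, 0.20],  # mostly diffuse attention
    ]
    history = [attention_entropy(r) for r in rows]
    print(history, should_stop(history, threshold=1.0, patience=2))
```

In a real decoding loop the rows would come from the model's attention weights at each generated token, and which layers and heads to read, and how to aggregate them, would have to follow the paper itself.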
Similar Articles
@sheriyuo: This paper proposes ASAG, Attention-State Adaptive Generation, a training-free, plug-and-play stopping framework for re…
ASAG uses attention entropy to detect when reasoning is unproductive, stopping early to improve accuracy and reduce token generation. Experiments on Qwen3-8B show a 4.4% accuracy gain and over 40% fewer generated tokens.
Stop When Reasoning Converges: Semantic-Preserving Early Exit for Reasoning Models
This paper introduces PUMA, a plug-and-play framework that detects semantic redundancy in chain-of-thought reasoning to enable early exit, achieving 26.2% average token reduction across multiple models and benchmarks while preserving accuracy and reasoning quality.
Reasoning Can Be Restored by Correcting a Few Decision Tokens
This paper shows that the reasoning gap between base LLMs and large reasoning models is concentrated on a small set of early planning tokens. It introduces disagreement-guided token intervention, where replacing only those critical tokens with a reasoning model's outputs allows a base model to nearly match the reasoning model's performance.
ATTNPO: Attention-Guided Process Supervision for Efficient Reasoning
ATTNPO introduces an attention-guided process supervision framework that reduces overthinking in large reasoning models by leveraging intrinsic attention signals for step-level credit assignment, achieving improved performance with shorter reasoning lengths across 9 benchmarks.
Improving Reasoning Capabilities in Small Models through Mixture-of-Layers Distillation with Stepwise Attention on Key Information
This paper proposes a novel Chain-of-Thought distillation framework that transfers teacher models' stepwise attention on key information to student models through a Mixture-of-Layers module for dynamic layer alignment. The method achieves consistent performance improvements on mathematical and commonsense reasoning benchmarks by explicitly guiding student models to progressively focus on critical information during reasoning.