Stop When Further Reasoning Won't Help: Attention-State Adaptive Generation in Reasoning Models

arXiv cs.CL 06/16/26, 04:00 AM Papers

reasoning-models attention-mechanism early-stopping token-efficiency chain-of-thought deepseek qwen

Summary

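This paper proposes ASAG (Attention-State Adaptive Generation), a training-free, plug-and-play method that decides when a large reasoning model should stop reasoning based on its attention distributions. Across nine benchmarks with DeepSeek-R1-Distill and Qwen3 models it reports consistent gains; on Qwen3-8B it improves average accuracy by 3.2% while cutting generated tokens by nearly 40%.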

arXiv:2606.15070v1 Announce Type: new

Abstract: By incorporating test-time compute scaling, large reasoning models (LRMs) can solve complex problems through explicit chain-of-thought (CoT) reasoning processes. However, they often suffer from overthinking, resulting in redundant token outputs and degraded accuracy. Current methods to mitigate this issue remain limited: training-based approaches require substantial computational resources, while training-free methods rely on well-crafted prompts or unreliable confidence signals. In this work, we investigate early stopping from the perspective of attention distributions and propose a simple method, ASAG, which infers the model's reasoning state and adaptively adjusts the generation strategy. The proposed framework is training-free and plug-and-play, enabling seamless integration into existing LRMs. Extensive experiments on nine benchmarks demonstrate consistent improvements across mainstream LRMs with varying parameter scales, including the DeepSeek-R1-Distill and Qwen3 series. Specifically, ASAG improves average accuracy by 3.2% while reducing the number of generated tokens by nearly 40% across all reasoning tasks on Qwen3-8B.
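The abstract does not describe how ASAG turns attention distributions into a reasoning-state estimate. To make the general idea of attention-driven early stopping concrete, here is a minimal, purely illustrative sketch. It assumes one attention row per decoding step is available (for instance, from a Hugging Face Transformers model run with `output_attentions=True`) and uses normalized attention entropy with a threshold and a patience window as a stand-in signal. The statistic, `AttentionStopMonitor`, `threshold`, and `patience` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def normalized_entropy(weights: Sequence[float]) -> float:
    """Shannon entropy of one attention row, scaled to [0, 1].

    0.0 means all mass sits on a single position; 1.0 means uniform attention.
    """
    n = len(weights)
    total = sum(weights)
    if n < 2 or total <= 0:
        return 0.0
    h = 0.0
    for w in weights:
        p = w / total  # renormalize in case the row does not sum exactly to 1
        if p > 0:
            h -= p * math.log(p)
    return h / math.log(n)


class AttentionStopMonitor:
    """Illustrative stopping rule; NOT the criterion used by ASAG.

    Feed one attention row per decoding step (for example, a head-averaged
    row from a chosen layer). The monitor reports True once the normalized
    entropy has stayed below `threshold` for `patience` consecutive steps.
    """

    def __init__(self, threshold: float = 0.3, patience: int = 3) -> None:
        self.threshold = threshold  # hypothetical cut-off, would need tuning
        self.patience = patience    # consecutive low-entropy steps required
        self.streak = 0
        self.history: List[float] = []

    def update(self, attention_row: Sequence[float]) -> bool:
        score = normalized_entropy(attention_row)
        self.history.append(score)
        # Extend the streak on a low-entropy step; any other step resets it.
        self.streak = self.streak + 1 if score < self.threshold else 0
        return self.streak >= self.patience


if __name__ == "__main__":
    monitor = AttentionStopMonitor(threshold=0.3, patience=2)
    rows = [
        [0.25, 0.25, 0.25, 0.25],  # diffuse attention, entropy ~1.0
        [0.97, 0.01, 0.01, 0.01],  # sharply focused, entropy ~0.12
        [0.01, 0.97, 0.01, 0.01],  # focused again, second low step
    ]
    for step, row in enumerate(rows):
        if monitor.update(row):
            # A real decoder might now force the end-of-reasoning tag, e.g. "</think>".
            print(f"stop at step {step}")  # prints "stop at step 2"
            break
```

Entropy is only one possible statistic. The choice of layer, how heads are aggregated, the threshold, and what happens once the monitor fires (stopping outright versus switching to a different generation strategy, as the abstract's "adaptively adjusts the generation strategy" suggests) are all open design choices in this sketch.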


Cached at: 06/16/26, 11:45 AM

# Stop When Further Reasoning Won't Help: Attention-State Adaptive Generation in Reasoning Models
Source: [https://arxiv.org/abs/2606.15070](https://arxiv.org/abs/2606.15070)
[View PDF](https://arxiv.org/pdf/2606.15070)

## Submission history

From: Jiakai Li

**[v1]** Sat, 13 Jun 2026 02:58:29 UTC (1,220 KB)

Similar Articles

@sheriyuo: This paper proposes ASAG, Attention-State Adaptive Generation, a training-free, plug-and-play stopping framework for re…

Stop When Reasoning Converges: Semantic-Preserving Early Exit for Reasoning Models

Reasoning Can Be Restored by Correcting a Few Decision Tokens

ATTNPO: Attention-Guided Process Supervision for Efficient Reasoning

Improving Reasoning Capabilities in Small Models through Mixture-of-Layers Distillation with Stepwise Attention on Key Information