Stop When Further Reasoning Won't Help: Attention-State Adaptive Generation in Reasoning Models

arXiv cs.CL Papers

Summary

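This paper proposes ASAG, a training-free, plug-and-play method that uses attention distributions to infer a large reasoning model's reasoning state and stop generation adaptively. On Qwen3-8B, it improves average accuracy by 3.2% while cutting generated tokens by nearly 40%; the authors report consistent gains across nine benchmarks on models from the DeepSeek-R1-Distill and Qwen3 series.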

arXiv:2606.15070v1


# Stop When Further Reasoning Won't Help: Attention-State Adaptive Generation in Reasoning Models
Source: [https://arxiv.org/abs/2606.15070](https://arxiv.org/abs/2606.15070)
[View PDF](https://arxiv.org/pdf/2606.15070)

> Abstract: By incorporating test-time compute scaling, large reasoning models (LRMs) can solve complex problems through explicit chain-of-thought (CoT) reasoning processes. However, they often suffer from overthinking, resulting in redundant token outputs and degraded accuracy. Current methods to mitigate this issue remain limited: training-based approaches require substantial computational resources, while training-free methods rely on well-crafted prompts or unreliable confidence signals. In this work, we investigate early stopping from the perspective of attention distributions and propose a simple method, ASAG, which infers the model's reasoning state and adaptively adjusts the generation strategy. The proposed framework is training-free and plug-and-play, enabling seamless integration into existing LRMs. Extensive experiments on nine benchmarks demonstrate consistent improvements across mainstream LRMs with varying parameter scales, including the DeepSeek-R1-Distill and Qwen3 series. Specifically, ASAG improves average accuracy by 3.2% while reducing the number of generated tokens by nearly 40% across all reasoning tasks on Qwen3-8B.
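
The abstract does not say which attention statistic ASAG monitors or how it decides to stop, so the following is only a toy sketch of the general idea of attention-based early stopping, not the paper's algorithm. It assumes, purely for illustration, that the signal is the entropy of each decoding step's attention distribution and that generation stops once that entropy plateaus. The names `attention_entropy`, `should_stop`, `steps_until_stop`, and the parameters `window` and `tol` are all hypothetical.

```python
import math
from typing import Sequence


def attention_entropy(weights: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one attention row, after normalizing it."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("attention weights must have positive total mass")
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log(p) for p in probs)


def should_stop(history: Sequence[float], window: int = 3, tol: float = 0.05) -> bool:
    """Hypothetical rule: stop when entropy varied by at most `tol`
    over the last `window` steps. This is an assumption, not ASAG's criterion."""
    if len(history) < window:
        return False
    recent = history[-window:]
    return max(recent) - min(recent) <= tol


def steps_until_stop(attention_rows: Sequence[Sequence[float]]) -> int:
    """Walk through per-step attention rows and return the 1-based step at
    which the toy rule fires, or the total number of steps if it never does."""
    history = []
    for step, row in enumerate(attention_rows, start=1):
        history.append(attention_entropy(row))
        if should_stop(history):
            return step
    return len(attention_rows)


if __name__ == "__main__":
    # Synthetic attention rows: diffuse at first, then sharply peaked and stable.
    rows = [
        [0.25, 0.25, 0.25, 0.25],
        [0.70, 0.10, 0.10, 0.10],
        [0.97, 0.01, 0.01, 0.01],
        [0.97, 0.01, 0.01, 0.01],
        [0.97, 0.01, 0.01, 0.01],
        [0.97, 0.01, 0.01, 0.01],
    ]
    print("stopped at step", steps_until_stop(rows), "of", len(rows))
```

In a real model the rows would come from the attention weights of selected layers and heads at each decoding step; which layers, heads, and thresholds matter is exactly what the paper investigates and is not captured here.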

## Submission history

From: Jiakai Li ([view email](https://arxiv.org/show-email/65b659c6/2606.15070))

**[v1]** Sat, 13 Jun 2026 02:58:29 UTC (1,220 KB)

Similar Articles

Reasoning Can Be Restored by Correcting a Few Decision Tokens

arXiv cs.AI

This paper shows that the reasoning gap between base LLMs and large reasoning models is concentrated on a small set of early planning tokens. It introduces disagreement-guided token intervention, where replacing only those critical tokens with a reasoning model's outputs allows a base model to nearly match the reasoning model's performance.

ATTNPO: Attention-Guided Process Supervision for Efficient Reasoning

arXiv cs.CL

ATTNPO introduces an attention-guided process supervision framework that reduces overthinking in large reasoning models by leveraging intrinsic attention signals for step-level credit assignment, achieving improved performance with shorter reasoning lengths across 9 benchmarks.

Improving Reasoning Capabilities in Small Models through Mixture-of-Layers Distillation with Stepwise Attention on Key Information

arXiv cs.CL

This paper proposes a novel Chain-of-Thought distillation framework that transfers teacher models' stepwise attention on key information to student models through a Mixture-of-Layers module for dynamic layer alignment. The method achieves consistent performance improvements on mathematical and commonsense reasoning benchmarks by explicitly guiding student models to progressively focus on critical information during reasoning.