@sheriyuo: This paper proposes ASAG, Attention-State Adaptive Generation, a training-free, plug-and-play stopping framework for re…


Summary

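ASAG uses attention entropy to detect when reasoning has become unproductive and then stops early or redirects the reasoning. The goal is to improve accuracy while generating fewer tokens. On Qwen3-8B, the post reports a 4.4% relative accuracy gain and over 40% fewer generated tokens; the paper's abstract states a 3.2% average accuracy improvement with nearly 40% fewer tokens.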

This paper proposes ASAG, Attention-State Adaptive Generation, a training-free, plug-and-play stopping framework for reasoning models.

Instead of relying only on output confidence, ASAG uses attention entropy to detect when further thinking is no longer useful, then stops early or redirects unproductive reasoning.

The authors report a 4.4% relative accuracy gain while cutting generated tokens by over 40% on Qwen3-8B across reasoning tasks.

Stop When Further Reasoning Won't Help: Attention-State Adaptive Generation in Reasoning Models
Paper: http://arxiv.org/abs/2606.15070
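The following plain-Python sketch illustrates the kind of signal involved by computing the Shannon entropy of attention distributions. It assumes that "attention entropy" means the standard entropy H = -Σ p_i log p_i of one query token's softmax attention weights, divided by log n so that values fall in [0, 1]. The summary does not say which layers, heads, or token positions ASAG actually aggregates, or how it normalizes them. The function names and the simple averaging over heads are therefore hypothetical. Low values mean attention is concentrated on a few tokens, and high values mean it is spread evenly.

```python
import math
from typing import Sequence


def attention_entropy(weights: Sequence[float], eps: float = 1e-12) -> float:
    """Shannon entropy (in nats) of one attention distribution.

    `weights` are the softmax attention probabilities that a single query
    token assigns to the tokens it attends to (hypothetical input format).
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("attention weights must have a positive sum")
    entropy = 0.0
    for w in weights:
        p = w / total  # renormalize to guard against rounding drift
        if p > eps:    # treat 0 * log(0) as 0
            entropy -= p * math.log(p)
    return entropy


def normalized_entropy(weights: Sequence[float]) -> float:
    """Entropy divided by log(n), so the result lies in [0, 1]."""
    n = len(weights)
    if n < 2:
        return 0.0  # a single attended token carries no uncertainty
    return attention_entropy(weights) / math.log(n)


def mean_head_entropy(heads: Sequence[Sequence[float]]) -> float:
    """Average normalized entropy over several heads (an assumed aggregation)."""
    return sum(normalized_entropy(h) for h in heads) / len(heads)
```

For example, a uniform distribution over four tokens gives a normalized entropy of approximately 1.0. Attention placed entirely on one token gives 0.0.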

Cached at: 06/16/26, 05:41 PM


Stop When Further Reasoning Won’t Help: Attention-State Adaptive Generation in Reasoning Models

Source: https://arxiv.org/abs/2606.15070

Abstract: By incorporating test-time compute scaling, large reasoning models (LRMs) can solve complex problems through explicit chain-of-thought (CoT) reasoning processes. However, they often suffer from overthinking, resulting in redundant token outputs and degraded accuracy. Current methods to mitigate this issue remain limited: training-based approaches require substantial computational resources, while training-free methods rely on well-crafted prompts or unreliable confidence signals. In this work, we investigate early stopping from the perspective of attention distributions and propose a simple method, ASAG, which infers the model’s reasoning state and adaptively adjusts the generation strategy. The proposed framework is training-free and plug-and-play, enabling seamless integration into existing LRMs. Extensive experiments on nine benchmarks demonstrate consistent improvements across mainstream LRMs with varying parameter scales, including the DeepSeek-R1-Distill and Qwen3 series. Specifically, ASAG improves average accuracy by 3.2% while reducing the number of generated tokens by nearly 40% across all reasoning tasks on Qwen3-8B.
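The abstract says ASAG infers the model's reasoning state and adapts the generation strategy, but it gives no decision rule. The following is a minimal sketch of how a controller like that could sit inside a decoding loop. It assumes a per-step entropy score in [0, 1] is already available. Every part of the rule is hypothetical: the class name `EntropyStopController`, the `window`, the `low`/`high` thresholds, the warm-up `min_steps`, and the mapping in which sustained low entropy triggers a stop and sustained high entropy triggers a redirect. These are illustrative assumptions, not ASAG's published method, and the paper may use different signals or the opposite direction.

```python
from collections import deque
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"
    STOP = "stop"          # hypothetical: end the reasoning segment and answer
    REDIRECT = "redirect"  # hypothetical: inject steering text into the CoT


class EntropyStopController:
    """Hypothetical controller mapping per-step attention entropy to an action.

    The thresholds and the decision rule are illustrative assumptions,
    not the values or rule used by ASAG.
    """

    def __init__(self, window: int = 8, low: float = 0.3, high: float = 0.8,
                 min_steps: int = 16):
        if window < 1 or not 0.0 <= low < high <= 1.0:
            raise ValueError("need window >= 1 and 0 <= low < high <= 1")
        self.window = window
        self.low = low
        self.high = high
        self.min_steps = min_steps
        self.history: deque = deque(maxlen=window)  # most recent scores only
        self.steps = 0

    def update(self, entropy: float) -> Action:
        """Record one normalized entropy value and decide what to do next."""
        self.steps += 1
        self.history.append(entropy)
        # Do nothing until a minimum amount of reasoning and a full window exist.
        if self.steps < self.min_steps or len(self.history) < self.window:
            return Action.CONTINUE
        avg = sum(self.history) / len(self.history)
        if avg <= self.low:
            return Action.STOP      # assumed: attention has settled
        if avg >= self.high:
            return Action.REDIRECT  # assumed: attention is diffuse or wandering
        return Action.CONTINUE
```

A generation loop would call `update` once per decoded token. On `Action.STOP`, it could end the reasoning segment, for example by appending the model's end-of-thinking marker. On `Action.REDIRECT`, it could insert steering text. Both responses are placeholders for whatever ASAG actually does. Averaging over a window, rather than reacting to a single token, keeps one noisy attention step from ending generation prematurely.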

Submission history

From: Jiakai Li
[v1] Sat, 13 Jun 2026 02:58:29 UTC (1,220 KB)
