ASAG uses attention entropy to detect when reasoning is unproductive, stopping early to improve accuracy and reduce token generation. Experiments on Qwen3-8B show a 4.4% accuracy gain and over 40% fewer generated tokens.
This paper proposes ASAG, a training-free method that adaptively stops reasoning in large reasoning models based on attention distributions, reducing token usage by ~40% while improving accuracy by 3.2% on benchmarks using DeepSeek-R1-Distill and Qwen3 models.
This paper introduces MARS, a stopping rule for parallel LLM test-time scaling that probes partial traces to stop early without sacrificing accuracy, saving 25–47% of tokens across reasoning models on competition math benchmarks.
EvalStop is a scheduling primitive for multi-tenant RLHF platforms that detects and corrects reward overoptimization by monitoring downstream evaluation scores and terminating jobs after consecutive declines. It achieves 98% precision and 99% recall while improving job completion time by 9% and cutting wasted compute by 22%.
ESPO introduces an early-stopping mechanism for reinforcement learning that detects and terminates failed reasoning trajectories in LLMs, improving mathematical reasoning performance while reducing compute by over 20%.
This paper introduces LEAP, a training-free method to accelerate inference in Diffusion Language Models (dLLMs) by detecting early-converging tokens, reducing denoising steps by 30% without losing accuracy.
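
To make the "terminate on consecutive declines" idea in the EvalStop summary concrete, here is a minimal sketch of such a rule. It is not taken from the paper: the function name `stop_index`, the `patience` parameter, and the use of a strict decrease between successive evaluations are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def stop_index(scores: Sequence[float], patience: int = 3) -> Optional[int]:
    """Return the index of the evaluation at which a job would be stopped.

    Hypothetical rule: stop once `patience` evaluations in a row are each
    strictly lower than the one before. Returns None if that never happens.
    """
    declines = 0  # length of the current run of consecutive declines
    for i in range(1, len(scores)):
        if scores[i] < scores[i - 1]:
            declines += 1
            if declines >= patience:
                return i
        else:
            # Any non-decreasing step resets the run.
            declines = 0
    return None


# Example: downstream evaluation scores logged at successive checkpoints.
history = [0.50, 0.60, 0.58, 0.55, 0.50]
print(stop_index(history, patience=3))
```

A larger `patience` tolerates noisier evaluation curves at the cost of stopping later; the value that suits a given platform is an empirical question this sketch does not address.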