LEAD: Length-Efficient Adaptive and Dynamic Reasoning for Large Language Models
Summary
LEAD dynamically adapts reasoning efficiency during training by using online calibration of correctness-efficiency trade-offs and adaptive problem-specific length targets, improving mathematical reasoning accuracy and reducing output length.
Cached at: 05/15/26, 12:21 AM
Source: https://huggingface.co/papers/2605.09806
Abstract
LEAD is a method that dynamically adapts reasoning efficiency during training by using online calibration of correctness-efficiency trade-offs and adaptive problem-specific length targets to improve mathematical reasoning accuracy and efficiency.
Large reasoning models, such as OpenAI o1 and DeepSeek-R1, tend to become increasingly verbose as their reasoning capabilities improve. These inflated Chain-of-Thought (CoT) trajectories often exceed what the underlying problems require, wasting compute, latency, and context budgets. While introducing length-based efficiency rewards during reinforcement learning offers a natural remedy, existing methods struggle with two fundamental challenges: the optimal balance between correctness and efficiency is non-stationary throughout training, and intrinsic reasoning budgets vary drastically across problems. Relying on static reward weights and global length constraints inevitably forces a compromise between degraded accuracy and unrealized compression. To overcome these limitations, we propose LEAD (Length-Efficient Adaptive and Dynamic reasoning), a method that replaces static heuristics with online, self-adaptive mechanisms. LEAD dynamically calibrates the correctness-efficiency trade-off at each step using a Potential-Scaled Instability, directing optimization capacity to the most informative learning signal. Furthermore, it estimates an adaptive per-problem target length online based on the model’s own correct rollouts, applying a symmetric efficiency reward that penalizes both overthinking and over-compression. Evaluated on five mathematical reasoning benchmarks, LEAD achieves the highest accuracy and Accuracy-Efficiency Score among RL-trained efficient-reasoning methods while producing substantially shorter outputs than the base model.
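The abstract does not give the formulas behind the per-problem target length or the symmetric efficiency reward, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes the target is the mean token length of a problem's correct rollouts and that the reward falls off linearly with relative deviation from that target; both choices, and the names `target_length` and `symmetric_efficiency_reward`, are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence


def target_length(lengths: Sequence[int], correct: Sequence[bool]) -> Optional[float]:
    """Hypothetical per-problem target: mean length of the correct rollouts.

    Returns None when no rollout is correct, since there is nothing to estimate from.
    """
    correct_lengths = [n for n, ok in zip(lengths, correct) if ok]
    return mean(correct_lengths) if correct_lengths else None


def symmetric_efficiency_reward(length: int, target: float) -> float:
    """Hypothetical symmetric reward: 1.0 at the target, decreasing linearly
    with relative deviation in either direction, floored at 0.0."""
    deviation = abs(length - target) / target
    return max(0.0, 1.0 - deviation)


# Three rollouts for one problem: two correct (400 and 600 tokens), one incorrect.
target = target_length([400, 600, 1200], [True, True, False])
for n in (250, 500, 750, 1200):
    print(n, symmetric_efficiency_reward(n, target))
```

Under these assumptions, a rollout that is 250 tokens too short is rewarded the same as one that is 250 tokens too long, which is the sense in which a symmetric reward discourages both over-compression and overthinking.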
Similar Articles
Enhanced and Efficient Reasoning in Large Learning Models
This paper proposes a method for improving reasoning in large language models by recoding data to explicitly represent relationships, enabling efficient principled reasoning with polynomial-time learnability for relational rules, which addresses hallucinations and supports sound reasoning across multiple calls.
Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key
This paper introduces ScaleLogic, a framework demonstrating that RL training compute scales as a power law with reasoning depth in LLMs. It highlights that logical expressiveness is key to improving downstream transfer and training efficiency.
Large Reasoning Models Are (Not Yet) Multilingual Latent Reasoners
This paper investigates multilingual latent reasoning in large reasoning models across 11 languages, revealing that while latent reasoning capabilities exist, they are unevenly distributed—stronger in resource-rich languages and weaker in low-resource ones. The study finds that despite surface-level differences, the internal reasoning mechanisms are largely aligned with an English-centered pathway.
LaTER: Efficient Test-Time Reasoning via Latent Exploration and Explicit Verification
The paper introduces LaTER, a two-stage reasoning paradigm that combines latent exploration with explicit Chain-of-Thought verification to reduce token usage and improve efficiency in large language models without sacrificing accuracy.
Efficient LLM Reasoning via Variational Posterior Guidance with Efficiency Awareness
This paper introduces the VPG-EA framework, which uses variational inference and posterior guidance to improve the reasoning efficiency of large language models by addressing the 'overthinking' phenomenon in chain-of-thought generation.