LEAD: Length-Efficient Adaptive and Dynamic Reasoning for Large Language Models

Hugging Face Daily Papers, 05/10/26, 12:00 AM

reasoning chain-of-thought efficiency reinforcement-learning language-models adaptive-reasoning

Summary

LEAD adapts reasoning efficiency dynamically during training: it calibrates the correctness-efficiency trade-off online and sets adaptive, problem-specific length targets, improving mathematical reasoning accuracy while reducing output length.

Large reasoning models, such as OpenAI o1 and DeepSeek-R1, tend to become increasingly verbose as their reasoning capabilities improve. These inflated Chain-of-Thought (CoT) trajectories often exceed what the underlying problems require, wasting compute, latency, and context budgets. While introducing length-based efficiency rewards during reinforcement learning offers a natural remedy, existing methods struggle with two fundamental challenges: the optimal balance between correctness and efficiency is non-stationary throughout training, and intrinsic reasoning budgets vary drastically across problems. Relying on static reward weights and global length constraints inevitably forces a compromise between degraded accuracy and unrealized compression. To overcome these limitations, we propose LEAD (Length-Efficient Adaptive and Dynamic reasoning), a method that replaces static heuristics with online, self-adaptive mechanisms. LEAD dynamically calibrates the correctness-efficiency trade-off at each step using a Potential-Scaled Instability, directing optimization capacity to the most informative learning signal. Furthermore, it estimates an adaptive per-problem target length online based on the model's own correct rollouts, applying a symmetric efficiency reward that penalizes both overthinking and over-compression. Evaluated on five mathematical reasoning benchmarks, LEAD achieves the highest accuracy and Accuracy-Efficiency Score among RL-trained efficient-reasoning methods while producing substantially shorter outputs than the base model.
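The abstract describes two mechanisms concretely enough to sketch: a per-problem target length estimated online from the model's own correct rollouts, and a symmetric efficiency reward that penalizes deviation from that target in both directions. The Python below is a minimal sketch under stated assumptions, not LEAD's implementation. The median-plus-moving-average target, the linear `tolerance` falloff, applying the efficiency term only to correct rollouts, and every name (`TargetLengthTracker`, `symmetric_efficiency_reward`, `combined_reward`) are hypothetical. The paper's Potential-Scaled Instability is not reproduced here; its per-step output is represented by a plain `weight` argument.

```python
from dataclasses import dataclass, field
from statistics import median
from typing import Dict, List, Optional


@dataclass
class TargetLengthTracker:
    """Hypothetical per-problem target-length estimator (assumption, not LEAD's exact rule).

    For each problem id, it keeps an exponential moving average of the median
    token length of the model's own *correct* rollouts.
    """
    momentum: float = 0.9  # assumed smoothing factor for the moving average
    targets: Dict[str, float] = field(default_factory=dict)

    def update(self, problem_id: str, lengths: List[int], correct: List[bool]) -> Optional[float]:
        # Only correct rollouts inform the target length.
        correct_lengths = [n for n, ok in zip(lengths, correct) if ok]
        if not correct_lengths:
            # No correct rollout this step: keep the previous target (None if never set).
            return self.targets.get(problem_id)
        observed = float(median(correct_lengths))
        prev = self.targets.get(problem_id)
        new = observed if prev is None else self.momentum * prev + (1 - self.momentum) * observed
        self.targets[problem_id] = new
        return new


def symmetric_efficiency_reward(length: int, target: float, tolerance: float = 0.5) -> float:
    """Assumed form: 1.0 at the target, falling linearly to 0.0 as the relative
    deviation |length - target| / target reaches `tolerance`.

    Too-long (overthinking) and too-short (over-compression) responses are
    penalized equally for the same relative deviation.
    """
    if target <= 0:
        raise ValueError("target length must be positive")
    rel_dev = abs(length - target) / target
    return max(0.0, 1.0 - rel_dev / tolerance)


def combined_reward(is_correct: bool, length: int, target: Optional[float], weight: float) -> float:
    """Correctness reward plus a weighted efficiency term.

    `weight` stands in for the per-step trade-off that LEAD calibrates online;
    here it is simply passed in. Rewarding efficiency only for correct answers
    is an assumption of this sketch.
    """
    reward = 1.0 if is_correct else 0.0
    if is_correct and target is not None:
        reward += weight * symmetric_efficiency_reward(length, target)
    return reward
```

For example, with correct rollouts of 400, 600 and 1,200 tokens, the first target is 600 tokens, and answers of 450 and 750 tokens receive the same efficiency reward of 0.5. That symmetry is the design point: a reward that only minimized length would always favor the 450-token answer, even when it compresses away reasoning the problem needs.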

Cached at: 05/15/26, 12:21 AM

Source: https://huggingface.co/papers/2605.09806

Similar Articles

Enhanced and Efficient Reasoning in Large Learning Models

Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key

Large Reasoning Models Are (Not Yet) Multilingual Latent Reasoners

LaTER: Efficient Test-Time Reasoning via Latent Exploration and Explicit Verification

Efficient LLM Reasoning via Variational Posterior Guidance with Efficiency Awareness