early-exit

#early-exit

Stop When Reasoning Converges: Semantic-Preserving Early Exit for Reasoning Models

Hugging Face Daily Papers · 2026-05-17

This paper introduces PUMA, a plug-and-play framework that detects semantic redundancy in chain-of-thought reasoning to enable early exit, achieving 26.2% average token reduction across multiple models and benchmarks while preserving accuracy and reasoning quality.

Know When To Fold 'Em: Token-Efficient LLM Synthetic Data Generation via Multi-Stage In-Flight Rejection

arXiv cs.AI · 2026-05-15

This paper proposes Multi-Stage In-Flight Rejection (MSIFR), a training-free framework that reduces token waste in LLM-based synthetic data generation by detecting and terminating low-quality generation trajectories at intermediate checkpoints. Across five models and seven benchmarks, MSIFR reduces token consumption by 11–77% as a standalone method and up to 78.2% when combined with early-exit methods, while preserving or improving accuracy.

Balancing Stability and Plasticity in Sequentially Trained Early-Exiting Neural Networks

arXiv cs.LG · 2026-05-08

The paper addresses catastrophic forgetting in sequentially trained early-exiting neural networks and proposes two methods based on Elastic Weight Consolidation and Learning without Forgetting to preserve earlier exit performance while adding new ones.

Two-dimensional early exit optimisation of LLM inference

arXiv cs.CL · 2026-04-22

The authors propose a two-dimensional (2D) early-exit method that jointly trims layers and input sentences, yielding an additional 1.4–2.3× speed-up on sentiment tasks across Llama 3.1/3.2, Gemma, and Qwen models.

River-LLM: Large Language Model Seamless Exit Based on KV Share

Hugging Face Daily Papers · 2026-04-20

River-LLM proposes a training-free early-exit framework for decoder-only LLMs that uses KV-sharing to eliminate KV-cache gaps, achieving 1.71–2.16× speedup without quality loss.
