#long-sequence-reasoning

Shattering the Autoregressive Curse: Dynamic Epistemic Entropy Orchestrated Erasable Reinforcement Learning for LLMs

arXiv cs.AI · 2026-06-17

This paper proposes E³RL, a reinforcement learning method that uses dynamic epistemic entropy thresholds to let LLMs excise local logical defects during generation. The authors present it as a way to overcome the "autoregressive curse" in long-horizon reasoning and report state-of-the-art results on mathematical reasoning benchmarks such as AIME.
