Tag
ESPO introduces an early-stopping mechanism for reinforcement learning that detects and terminates failed reasoning trajectories in LLMs, improving mathematical reasoning performance while reducing compute by over 20%.