proximal-policy-optimization

ESPO: Early-Stopping Proximal Policy Optimization

Hugging Face Daily Papers · 2026-05-28

ESPO introduces an early-stopping mechanism for reinforcement learning that detects and terminates failed reasoning trajectories in LLMs, improving mathematical reasoning performance while reducing compute by over 20%.