tv-loss

Breaking Entropy Bounds: Accelerating RL Training via MTP with Rejection Sampling

Hugging Face Daily Papers · 6d ago

Bebop combines entropy-aware multi-token prediction (MTP) with rejection sampling and a novel TV loss to accelerate RL training of LLMs, reporting up to a 1.8x speedup. It targets the decline in acceptance rates that occurs during RL by changing the training objective.
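To make the link between acceptance rates and a TV-style objective concrete, here is a minimal sketch. It rests on two assumptions the summary does not confirm: that "TV" means total variation distance, and that the rejection sampling follows the standard speculative-sampling accept/reject rule. The function names are hypothetical and this is not Bebop's implementation. Under those assumptions, a token proposed from a draft distribution q is accepted against a target distribution p with probability sum(min(p, q)), which equals 1 - TV(p, q), so a lower total variation means a higher acceptance rate.

```python
import random


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def acceptance_probability(p, q):
    """Probability that a token drawn from draft q is accepted against target p."""
    return sum(min(pi, qi) for pi, qi in zip(p, q))


def speculative_sample(p, q, rng):
    """Propose a token from q, then accept or resample so the result follows p.

    Returns (token, accepted). Assumes p and q are different distributions
    over the same vocabulary, so the residual has positive mass on rejection.
    """
    x = rng.choices(range(len(q)), weights=q)[0]
    # Accept the proposal with probability min(1, p[x] / q[x]).
    if rng.random() < min(1.0, p[x] / q[x]):
        return x, True
    # On rejection, resample from the normalized residual max(p - q, 0).
    residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    return rng.choices(range(len(p)), weights=residual)[0], False
```

With a hypothetical target p = [0.5, 0.3, 0.2] and draft q = [0.2, 0.3, 0.5], the two helper functions agree that the acceptance probability is one minus the total variation distance, which is why minimizing a TV-style loss between draft and target would directly raise acceptance in this setting.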
