tv-loss

Breaking Entropy Bounds: Accelerating RL Training via MTP with Rejection Sampling

Hugging Face Daily Papers · 6d ago

Bebop proposes entropy-aware multi-token prediction (MTP) with rejection sampling and a novel TV loss to accelerate RL training of LLMs, achieving up to 1.8x speedup. The method targets the drop in draft-token acceptance rates that occurs during RL by changing the training objective.
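To make the link between rejection sampling and acceptance rate concrete, here is a minimal sketch of the standard speculative-sampling accept/reject step for a single token. It assumes "TV" stands for total variation distance, which the summary does not spell out, and it does not reproduce the paper's loss or its entropy-aware components. The function names and the toy distributions `p` (target) and `q` (draft) are hypothetical. The sketch only illustrates a well-known identity: the expected acceptance probability of a drafted token equals 1 minus the total variation distance between the two distributions.

```python
import random


def tv_distance(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def expected_acceptance(p, q):
    """Probability that a token drafted from q is accepted against target p."""
    return sum(min(pi, qi) for pi, qi in zip(p, q))


def speculative_step(p, q, rng):
    """Draft one token from q, then accept or resample.

    The draft is accepted with probability min(1, p[x] / q[x]). On rejection,
    a replacement is drawn from the residual max(p - q, 0), so the returned
    token is distributed according to p either way.
    """
    vocab = range(len(p))
    x = rng.choices(vocab, weights=q)[0]
    if rng.random() < min(1.0, p[x] / q[x]):
        return x, True
    residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    return rng.choices(vocab, weights=residual)[0], False


# Toy example (hypothetical numbers, not from the paper).
p = [0.5, 0.3, 0.2]  # target policy
q = [0.2, 0.3, 0.5]  # draft (MTP head) distribution
print(tv_distance(p, q), expected_acceptance(p, q))
```

Under this identity, anything that widens the gap between the draft and target distributions lowers the acceptance rate, which is one plausible reading of why a TV-based objective would be a natural fit here.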
