Tag
This paper introduces FADE (Focal Advantage with Dynamic Entropy), a self-adapting advantage function that dynamically schedules gradient weights during RL post-training of LLMs, achieving faster convergence and better accuracy-diversity trade-offs compared to static baselines.