advantage-function


Don't Let Gains FADE: Breaking Down Policy Gradient Weights in RL

arXiv cs.LG · yesterday

This paper introduces FADE (Focal Advantage with Dynamic Entropy), an advantage function that adaptively schedules policy-gradient weights during RL post-training of LLMs. The authors report faster convergence and a better accuracy-diversity trade-off than static baselines.
