model-plasticity

When RL Fails after SFT: Rejuvenating Model Plasticity for Robust SFT-to-RL Handoff

arXiv cs.LG · 2d ago

This paper investigates how excessive supervised fine-tuning (SFT) erodes model plasticity in the SFT-then-RL pipeline for LLMs. It proposes Rejuvenation, a method that restores plasticity through base-anchored model fusion and targeted neuron reset, and reports consistent improvements in RL performance.
