Tag
This article analyzes post-training methods for language models through a distributional perspective, comparing how SFT, RL, and on-policy distillation reshape model distributions and impact phenomena like catastrophic forgetting.