rank-inversion


SFT Overtraining Predicts Rank Inversion via Entropy Collapse Under RLVR

arXiv cs.LG · 2026-06-18

This paper demonstrates that choosing the SFT checkpoint with the highest pass@1 as the starting point for GRPO can fail: SFT overtraining compresses output diversity, which leads to entropy collapse and rank inversion under reinforcement learning. Experiments on Qwen2.5-Coder-3B and DeepSeek-Coder-6.7B show that pre-RL entropy is positively associated with GRPO outcome, and that a two-stage diagnostic can detect high-risk checkpoints.
