Tag
The paper proposes EGRSD and CL-EGRSD, on-policy self-distillation methods that weight token-level supervision by teacher entropy to improve reasoning accuracy-length tradeoff in LLMs, evaluated on Qwen3-4B and Qwen3-8B.