Tag
This paper proposes a principled offline reasoning distillation framework that corrects teacher-student distribution drift, improving reasoning accuracy on math benchmarks without requiring online rollouts.