Tag
Proposes Anti-Self-Distillation (AntiSD), which reverses the direction of knowledge transfer in self-distillation to improve the efficiency and accuracy of math reasoning. AntiSD reaches the GRPO baseline's accuracy in 2-10x fewer steps and achieves up to 11.5 points higher final accuracy across models from 4B to 30B parameters.