Tag
The paper proposes Crosslingual On-Policy Self-Distillation (COPSD), a method to transfer high-resource language reasoning capabilities to low-resource languages using a shared student-teacher architecture. Experiments across 17 African languages show significant improvements in mathematical reasoning and answer-format adherence, outperforming Group Relative Policy Optimization (GRPO).