This paper presents AsyncOPD, a fully asynchronous on-policy distillation pipeline for LLMs. It systematically studies the effects of stale-policy data and proposes estimator designs that improve training throughput by 1.6x to 3.8x while maintaining comparable accuracy.