staleness

AsyncOPD: How Stale Can On-Policy Distillation Be?

arXiv cs.LG · 6d ago

This paper presents AsyncOPD, a fully asynchronous on-policy distillation pipeline for LLMs. It systematically studies the effects of stale-policy data and proposes estimator designs that improve training throughput by 1.6-3.8x while maintaining comparable accuracy.
