staleness

AsyncOPD: How Stale Can On-Policy Distillation Be?

arXiv cs.LG · 6d ago

This paper presents AsyncOPD, a fully asynchronous on-policy distillation pipeline for LLMs. It systematically studies the effects of stale-policy data and proposes estimator designs that improve training throughput by 1.6x to 3.8x while maintaining comparable accuracy.
