@neural_avb: There is a really banger article on On-Policy Distillation. Came out on HF a few months back.
Summary
A tweet recommending an article on on-policy distillation published on Hugging Face.
Cached at: 06/27/26, 10:00 PM
There is a really banger article on On-Policy Distillation. Came out on HF a few months back. https://t.co/1TLcfQ5DLQ
Similar Articles
@neural_avb: If yall are interested in On Policy Distillation, check this specific repo. Somebody put together a curated collection …
A GitHub repo containing a curated, annotated collection of papers and tools for on-policy distillation, with a getting-started section.
On-policy distillation: one of the hottest terms on PapersWithCode [R]
Hugging Face's Niels introduces on-policy distillation (OPD), a key post-training technique used in models like Qwen 3.6/3.7, GLM-5.1, and DeepSeek-V4. The term is now featured on PapersWithCode, with a linked whiteboard explanation by Sasha Rush and Dwarkesh Patel.
@NielsRogge: One of the hottest terms in AI right now is "On-policy distillation". It is a post-training technique in which a studen…
On-policy distillation is highlighted as a hot post-training technique combining distillation with online RL, now listed on PapersWithCode with 183 citing papers.
@louieworth: New blog post: On-Policy Distillation — Promise, Pitfalls, and Prospects. OPD combines on-policy rollouts with dense te…
This blog post discusses On-Policy Distillation (OPD), a technique that combines on-policy rollouts with dense teacher supervision, and highlights its promise, three failure modes, and the author's new paper on the topic.
The Many Faces of On-Policy Distillation: Pitfalls, Mechanisms, and Fixes
This paper presents a comprehensive empirical study of on-policy distillation for large language models. It identifies failure mechanisms such as distribution mismatch and optimization instability, and proposes fixes such as stop-gradient objectives and RLVR-adapted teachers.