@neural_avb: There is a really banger article on On-Policy Distillation. Came out on HF a few months back.

X AI KOLs Timeline 06/27/26, 05:20 PM Papers

Summary

A tweet recommending an article on on-policy distillation published on Hugging Face.

There is a really banger article on On-Policy Distillation. Came out on HF a few months back. https://t.co/1TLcfQ5DLQ

Original Article

Cached at: 06/27/26, 10:00 PM

Similar Articles

@neural_avb: If yall are interested in On Policy Distillation, check this specific repo. Somebody put together a curated collection …

X AI KOLs Timeline

A curated collection of papers and tools for On Policy Distillation, organized and annotated with a getting-started section, shared via a GitHub repo.

On-policy distillation: one of the hottest terms on PapersWithCode [R]

Reddit r/MachineLearning

Hugging Face's Niels Rogge introduces On-Policy Distillation (OPD), a key post-training technique used in models such as Qwen 3.6/3.7, GLM-5.1, and DeepSeek-V4. The term is now featured on PapersWithCode, with a linked whiteboard explanation by Sasha Rush and Dwarkesh Patel.

@NielsRogge: One of the hottest terms in AI right now is "On-policy distillation". It is a post-training technique in which a studen…

X AI KOLs Timeline

On-policy distillation is highlighted as a hot post-training technique combining distillation with online RL, now listed on PapersWithCode with 183 citing papers.

@louieworth: New blog post: On-Policy Distillation — Promise, Pitfalls, and Prospects. OPD combines on-policy rollouts with dense te…

X AI KOLs Following

This blog post discusses On-Policy Distillation (OPD), a technique that combines on-policy rollouts with dense teacher supervision. It covers the technique's promise and three failure modes, and points to the author's new paper on the topic.

The Many Faces of On-Policy Distillation: Pitfalls, Mechanisms, and Fixes

Hugging Face Daily Papers

This paper presents a comprehensive empirical study of on-policy distillation for large language models. It identifies failure mechanisms such as distribution mismatch and optimization instability, and proposes fixes such as stop-gradient objectives and RLVR-adapted teachers.