@neural_avb: There is a really banger article on On-Policy Distillation. Came out on HF a few months back.

Summary

A tweet recommending an article on on-policy distillation published on Hugging Face.

There is a really banger article on On-Policy Distillation. Came out on HF a few months back. https://t.co/1TLcfQ5DLQ

Cached at: 06/27/26, 10:00 PM

Similar Articles

On-policy distillation: one of the hottest terms on PapersWithCode [R]

Reddit r/MachineLearning

Hugging Face's Niels introduces On-policy Distillation (OPD), a key post-training technique used in models like Qwen 3.6/3.7, GLM-5.1, and DeepSeek-V4, now featured on PapersWithCode with a linked whiteboard explanation by Sasha Rush and Dwarkesh Patel.

The Many Faces of On-Policy Distillation: Pitfalls, Mechanisms, and Fixes

Hugging Face Daily Papers

This paper presents a comprehensive empirical study of on-policy distillation for large language models. It identifies failure mechanisms such as distribution mismatch and optimization instability, and proposes fixes such as stop-gradient objectives and RLVR-adapted teachers.