D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models
Summary
This paper introduces D-OPSD, a novel training paradigm for step-distilled diffusion models that enables on-policy self-distillation during supervised fine-tuning. It allows models to learn new concepts or styles without compromising their efficient few-step inference capabilities.
Cached at: 05/08/26, 08:12 AM
Source: https://huggingface.co/papers/2605.05204
Abstract
A new training approach called D-OPSD enables efficient supervised fine-tuning for diffusion models by leveraging on-policy self-distillation with text and multimodal features while preserving few-step inference capabilities.
The landscape of high-performance image generation models is shifting from inefficient multi-step models to efficient few-step counterparts (e.g., Z-Image-Turbo and FLUX.2-klein). However, these models pose significant challenges for direct, continuous supervised fine-tuning. For example, applying commonly used fine-tuning techniques compromises their inherent few-step inference capability. To address this, we propose D-OPSD, a novel training paradigm for step-distilled diffusion models that enables on-policy learning during supervised fine-tuning. We first find that a modern diffusion model that uses an LLM/VLM as its encoder can inherit that encoder’s in-context capabilities. This lets us cast training as an on-policy self-distillation process. Specifically, during training, the model acts as both the teacher and the student under different contexts: the student is conditioned only on the text feature, while the teacher is conditioned on the multimodal feature of both the text prompt and the target image. Training minimizes the divergence between the two predicted distributions over the student’s own roll-outs. By optimizing on the model’s own trajectories and under its own supervision, D-OPSD enables the model to learn new concepts, styles, and so on without sacrificing its original few-step capacity.
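The teacher/student setup described in the abstract can be illustrated with a deliberately tiny, pure-Python toy. This is only a sketch under stated assumptions, not the paper's implementation: the scalar "denoiser", the fixed `GATE` that stands in for the encoder's in-context pull toward the target image, the Euler roll-out, and the squared-error loss (used here as a stand-in for a divergence between two fixed-variance Gaussian predictions) are all hypothetical choices made for illustration. The parts it tries to mirror are that one set of weights serves as both student (text-only context) and teacher (text plus target image), that the teacher output is treated as a constant, and that the loss is evaluated on states the student itself visits.

```python
import random

# Toy scalar "denoiser". Every name and the gating rule below are illustrative
# assumptions, not the paper's architecture.
GATE = 0.5  # assumed strength of the in-context pull when an image is in context


def predict(params, x, image=None):
    """Predicted velocity at state x; `image` is the optional in-context target."""
    a, b = params
    text_only = a * x + b  # stands in for the text-conditioned prediction
    if image is None:
        return text_only  # student context: text feature only
    # Teacher context: same weights, but the target image is also in context.
    return (1 - GATE) * text_only + GATE * (image - x)


def rollout(params, x0, steps):
    """Few-step Euler roll-out driven by the student (text-only) prediction."""
    states, x = [], x0
    dt = 1.0 / steps
    for _ in range(steps):
        states.append(x)
        x = x + dt * predict(params, x)
    return states, x


def train_step(params, x0, image, steps=4, lr=0.05):
    """One on-policy self-distillation update; returns (new_params, loss)."""
    a, b = params
    states, _ = rollout(params, x0, steps)  # states visited by the student itself
    grad_a = grad_b = loss = 0.0
    for x in states:  # roll-out states are treated as constants
        student = predict(params, x)
        teacher = predict(params, x, image)  # treated as a constant (stop-gradient)
        err = student - teacher
        loss += err * err / steps
        grad_a += 2 * err * x / steps  # d(student)/da = x
        grad_b += 2 * err / steps      # d(student)/db = 1
    return (a - lr * grad_a, b - lr * grad_b), loss


def demo(num_updates=2000, target=2.0, seed=0):
    """Train the toy model so its text-only prediction absorbs the target."""
    rng = random.Random(seed)
    params, loss = (0.0, 0.0), 0.0
    for _ in range(num_updates):
        x0 = rng.uniform(-2.0, 2.0)  # random starting "noise"
        params, loss = train_step(params, x0, target)
    return params, loss


if __name__ == "__main__":
    print(demo())
```

In this toy, the gap between student and teacher closes only when the text-only prediction itself points toward the target, so the student context ends up absorbing what the teacher saw in context. Because the loss is computed on the student's own few-step roll-out rather than on externally noised targets, the update never asks the model to behave well on states it would not reach at inference time, which is the intuition the abstract gives for preserving few-step capacity.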
Similar Articles
DiffusionOPD: A Unified Perspective of On-Policy Distillation in Diffusion Models
DiffusionOPD proposes a multi-task training paradigm for diffusion models that uses online policy distillation to efficiently combine task-specific teachers into a unified student, achieving state-of-the-art results on all evaluated benchmarks.
Learning from the Self-future: On-policy Self-distillation for dLLMs
Introduces d-OPSD, the first on-policy self-distillation framework for diffusion large language models, using suffix conditioning and step-level supervision to outperform RLVR and SFT baselines on reasoning benchmarks.
Self-Distillation Enables Continual Learning [pdf]
Introduces Self-Distillation Fine-Tuning (SDFT), a method that enables on-policy learning from demonstrations to achieve continual learning without catastrophic forgetting, outperforming supervised fine-tuning.
On-policy distillation: one of the hottest terms on PapersWithCode [R]
Hugging Face's Niels introduces On-policy Distillation (OPD), a key post-training technique used in models like Qwen 3.6/3.7, GLM-5.1, and DeepSeek-V4, now featured on PapersWithCode with a linked whiteboard explanation by Sasha Rush and Dwarkesh Patel.
Data-Efficient Autoregressive-to-Diffusion Language Models via On-Policy Distillation
The paper introduces OPDLM, a method that transforms autoregressive language models into diffusion language models via on-policy distillation, requiring 15x to 7000x fewer training tokens while retaining knowledge from the original model.