ar-to-dlm

Data-Efficient Autoregressive-to-Diffusion Language Models via On-Policy Distillation

arXiv cs.CL · 4d ago

The paper introduces OPDLM, a method that converts autoregressive language models into diffusion language models through on-policy distillation. It requires 15x to 7000x fewer training tokens while retaining the knowledge of the original model.
