Tag
The author built Joey, a 170M-parameter masked diffusion language model, from scratch, training it on FineWeb-Edu and fine-tuning it on DailyDialog. The model produces fluent but incoherent sentences because of capacity limitations. The project highlights how such models differ from autoregressive LLMs and the lessons learned from building and debugging the system.
NVIDIA introduces Nemotron-Labs Diffusion, a family of diffusion language models that generate text in parallel and iteratively refine it, offering faster generation and the ability to revise previous tokens.
This solo-author ICML paper introduces Amortized Group Relative Policy Optimization (AGRPO) to enable effective reinforcement learning post-training for diffusion language models.
This research paper introduces Chainwash, a multi-step rewriting attack that removes statistical watermarks from the outputs of a diffusion language model (LLaDA-8B-Instruct), reducing the detection rate from 87.9% to 4.86% after five chained rewrites.