Tag
Proposes learning the unmasking order in masked diffusion models with a lightweight policy network trained with a weighted loss; the learned order outperforms heuristic unmasking strategies on combinatorial tasks and protein design.
Introduces DLLM-JEPA, a JEPA formulation for masked diffusion language models that constructs two views from a single input via the diffusion noise schedule, reducing training FLOPs by 33% relative to LLM-JEPA and improving fine-tuning performance on tasks like GSM8K.
This paper identifies a failure mode in masked diffusion language models where confidence-based decoding leads to high-confidence errors on complex reasoning tasks, and shows that confidence-aligned training exacerbates this issue while random masking preserves reasoning performance.
This paper proposes using Masked Diffusion Language Models (MDLMs) as text-based world models for agentic reinforcement learning, showing that their any-order denoising objective avoids prefix mode collapse and leads to stronger performance than autoregressive baselines.
AnchorDiff proposes a topology-aware masked diffusion framework for radiology report generation, integrating RadGraph-derived clinical anchors and confidence-based rewriting to achieve state-of-the-art results on the MIMIC-CXR and MIMIC-RG4 benchmarks.
Introduces Discrete Stochastic Localization (DSL), a continuous-state diffusion framework for non-autoregressive text generation that uses unit-sphere token embeddings and a timestep-invariant denoiser, achieving better distributional faithfulness than masked discrete diffusion models on OpenWebText.
Introduces Token-to-Mask (T2M) remasking, which fixes generation errors in masked diffusion LMs by resetting suspect tokens to the mask state instead of overwriting them, yielding an accuracy gain of up to +5.92 on CMATH without extra training or parameters.
CRoCoDiL proposes a continuous and robust conditioned diffusion approach for language that shifts masked diffusion models into a continuous semantic space, achieving better generation quality and 10x faster sampling than discrete methods such as LLaDA.
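Several entries above hinge on confidence-based decoding in masked diffusion LMs and, in the T2M entry, on resetting suspect tokens to the mask state rather than overwriting them. The toy sketch below illustrates only that general idea; it is not the T2M authors' algorithm. The model interface, the suspect criterion (a committed token whose probability under the current context falls below `remask_threshold`), and the names `decode`, `toy_model`, and `remask_threshold` are all hypothetical assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional

MASK = "<mask>"  # hypothetical mask token

# Hypothetical model interface: given the current sequence (with MASK entries),
# return one probability distribution (token -> prob) per position.
Model = Callable[[List[str]], List[Dict[str, float]]]


def decode(model: Model, length: int, per_step: int = 1,
           remask_threshold: Optional[float] = None,
           max_steps: int = 50) -> List[str]:
    """Greedy confidence-based unmasking with an optional remask step.

    The remask criterion is an assumption for illustration. If max_steps runs
    out, the returned sequence may still contain MASK entries.
    """
    seq = [MASK] * length
    for _ in range(max_steps):
        dists = model(seq)
        if remask_threshold is not None:
            # Assumed "suspect" test: the model now gives a committed token low
            # probability. Reset it to MASK instead of overwriting it in place.
            suspects = [i for i, tok in enumerate(seq)
                        if tok != MASK and dists[i].get(tok, 0.0) < remask_threshold]
            if suspects:
                for i in suspects:
                    seq[i] = MASK
                continue  # re-predict with the suspect tokens masked
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        if not masked:
            break
        # Confidence-based unmasking: commit the masked positions whose top
        # prediction has the highest probability.
        best = {i: max(dists[i].items(), key=lambda kv: kv[1]) for i in masked}
        order = sorted(masked, key=lambda i: best[i][1], reverse=True)
        for i in order[:per_step]:
            seq[i] = best[i][0]
    return seq


def toy_model(seq: List[str]) -> List[Dict[str, float]]:
    # Hand-made two-position model: position 0 is confident in "x" until it
    # sees "x" at position 1, after which it prefers "y".
    pos0 = {MASK: {"x": 0.9, "y": 0.1},
            "x": {"x": 0.2, "y": 0.8},
            "y": {"x": 0.9, "y": 0.1}}[seq[1]]
    pos1 = {MASK: {"x": 0.6, "y": 0.4},
            "x": {"x": 0.95, "y": 0.05},
            "y": {"x": 0.9, "y": 0.1}}[seq[0]]
    return [pos0, pos1]


print(decode(toy_model, 2))                        # ['x', 'x']
print(decode(toy_model, 2, remask_threshold=0.5))  # ['y', 'x']
```

Without remasking, the early high-confidence "x" at position 0 is locked in. With the assumed remask rule, that token is reset to the mask state once the context makes it unlikely, and the model then re-predicts it from the updated context.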