diffusion-language-models

#diffusion-language-models

Improved Large Language Diffusion Models

arXiv cs.CL · 3d ago

iLLaDA is an 8B-parameter masked diffusion language model with fully bidirectional attention, trained from scratch on 12T tokens. It shows broad improvements over LLaDA and remains competitive with Qwen2.5 7B on several benchmarks. The model and code are open-sourced.

#diffusion-language-models

When Top-1 Fails: Calibrating LoRA Monitors for Masked Diffusion LMs

arXiv cs.LG · 4d ago

This paper evaluates the top-1 collapse rate as a stability monitor for short-horizon LoRA fine-tuning of discrete diffusion language models and finds that it has zero precision. It proposes the maximum gradient norm as a more reliable alternative, which achieves higher precision and F1 score on LLaDA-family models.
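
To make the precision and F1 comparison concrete, here is a minimal sketch of how a binary stability monitor can be scored against runs that actually became unstable. The function name `score_monitor` and the toy inputs are hypothetical illustrations, not the paper's evaluation code or data.

```python
def score_monitor(alarms, failures):
    """Score a binary monitor against ground truth.

    alarms[i]   -- True if the monitor flagged run i as unstable (hypothetical input)
    failures[i] -- True if run i actually became unstable (hypothetical input)
    Returns (precision, recall, f1); each is 0.0 when its denominator is zero.
    """
    pairs = list(zip(alarms, failures))
    tp = sum(1 for a, f in pairs if a and f)      # alarms on runs that really failed
    fp = sum(1 for a, f in pairs if a and not f)  # false alarms
    fn = sum(1 for a, f in pairs if not a and f)  # failures the monitor missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# A monitor that only ever fires on healthy runs has zero precision.
print(score_monitor([True, True, False, False], [False, False, True, True]))  # → (0.0, 0.0, 0.0)
```

Zero precision means none of the monitor's alarms coincided with a real failure, so its alarms carry no usable signal about which runs are going wrong.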

#diffusion-language-models

Diffusion Language Models: An Experimental Analysis

arXiv cs.AI · 2026-06-20

A systematic experimental analysis that evaluates eight state-of-the-art diffusion language models across multiple benchmarks and analyzes the trade-offs between generation quality and computational efficiency.

#diffusion-language-models

Self-Generated Error Training for Token Editing in Diffusion Language Models

arXiv cs.CL · 2026-06-17

Proposes Self-Generated T2T, a training method that aligns token-editing training with inference by using the model's own predictions as the source of errors, improving accuracy on LLaDA2.1.

#diffusion-language-models

PerceptionDLM: Parallel Region Perception with Multimodal Diffusion Language Models

Hugging Face Daily Papers · 2026-06-17

PerceptionDLM introduces a multimodal diffusion language model that enables parallel region perception via structured attention masking and efficient prompting, achieving faster inference without sacrificing caption quality. Experiments show competitive performance with substantial speed improvements for multi-region perception tasks.

#diffusion-language-models

Semantic DLM+: Improving Diffusion Language Models through Bias-variance Trade-off in Transition Kernel Design

arXiv cs.LG · 2026-06-16

This paper theoretically analyzes diffusion language models through a bias-variance lens, identifying trade-offs between masking and uniform diffusion kernels. It proposes SemDLM+, which adds a global transition and a semantic-frequency penalty to overcome the semantic basin problem and achieves competitive generation quality on the LM1B and OpenWebText benchmarks.
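
For reference, the two kernel families being compared are commonly written as forward marginals over one-hot token vectors. This is standard discrete-diffusion notation, not taken from the paper: $\alpha_t$ is the schedule's signal level (decreasing from 1 toward 0 as noise increases), $V$ is the vocabulary size, $\mathbf{e}_{x_0}$ is the one-hot vector of the clean token, and $\mathbf{m}$ is the one-hot vector of the mask token.

$$
\text{masking: } q(x_t \mid x_0) = \mathrm{Cat}\big(x_t;\ \alpha_t\,\mathbf{e}_{x_0} + (1-\alpha_t)\,\mathbf{m}\big)
$$

$$
\text{uniform: } q(x_t \mid x_0) = \mathrm{Cat}\big(x_t;\ \alpha_t\,\mathbf{e}_{x_0} + (1-\alpha_t)\,\tfrac{1}{V}\mathbf{1}\big)
$$

Under masking, a corrupted position is either the original token or the mask, so the model always knows which positions it must predict; under uniform noise, a corrupted position looks like an ordinary token, so the model must also infer which tokens are wrong.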

#diffusion-language-models

Teaching Diffusion to Speculate Left-to-Right

arXiv cs.CL · 2026-06-11

This paper proposes three training-time interventions (positional weighting, first-error focal loss, and chain loss) to align diffusion-based draft models with autoregressive verification in speculative decoding, improving accepted prefix length by 21–76% without extra inference cost.
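
Under greedy verification, the accepted prefix length is the number of leading draft tokens that match the verifier's own choices, stopping at the first mismatch. A minimal sketch, assuming exact-match acceptance (speculative samplers that use probabilistic acceptance differ); `accepted_prefix_length` is a hypothetical helper, not code from the paper.

```python
def accepted_prefix_length(draft, verifier):
    """Count leading positions where the draft token equals the verifier's token.

    draft    -- token ids proposed by the (diffusion) draft model
    verifier -- token ids the autoregressive verifier picks at the same positions
    """
    n = 0
    for d, v in zip(draft, verifier):
        if d != v:
            break  # first error: every later draft token is discarded
        n += 1
    return n


print(accepted_prefix_length([5, 7, 9, 2], [5, 7, 3, 2]))  # → 2
```

The last draft token (2) matches the verifier but is still rejected because it follows the first mismatch, so errors at early positions cost far more accepted tokens than errors at late ones.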

#diffusion-language-models

Prefilling-dLLM: Predictive Prefilling for Long-Context Inference in Diffusion Language Models

arXiv cs.CL · 2026-06-10

This paper proposes Prefilling-dLLM, a training-free framework that partitions the prefix into chunks and caches KV representations, achieving state-of-the-art quality and up to 28x speedup for long-context inference in diffusion language models.

#diffusion-language-models

Enabling KV Caching of Shared Prefix for Diffusion Language Models

arXiv cs.LG · 2026-06-09

This paper proposes BiCache, a novel KV caching technique for shared prefixes in diffusion language models, which avoids accuracy collapse by dynamically reusing cached keys and values in shallow layers and achieves 36.3%–98.3% throughput improvement.

#diffusion-language-models

Dynamic Infilling Anchors for Format-Constrained Generation in Diffusion Large Language Models

arXiv cs.CL · 2026-06-04

This paper proposes Dynamic Infilling Anchors (DIA), a training-free method for diffusion large language models that dynamically estimates end-anchor positions to enforce format constraints (e.g., parseable JSON, reasoning templates) while avoiding the rigidity of fixed-span approaches. Experiments show significant zero-shot gains on GSM8K and MATH benchmarks.

#diffusion-language-models

Read the Trace, Steer the Path: Trajectory-Aware Reinforcement Learning for Diffusion Language Models

arXiv cs.CL · 2026-06-04

This paper introduces CAPR (Cached-Amortized Path Refinement), a reinforcement learning algorithm for diffusion large language models that extracts tree-like supervision signals from the denoising trace without the compute cost of full tree rollouts. CAPR achieves state-of-the-art performance on reasoning benchmarks like GSM8K, Math500, Sudoku, and Countdown at roughly 0.75x the cost of flat rollouts.

#diffusion-language-models

Supportive Token Revealing for Fast Diffusion Language Model Decoding

arXiv cs.CL · 2026-06-04

This paper proposes AXON, a training-free module that improves the quality-latency trade-off of discrete diffusion language model decoding by intelligently selecting 'anchor' tokens to reveal first, using attention, uncertainty, and confidence signals to support subsequent denoising steps. Experiments on reasoning and code-generation benchmarks show AXON reduces function evaluations while maintaining or improving accuracy.

#diffusion-language-models

EPIC: Efficient and Parallel Inference under CFG Constraints for Diffusion Language Models

arXiv cs.CL · 2026-06-02

This paper presents EPIC, an efficient framework for context-free grammar constrained decoding in diffusion language models that reduces inference time by up to 67.5% while maintaining syntactic correctness.

#diffusion-language-models

DLLM-JEPA: Joint Embedding Predictive Architectures for Masked Diffusion Language Models

arXiv cs.CL · 2026-06-02

Introduces DLLM-JEPA, a JEPA formulation for masked diffusion language models that constructs two views from a single input via the diffusion noise schedule, reducing training FLOPs by 33% relative to LLM-JEPA and improving fine-tuning performance on tasks like GSM8K.

#diffusion-language-models

dMoE: dLLMs with Learnable Block Experts

arXiv cs.CL · 2026-06-01

dMoE proposes block-level expert routing for diffusion LLMs, reducing the number of uniquely activated experts from 69.5 to 14.6 while retaining 99.11% of performance and achieving a 76–80% memory reduction with a 1.14–1.66× speedup.
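
To illustrate the "uniquely activated experts" metric, here is a toy comparison of how many distinct experts a block of tokens touches under per-token top-k routing versus a single shared block-level choice. The block rule used here (sum router scores over the block, then give every token the same top-k) is an assumption for illustration, not dMoE's learned routing, and all function names are hypothetical.

```python
def top_k(scores, k):
    """Indices of the k largest scores (ties keep the lower index first)."""
    return sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)[:k]


def unique_experts_per_token(block_scores, k):
    """Per-token routing: each token picks its own top-k experts."""
    used = set()
    for scores in block_scores:
        used.update(top_k(scores, k))
    return len(used)


def unique_experts_per_block(block_scores, k):
    """Assumed block rule: all tokens share the top-k of the summed router scores."""
    n_experts = len(block_scores[0])
    totals = [sum(s[e] for s in block_scores) for e in range(n_experts)]
    return len(set(top_k(totals, k)))


# Router scores for a block of 3 tokens over 4 experts (toy numbers).
block = [[0.9, 0.1, 0.0, 0.0],
         [0.0, 0.7, 0.3, 0.0],
         [0.0, 0.0, 0.3, 0.7]]
print(unique_experts_per_token(block, k=2), unique_experts_per_block(block, k=2))  # → 4 2
```

Fewer distinct experts per block means fewer expert weight matrices need to be loaded for that block, which is one plausible reading of how reduced expert activation translates into memory savings.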

#diffusion-language-models

GDSD: Reinforcement Learning as Guided Denoiser Self-Distillation for Diffusion Language Models

Hugging Face Daily Papers · 2026-05-28

GDSD proposes a reinforcement learning method that directly distills denoisers from advantage-guided self-teachers for diffusion language models, avoiding biases from ELBO-based likelihood surrogates. It achieves up to +19.6% accuracy improvements on planning, math, and coding benchmarks over prior state-of-the-art methods.

#diffusion-language-models

When Confidence Misleads: Suffix Anchoring and Anchor-Proximity Confidence Modulation for Diffusion Language Models

Hugging Face Daily Papers · 2026-05-27

Researchers propose a training-free method called Suffix-Anchored Confidence Modulation to improve confidence-based decoding in diffusion language models by addressing issues with EOT tokens and premature decoding.
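
For context, here is a minimal sketch of one step of the plain confidence-based decoding baseline that this work sets out to improve: among positions that are still masked, commit the k whose top-1 prediction has the highest probability. It is not Suffix-Anchored Confidence Modulation itself; the `MASK` sentinel, the function name, and the list-based inputs are hypothetical simplifications.

```python
MASK = -1  # hypothetical sentinel for a still-masked position


def confidence_unmask_step(tokens, probs, k):
    """Commit the k most confident masked positions to their argmax token.

    tokens -- list of token ids, with MASK at undecoded positions
    probs  -- probs[i] is the model's distribution over the vocabulary at position i
    """
    candidates = []
    for i, tok in enumerate(tokens):
        if tok == MASK:
            p = probs[i]
            best = max(range(len(p)), key=p.__getitem__)  # argmax token id
            candidates.append((p[best], i, best))
    candidates.sort(reverse=True)  # highest confidence first
    out = list(tokens)
    for _, i, best in candidates[:k]:
        out[i] = best
    return out


tokens = [3, MASK, MASK, MASK]
probs = [[1.0, 0.0, 0.0],
         [0.1, 0.9, 0.0],
         [0.6, 0.3, 0.1],
         [0.2, 0.3, 0.5]]
print(confidence_unmask_step(tokens, probs, k=2))  # → [3, 1, 0, -1]
```

Because this rule looks only at per-position confidence, it ignores where a position sits in the sequence and will commit any token the model happens to be sure about, wherever it appears.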

#diffusion-language-models

[OSS] dlmserve - first serving engine for diffusion language models

Reddit r/LocalLLaMA · 2026-05-26

dlmserve is the first open-source serving engine for diffusion language models, providing an OpenAI-compatible API, continuous batching, and 2.5x throughput over Hugging Face, all within 12GB VRAM.

#diffusion-language-models

The Path Matters: Learning a Token-Commitment Policy for Diffusion Language Models

arXiv cs.CL · 2026-05-26

This paper introduces TraceLock, a lightweight plug-in controller that learns a token-commitment policy for frozen diffusion language models, improving the quality-step trade-off across various tasks without retraining.

#diffusion-language-models

Extracting Training Data from Diffusion Language Models via Infilling

arXiv cs.CL · 2026-05-26

This paper introduces infilling extraction, a new method for extracting training data from diffusion language models by using arbitrary binary masks, showing that such models are more vulnerable to memorization attacks than previously thought.
