DyStruct: Dynamically Structured Diffusion Language Model Decoding via Bayesian Inference
Summary
DyStruct is a training-free Bayesian decoding framework for discrete Diffusion Language Models that enables flexible-length generation by dynamically determining expansion size and decoding order, improving accuracy on math and code tasks.
Source: https://huggingface.co/papers/2605.09820

DyStruct is a training-free Bayesian decoding framework that enables flexible-length generation in discrete Diffusion Language Models (DLMs).
While discrete diffusion models offer the architectural advantage of parallel decoding, they are typically constrained by fixed sequence lengths. Existing methods for variable-length generation rely on strictly left-to-right truncation heuristics—which force premature token commitments—or require costly custom alignment training.
DyStruct formulates sequence expansion as a pure inference-time structural problem, utilizing a Bayesian framework to dynamically determine expansion size, block partitioning, and decoding order. The method executes non-monotonically: a Chinese Restaurant Process (CRP) prior and context-aware Gibbs scheduling actively search for and anchor stable sequence segments first (such as initial setups and final answer formats). These stable anchors are then used to bidirectionally constrain highly unstable intermediate reasoning steps.
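To make the structural search concrete, the sketch below shows, in minimal Python, the two ingredients named above: a Chinese Restaurant Process prior over block assignments and a stability-first decoding order that anchors high-confidence positions before uncertain ones. This is a hypothetical illustration under assumptions, not the paper's implementation: the function names, and the use of raw per-position confidences as the stability signal, are choices made here for clarity.

```python
import numpy as np

# Minimal sketch (not the paper's code): a CRP-style prior over block
# partitions plus a schedule that unmasks the most "stable" (highest
# confidence) positions first. `confidences` stands in for the DLM's
# per-position predictive confidence; everything else is a hypothetical
# simplification for illustration.

def crp_partition(num_positions, alpha=1.0, rng=None):
    """Assign each position to a block via a Chinese Restaurant Process."""
    rng = rng or np.random.default_rng()
    blocks = []  # blocks[k] = list of positions assigned to block k
    for pos in range(num_positions):
        # Probability of joining block k is proportional to its size;
        # probability of opening a new block is proportional to alpha.
        counts = np.array([len(b) for b in blocks] + [alpha], dtype=float)
        probs = counts / counts.sum()
        k = rng.choice(len(counts), p=probs)
        if k == len(blocks):
            blocks.append([pos])   # open a new block
        else:
            blocks[k].append(pos)  # join an existing block
    return blocks

def stability_first_order(confidences):
    """Decoding order: unmask high-confidence (stable) positions first,
    so the anchored segments can constrain the remaining uncertain ones."""
    return np.argsort(-np.asarray(confidences))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    conf = rng.random(12)                    # stand-in for model confidences
    print(crp_partition(12, alpha=1.5, rng=rng))
    print(stability_first_order(conf))
```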
By allocating unmasking iterations based strictly on structural instability, the algorithm naturally terminates early on rigid tasks (such as arithmetic templates) to save compute, while reserving deep refinement steps for complex logic. Evaluated on LLaDA-8B and Dream-7B, this approach yields consistent accuracy improvements across mathematical reasoning and code synthesis, including a +4.4-point exact-match gain on Big-Bench Hard.
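As a companion illustration of the compute-allocation idea, the following hypothetical sketch spends refinement steps only on the most unstable positions and stops as soon as every position falls below an instability threshold. The threshold, the decay factor, and the instability proxy are all assumptions made here, not details taken from the paper.

```python
import numpy as np

# Minimal sketch (hypothetical, not the paper's algorithm): allocate
# refinement iterations to the most unstable positions and terminate early
# once every position's instability is below a threshold, mimicking the
# idea of spending compute only where the structure is still uncertain.

def refine(instability, budget=64, threshold=0.05, decay=0.5):
    """`instability` is a stand-in per-position uncertainty score
    (e.g. 1 - max token probability). Each step refines the worst one."""
    instability = np.asarray(instability, dtype=float).copy()
    steps_used = 0
    for _ in range(budget):
        if instability.max() < threshold:
            break                      # rigid/easy structure: stop early
        worst = int(instability.argmax())
        instability[worst] *= decay    # assume one unmasking pass reduces it
        steps_used += 1
    return steps_used, instability

if __name__ == "__main__":
    easy = [0.04, 0.02, 0.03]          # arithmetic-template-like sequence
    hard = [0.9, 0.7, 0.4, 0.8]        # multi-step-reasoning-like sequence
    print(refine(easy))                # terminates immediately (0 steps)
    print(refine(hard))                # spends its budget on unstable spots
```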
Similar Articles
$R^2$-dLLM: Accelerating Diffusion Large Language Models via Spatio-Temporal Redundancy Reduction
R²-dLLM introduces spatio-temporal redundancy reduction techniques that cut diffusion LLM decoding steps by up to 75% while preserving generation quality, addressing a key deployment bottleneck.
DALM: A Domain-Algebraic Language Model via Three-Phase Structured Generation
DALM proposes a domain-algebraic language model that generates text under exact structural constraints derived from a domain lattice, addressing hallucination by organizing knowledge into separate domain fibers with algebraic guarantees. The model uses three-phase structured denoising (domain → relation → concept) with domain-annotated training data to prevent cross-domain contamination.
Building my own Diffusion Language Model from scratch was easier than I thought [P]
Developer shares a minimalist 7.5M-parameter diffusion language model trained from scratch on Shakespeare, releasing the code as a learning resource.
DFlash: Block Diffusion for Flash Speculative Decoding
DFlash is a new speculative decoding framework that uses a lightweight block diffusion model for parallel token drafting, achieving over 6x acceleration compared to autoregressive methods. It significantly outperforms existing state-of-the-art methods like EAGLE-3 while maintaining high output quality.
Steering Without Breaking: Mechanistically Informed Interventions for Discrete Diffusion Language Models
This paper introduces a novel adaptive scheduler for steering discrete diffusion language models using sparse autoencoders, demonstrating that targeting interventions based on when specific attributes commit improves control quality and strength over uniform methods.