Dystruct: Dynamically Structured Diffusion Language Model Decoding via Bayesian Inference
Summary
DyStruct is a training-free Bayesian decoding framework for discrete Diffusion Language Models that enables flexible-length generation by dynamically determining expansion size and decoding order, improving accuracy on math and code tasks.
Source: https://huggingface.co/papers/2605.09820
DyStruct is a training-free Bayesian decoding framework that enables flexible-length generation in discrete Diffusion Language Models (DLMs).
While discrete diffusion models offer the architectural advantage of parallel decoding, they are typically constrained to fixed sequence lengths. Existing methods for variable-length generation either rely on strictly left-to-right truncation heuristics, which force premature token commitments, or require costly custom alignment training.
DyStruct formulates sequence expansion as a purely inference-time structural problem, using a Bayesian framework to dynamically determine expansion size, block partitioning, and decoding order. Decoding proceeds non-monotonically: a Chinese Restaurant Process (CRP) prior and context-aware Gibbs scheduling search for and anchor stable sequence segments first (such as initial setups and final answer formats). These stable anchors then constrain the highly unstable intermediate reasoning steps from both sides.
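For readers unfamiliar with the Chinese Restaurant Process, the following is a minimal sketch of the textbook CRP prior over partitions, not DyStruct's actual implementation. It assumes a single concentration parameter `alpha` and treats "tables" as blocks. The function names `crp_seating_probs` and `sample_partition` are hypothetical, and how the paper couples this prior to model likelihoods or to Gibbs scheduling is not covered here.

```python
import random
from typing import List


def crp_seating_probs(block_sizes: List[int], alpha: float) -> List[float]:
    """Textbook CRP probabilities for assigning the next item.

    Returns one probability per existing block (proportional to its size),
    followed by a final entry for opening a new block (proportional to alpha).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    denom = sum(block_sizes) + alpha
    probs = [size / denom for size in block_sizes]
    probs.append(alpha / denom)
    return probs


def sample_partition(n_items: int, alpha: float, rng: random.Random) -> List[int]:
    """Draw block sizes for n_items by assigning items one at a time."""
    block_sizes: List[int] = []
    for _ in range(n_items):
        probs = crp_seating_probs(block_sizes, alpha)
        choice = rng.choices(range(len(probs)), weights=probs)[0]
        if choice == len(block_sizes):
            block_sizes.append(1)      # open a new block
        else:
            block_sizes[choice] += 1   # join an existing block
    return block_sizes


if __name__ == "__main__":
    # Hypothetical usage: partition 16 positions into blocks.
    print(sample_partition(16, alpha=1.0, rng=random.Random(0)))
```

Larger values of `alpha` favor opening new blocks, so the prior leans toward many small blocks; smaller values favor a few large ones. This "rich get richer" behavior is a general property of the CRP rather than anything specific to DyStruct.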
By allocating unmasking iterations according to structural instability, the algorithm terminates early on rigid tasks (such as arithmetic templates) to save compute, while reserving deeper refinement for complex logic. Evaluated on LLaDA-8B and Dream-7B, the approach improves accuracy across mathematical reasoning and code synthesis, including a +4.4 exact-match gain on Big-Bench Hard.
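The idea of spending iterations where the sequence is least stable can be illustrated with a small sketch. It is an assumption-laden illustration, not the paper's algorithm: it uses mean token-level predictive entropy as a stand-in for "structural instability", and the function `allocate_steps`, the `stable_threshold` parameter, and the segment names are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def allocate_steps(
    segment_dists: Dict[str, List[Sequence[float]]],
    total_steps: int,
    stable_threshold: float = 0.1,
) -> Dict[str, int]:
    """Split a step budget across segments in proportion to mean entropy.

    Assumption: entropy is a proxy for instability. Segments at or below
    stable_threshold are treated as settled and receive no further steps.
    Rounding means the result may not sum exactly to total_steps.
    """
    scores: Dict[str, float] = {}
    for name, dists in segment_dists.items():
        mean_h = sum(entropy(d) for d in dists) / len(dists) if dists else 0.0
        scores[name] = mean_h if mean_h > stable_threshold else 0.0
    total = sum(scores.values())
    if total == 0.0:
        # Everything looks stable: terminate early with no further steps.
        return {name: 0 for name in scores}
    return {name: round(total_steps * s / total) for name, s in scores.items()}


if __name__ == "__main__":
    # Hypothetical per-position distributions over a two-token vocabulary.
    dists = {
        "answer_template": [[1.0, 0.0], [1.0, 0.0]],  # confident, rigid
        "reasoning": [[0.5, 0.5], [0.6, 0.4]],        # uncertain
    }
    print(allocate_steps(dists, total_steps=10))
```

In this toy setup the confident template segment receives no steps and the whole budget goes to the uncertain reasoning segment. If every segment falls below the threshold, the allocation is all zeros, which corresponds to early termination.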
Similar Articles
Dynamic Chunking for Diffusion Language Models
This paper introduces Dynamic Chunking for Diffusion Language Models (DCDM), which replaces fixed positional blocks in block discrete diffusion with content-defined semantic chunks using a differentiable Chunking Attention mechanism, achieving consistent improvements across scales up to 1.5B parameters.
Multi-Block Diffusion Language Models
This paper proposes Multi-Block Diffusion Language Models (MBD-LMs), extending single-block diffusion to concurrent multi-block decoding with improved training strategies like Multi-block Teacher Forcing and an optimized Block Buffer decoding algorithm. Experiments show increased tokens per forward pass and improved accuracy on benchmarks.
Factorization-Error-Free Discrete Diffusion Language Model via Speculative Decoding
This paper introduces FeF-DLLM, a discrete diffusion language model that eliminates factorization errors by using exact prefix-conditioned factorization and accelerates inference via speculative decoding, achieving significant improvements in accuracy and speed on benchmarks such as GSM8K and MATH.
Dynamic-dLLM: Dynamic Cache-Budget and Adaptive Parallel Decoding for Training-Free Acceleration of Diffusion LLM
This paper proposes Dynamic-dLLM, a training-free framework that accelerates diffusion large language models by dynamically allocating cache-update budgets and calibrating decoding thresholds, achieving over 3x speedup on models like LLaDA and Dream while maintaining performance.
Dynamic Infilling Anchors for Format-Constrained Generation in Diffusion Large Language Models
This paper proposes Dynamic Infilling Anchors (DIA), a training-free method for diffusion large language models that dynamically estimates end-anchor positions to enforce format constraints (e.g., parseable JSON, reasoning templates) while avoiding the rigidity of fixed-span approaches. Experiments show significant zero-shot gains on GSM8K and MATH benchmarks.