This paper proposes Multi-Block Diffusion Language Models (MBD-LMs), which extend single-block diffusion to concurrent multi-block decoding. It introduces an improved training strategy, Multi-block Teacher Forcing, and an optimized decoding algorithm, Block Buffer. Experiments show more tokens generated per forward pass and improved accuracy on benchmarks.