VoidPadding: Let [VOID] Handle Padding in Masked Diffusion Language Models so that [EOS] Can Focus on Semantic Termination

arXiv cs.CL Papers

Summary

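VoidPadding introduces a [VOID] token to handle padding in masked diffusion language models (MDLMs), so that [EOS] serves only as a semantic terminator. On Dream-7B-Instruct, this raises the block-size-averaged mean over four mathematical reasoning and code generation benchmarks by 17.84 points over the original model, while reducing the number of decoding function evaluations (NFE) by 55.7% on average.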
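
The difference between the two padding conventions can be sketched on a toy token list. This is a minimal illustration, not the authors' code. The token strings, the helper names `pad_eos_convention` and `pad_void_convention`, and the assumption that a VoidPadding target holds exactly one [EOS] followed by [VOID] filler are hypothetical simplifications of the description above.

```python
from typing import List

# Hypothetical token strings; the real tokenizer IDs are not given here.
EOS = "[EOS]"
VOID = "[VOID]"


def pad_eos_convention(response: List[str], canvas_len: int) -> List[str]:
    """Autoregressive-style padding: the terminator and all filler slots
    are the same [EOS] token (the 'dual role')."""
    if len(response) + 1 > canvas_len:
        raise ValueError("response plus terminator does not fit in the canvas")
    return response + [EOS] * (canvas_len - len(response))


def pad_void_convention(response: List[str], canvas_len: int) -> List[str]:
    """Assumed VoidPadding-style target: a single [EOS] marks semantic
    termination and every remaining slot is [VOID]."""
    if len(response) + 1 > canvas_len:
        raise ValueError("response plus terminator does not fit in the canvas")
    n_pad = canvas_len - len(response) - 1
    return response + [EOS] + [VOID] * n_pad


resp = ["x", "=", "4"]
print(pad_eos_convention(resp, 6))   # ['x', '=', '4', '[EOS]', '[EOS]', '[EOS]']
print(pad_void_convention(resp, 6))  # ['x', '=', '4', '[EOS]', '[VOID]', '[VOID]']
```

Under the [EOS]-padding convention, a short response teaches the model to predict [EOS] at every tail position; with the [VOID] target, the [EOS] count per sequence stays at one.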

arXiv:2606.17999v1 Announce Type: new Abstract: Masked diffusion language models (MDLMs) generate text by denoising a preallocated masked response canvas, making response-length modeling central to instruction tuning. Existing MDLMs often inherit the autoregressive convention of using repeated [EOS] tokens for padding during instruction tuning, giving [EOS] a dual role as both a semantic terminator and a padding token. We show that this dual role is a root cause of [EOS] overflow under large-block decoding. To decouple these roles, we propose VoidPadding, which introduces [VOID] for padding and reserves [EOS] for termination. During inference, the learned [EOS] signal enables early stopping, while the learned [VOID] signal guides adaptive response canvas expansion. On Dream-7B-Instruct, VoidPadding improves the block-size-averaged four-task mean across mathematical reasoning and code generation benchmarks by +17.84 points over the original model and +6.95 points over RainbowPadding, while reducing decoding NFE (number of function evaluations) by 55.7% on average. Code is available at https://github.com/Haru-LCY/VoidPadding.
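
The two inference-time signals could be used roughly as follows. This is a hypothetical sketch: the abstract does not specify the stopping or expansion criteria, so `can_stop_early`, `maybe_expand`, `extract_response`, the `[MASK]` placeholder, and the rules they encode (stop once an [EOS] is committed with no masks before it; grow the canvas when the model's tail predictions contain neither [VOID] nor [EOS]) are illustrative assumptions rather than the released implementation.

```python
from typing import List

MASK = "[MASK]"  # hypothetical placeholder for a not-yet-denoised slot
EOS = "[EOS]"
VOID = "[VOID]"


def can_stop_early(canvas: List[str]) -> bool:
    """Assumed rule: once an [EOS] is committed and every slot before it
    is decoded, the slots after it are padding and need no more steps."""
    if EOS not in canvas:
        return False
    eos_pos = canvas.index(EOS)
    return MASK not in canvas[:eos_pos]


def maybe_expand(canvas: List[str], tail_preds: List[str], grow_by: int) -> List[str]:
    """Assumed rule: if the model's current top predictions for the last
    canvas positions contain neither [VOID] nor [EOS], the response likely
    needs more room, so append `grow_by` fresh [MASK] slots."""
    if not any(tok in (VOID, EOS) for tok in tail_preds):
        return canvas + [MASK] * grow_by
    return canvas


def extract_response(canvas: List[str]) -> List[str]:
    """Keep tokens before the first [EOS] and drop any [VOID] padding."""
    end = canvas.index(EOS) if EOS in canvas else len(canvas)
    return [tok for tok in canvas[:end] if tok != VOID]


canvas = ["def", "f", "(", ")", EOS, MASK, MASK]
print(can_stop_early(canvas))    # True
print(extract_response(canvas))  # ['def', 'f', '(', ')']
print(len(maybe_expand(["a", MASK], tail_preds=["b", "c"], grow_by=4)))  # 6
```

Skipping the still-masked slots after a committed [EOS] is one plausible way early stopping saves function evaluations; the paper's actual criteria may differ.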

Source: [https://arxiv.org/abs/2606.17999](https://arxiv.org/abs/2606.17999)
[View PDF](https://arxiv.org/pdf/2606.17999)

## Submission history

From: Chunyu Liu **[v1]** Tue, 16 Jun 2026 14:46:53 UTC (2,532 KB)
