EPIC: Efficient and Parallel Inference under CFG Constraints for Diffusion Language Models

arXiv cs.CL 06/02/26, 04:00 AM Papers
Summary
This paper presents EPIC, an efficient framework for context-free grammar constrained decoding in diffusion language models that reduces inference time by up to 67.5% while maintaining syntactic correctness.
arXiv:2606.00722v1 Announce Type: new Abstract: Controlling language model outputs is essential for ensuring structural validity, reliability, and downstream usability, and diffusion language models are no exception. Recent advances in diffusion language model decoding have extended output control beyond regular constraints to context-free grammar (CFG) constraints. Existing methods, however, can be up to four times slower than unconstrained decoding. More importantly, they substantially diminish one of the key advantages of diffusion language models over autoregressive models, namely parallel decoding. This slowdown arises because sequential validity checking introduces significant overhead during parallel generation. We propose an efficient CFG-constrained decoding framework, EPIC, that addresses this limitation. Our method improves decoding efficiency by combining lexing memoization, validation using Earley-style parsing instead of deterministic automata, and relaxed compatible subset selection for parallel commit. It reduces repeated lexing and validation overhead while allowing multiple compatible tokens to be committed together. Experiments on three benchmarks using four models show that our method reduces inference time by up to 67.5% and decreases the additional overhead by up to 90.5% compared with existing CFG-constrained decoding methods. Our implementation is available at https://github.com/hyundong98/EPIC-Decoding.git .
Original Article
View Cached Full Text
Cached at: 06/02/26, 03:38 PM
# EPIC: Efficient and Parallel Inference under CFG Constraints for Diffusion Language Models
Source: [https://arxiv.org/html/2606.00722](https://arxiv.org/html/2606.00722)
Hyundong Jin Yo\-Sub Han⋆ Yonsei University, Seoul, Republic of Korea \{[tuzi04](https://arxiv.org/html/2606.00722v1/mailto:[email protected]),[emmous](https://arxiv.org/html/2606.00722v1/mailto:[email protected])\}@yonsei\.ac\.kr

###### Abstract

Controlling language model outputs is essential for ensuring structural validity, reliability, and downstream usability, and diffusion language models are no exception\. Recent advances in diffusion language model decoding have extended output control beyond regular constraints to context\-free grammar \(CFG\) constraints\. Existing methods, however, can be up to four times slower than unconstrained decoding\. More importantly, they substantially diminish one of the key advantages of diffusion language models over autoregressive models, namely parallel decoding\. This slowdown arises because sequential validity checking introduces significant overhead during parallel generation\. We propose an efficient CFG\-constrained decoding framework, EPIC, that addresses this limitation\. Our method improves decoding efficiency by combining lexing memoization, validation using Earley\-style parsing instead of deterministic automata, and relaxed compatible subset selection for parallel commit\. It reduces repeated lexing and validation overhead while allowing multiple compatible tokens to be committed together\. Experiments on three benchmarks using four models show that our method preserves the syntactic and functional correctness of CFG\-constrained decoding while keeping the overall runtime close to unconstrained decoding\. Compared with existing CFG\-constrained decoding methods, EPIC reduces relative inference time by up to67\.5%67\.5\\%\. Our implementation is available at[https://github\.com/hyundong98/EPIC\-Decoding\.git](https://github.com/hyundong98/EPIC-Decoding.git)\.

EPIC: Efficient and Parallel Inference under CFG Constraints for Diffusion Language Models

Hyundong Jin Yo\-Sub Han⋆Yonsei University, Seoul, Republic of Korea\{[tuzi04](https://arxiv.org/html/2606.00722v1/mailto:[email protected]),[emmous](https://arxiv.org/html/2606.00722v1/mailto:[email protected])\}@yonsei\.ac\.kr

## 1Introduction

Driven by the success of diffusion models in vision, there has been growing interest in applying diffusion\-based methods to language modeling\(Hoet al\.,[2020](https://arxiv.org/html/2606.00722#bib.bib23); Sahooet al\.,[2024](https://arxiv.org/html/2606.00722#bib.bib1); Nieet al\.,[2026](https://arxiv.org/html/2606.00722#bib.bib2)\)\. Diffusion language models \(DLMs\) generate text through iterative denoising rather than autoregressive \(AR\) next\-token prediction, and masked diffusion language models \(MDLMs\) instantiate this paradigm by repeatedly refining partially masked sequences\. Recent progress has shown that MDLMs can scale to large language models \(LLMs\), making DLMs a promising alternative to AR generation\(Sahooet al\.,[2024](https://arxiv.org/html/2606.00722#bib.bib1); Nieet al\.,[2026](https://arxiv.org/html/2606.00722#bib.bib2)\)\. Unlike AR models, which generate tokens strictly from left to right, DLMs can update multiple positions in parallel, offering a distinctive latency advantage\(Kimet al\.,[2026](https://arxiv.org/html/2606.00722#bib.bib22)\)\.

![Refer to caption](https://arxiv.org/html/2606.00722v1/x1.png)Figure 1:Overview of the bottlenecks in prior CFG\-constrained diffusion decoding and the corresponding components of EPIC\.Despite this advantage, unconstrained DLM decoding remains less reliable for structured outputs such as code, JSON, and molecular strings\. Proposed tokens can violate inter\-token dependencies and degrade sequence\-level consistency since multiple positions may be filled independently\(Kanget al\.,[2026](https://arxiv.org/html/2606.00722#bib.bib21); Kimet al\.,[2026](https://arxiv.org/html/2606.00722#bib.bib22)\)\. Furthermore, the nonsequential nature of DLM generation complicates syntax checking during decoding\. Unlike autoregressive decoding, DLM decoding must reason about partially filled sequences whose unresolved masks may appear between generated tokens\. These challenges motivate grammar\-constrained decoding for DLMs\.

Recent work has begun to develop constrained decoding for DLMs motivated by these challenges\.Sureshet al\.\([2025](https://arxiv.org/html/2606.00722#bib.bib3)\)study constrained DLM decoding under regular constraints, where validity can be represented with finite automata\. Building on this direction,Mündleret al\.\([2026](https://arxiv.org/html/2606.00722#bib.bib4)\)extend the constraints to context\-free grammars \(CFGs\) using a rejection sampling decoder, which we use as the state\-of\-the\-art baseline\. The baseline checks whether the partial output remains completable under the target grammar after applying a candidate update\. However, each check requires repeated lexing, deterministic finite automaton \(DFA\) construction and minimization, and sequential CFG\-based validation, introducing substantial overhead\. This overhead limits the benefit of parallel generation, especially when fewer denoising steps are used and more tokens are proposed per step\.

We propose EPIC, a decoding framework that reduces the main sources of overhead in the prior pipeline, as illustrated in Figure[1](https://arxiv.org/html/2606.00722#S1.F1)\. EPIC reuses lexical computations across similar partial outputs, checks CFG compatibility directly on the lexed graph without deterministic finite automaton \(DFA\) construction, and selects compatible candidate subsets before exact verification to recover parallel commitment\. Together, these components reduce repeated lexing, DFA construction, and sequential validation overhead, making CFG\-constrained decoding more efficient for DLMs\.

Our main contributions are as follows\.

- •We identify three main sources of overhead in prior work on CFG\-constrained decoding for DLMs, namely repeated lexing, DFA construction, and sequential candidate validation\.
- •We propose EPIC, a CFG\-constrained decoding framework for DLMs that targets the main overheads of prior work through lexing memoization, DFA\-free validation, and relaxed compatible subset selection for parallel commit\.
- •We establish the correctness of the proposed decoding procedure and evaluate its inference efficiency on structured generation benchmarks covering code, JSON, and SMILES\.

## 2Related Work

We assume basic familiarity with regular languages, finite automata, context\-free grammars \(CFGs\), lexing, and CFG parsing, and refer to Appendix[A](https://arxiv.org/html/2606.00722#A1)for formal background\. Since EPIC directly targets the computational bottlenecks of CFG\-constrained decoding for diffusion language models, we focus this section on the closest line of work and defer broader discussions of diffusion language models, autoregressive constrained decoding, and formal\-language\-based generation to Appendix[B](https://arxiv.org/html/2606.00722#A2)\.

The most relevant prior work isMündleret al\.\([2026](https://arxiv.org/html/2606.00722#bib.bib4)\), which represents the state\-of\-the\-art CFG\-constrained decoding method for DLMs\. They formulate CFG\-constrained decoding as a repeated completion test over partial outputs\. Given a proposed update to a partial output with masks, their decoder checks whether the remaining masks can still be filled so that the final output is accepted by the target grammar\. The decoder accepts and commits the update when this test succeeds and rejects it otherwise\. They implemented the test by constructing an automaton representing possible completions of the updated partial output and checking its compatibility with the CFG\. For outputs that were not completed within the rejection budget, their method used an infilling procedure to produce a syntactically valid completion\. We adopt the same completable output criterion and focus on making it more efficient\.

This prior pipeline involves three main sources of overhead\. First, language models produce token\-level outputs, whereas the CFG is defined over grammar terminals, namely lexemes\. The decoder therefore needs a lexing step before grammar checking\. This step is nontrivial since partial outputs may contain masks\. A generated token can complete a lexeme on its left, start a lexeme on its right, or participate in a lexeme that spans one or more masks\. Prior work handled these ambiguities by constructing an NFA for possible lexeme sequences, but it still had to be repeated for each update, making it a major runtime bottleneck\.

Second, the automaton derived from the partial output is determinized and minimized before CFG validation\. This repeated NFA\-to\-DFA conversion can be costly, requiring exponential time in the number of NFA states in the worst case\. Moreover, the subsequent compatibility check with the CFG further increases validation overhead\. Prior work checked whether the DFA admits at least one completion consistent with the CFG by searching an intersection grammar\. Although the authors computed this grammar on the fly, the underlying construction could still grow cubically in the number of DFA states, making the validation step expensive\.

Lastly, the overall procedure reduces the benefit of diffusion parallelism because candidate tokens are checked and committed one at a time\. Thus, even when a DLM proposes multiple tokens in a single denoising step, the constrained decoder may still incur largely sequential validation costs\.

## 3Problem Formulation

LetG=\(V,Σlex,P,S\)G=\(V,\\Sigma\_\{\\mathrm\{lex\}\},P,S\)be a context\-free grammar, whereΣlex\\Sigma\_\{\\mathrm\{lex\}\}is the lexeme vocabulary, the set of grammar terminals\. The target constraint language is therefore

L\(G\)⊆Σlex∗\.L\(G\)\\subseteq\\Sigma\_\{\\mathrm\{lex\}\}^\{\*\}\.This lexeme vocabulary should not be confused with either the byte\-level alphabetΣbyte\\Sigma\_\{\\mathrm\{byte\}\}, used to represent raw strings, or the tokenizer vocabularyΓ\\Gammaused by the diffusion language model\. A partial model output is represented as

x∈\(Γ∪\{\[MASK\]\}\)n,x\\in\(\\Gamma\\cup\\\{\\texttt\{\[MASK\]\}\\\}\)^\{n\},where\[MASK\]denotes a mask\.

Because the model operates over tokenizer\-level sequences whereas the CFG is defined over lexeme sequences, validity checking requires a lexical interface\. We letΛ\\Lambdadenote the lexical map that converts a token\-level partial outputxxinto a regular or graph representation of compatible lexeme sequences\. We denote this representation by𝒜x\\mathcal\{A\}\_\{x\}, with

L\(𝒜x\)⊆Σlex∗\.L\(\\mathcal\{A\}\_\{x\}\)\\subseteq\\Sigma\_\{\\mathrm\{lex\}\}^\{\*\}\.For notational convenience, we write

Cx:=L\(𝒜x\)C\_\{x\}:=L\(\\mathcal\{A\}\_\{x\}\)for the lexeme\-sequence completion language induced byxx, and

M\(x\):=\{i∈\[n\]∣xi=\[MASK\]\}M\(x\):=\\\{i\\in\[n\]\\mid x\_\{i\}=\\texttt\{\[MASK\]\}\\\}for the set of masked positions\.

At each diffusion step, the model predicts distributions over tokens for masked positions\. Given a reveal budgetkk, the decoder selects a subset of masked positions

I=\{i1,…,ik\}⊆M\(x\)I=\\\{i\_\{1\},\\ldots,i\_\{k\}\\\}\\subseteq M\(x\)and samples tokenizer tokens for these positions in parallel,

ti∼pθ\(⋅∣x\),i∈I\.t\_\{i\}\\sim p\_\{\\theta\}\(\\cdot\\mid x\),\\quad i\\in I\.Equivalently, the model proposes updates

Δ=\{\(i,ti\)∣i∈I\}\.\\Delta=\\\{\(i,t\_\{i\}\)\\mid i\\in I\\\}\.Applying this update toxxgives a new partial output defined position\-wise by

\(x⊕Δ\)j=\{tjif\(j,tj\)∈Δ,xjotherwise\.\(x\\oplus\\Delta\)\_\{j\}=\\begin\{cases\}t\_\{j\}&\\text\{if \}\(j,t\_\{j\}\)\\in\\Delta,\\\\ x\_\{j\}&\\text\{otherwise\}\.\\end\{cases\}
The central question in CFG\-constrained DLM decoding is whether this token\-level update preserves the possibility of completing the remaining masks into a lexeme sequence accepted by the grammar\. We say that a partial outputxxis completable with respect toGGif

L\(G\)∩Cx≠∅\.L\(G\)\\cap C\_\{x\}\\neq\\emptyset\.A proposed updateΔ\\Deltais valid when

ValidG\(x,Δ\)=𝟏\[L\(G\)∩Cx⊕Δ≠∅\]\.\\textsc\{Valid\}\_\{G\}\(x,\\Delta\)=\\mathbf\{1\}\\left\[L\(G\)\\cap C\_\{x\\oplus\\Delta\}\\neq\\emptyset\\right\]\.
This definition gives a rule for constrained diffusion decoding\. IfValidG\(x,Δ\)=1\\textsc\{Valid\}\_\{G\}\(x,\\Delta\)=1, the decoder may commit the proposed tokens\. Otherwise, the proposal must be rejected\. Prior methods evaluate this condition by constructing an automaton forCx⊕ΔC\_\{x\\oplus\\Delta\}and checking its compatibility with the CFG\. Our goal is to compute the same validity predicate more efficiently in the repeated and parallel setting induced by DLM decoding\.

The multi\-token nature ofΔ\\Deltais crucial\. In unconstrained DLM decoding, all tokens inIIcan be committed simultaneously after one model forward pass\. Under CFG constraints, however, a naive decoder must test the proposed tokens sequentially,

\(i1,ti1\),\(i2,ti2\),…,\(ik,tik\),\(i\_\{1\},t\_\{i\_\{1\}\}\),\(i\_\{2\},t\_\{i\_\{2\}\}\),\\ldots,\(i\_\{k\},t\_\{i\_\{k\}\}\),because the validity of one token may depend on which other tokens have already been accepted\. This sequential processing weakens the main advantage of DLMs\. The problem we address is therefore efficient validity checking for multiple token updates, while preserving the exact completable output criterion over lexeme sequences\.

## 4Proposed Method

We proposeEPIC, a decoding framework forEfficient andParallelInference under CFGConstraints\. Our method consists of three main components, namely lexing memoization, DFA\-free validation with Earley\-style parsing, and relaxed compatible subset selection for parallel commit\. Each component addresses a distinct source of overhead in prior work\. An overview of our framework is illustrated in Figure[2](https://arxiv.org/html/2606.00722#S4.F2)\.

![Refer to caption](https://arxiv.org/html/2606.00722v1/x2.png)Figure 2:Overview of EPIC\. EPIC combines lexing memoization, a DFA\-free graph\-parser validation, and regular\-cover batch selection to reduce the overhead in the CFG\-constrained diffusion decoding\.### 4\.1Lexing Memoization

The first improvement exploits locality in lexing\. Although masks introduce lexing ambiguity, each ambiguity is determined by a local text segment and its adjacent mask boundaries\. During diffusion decoding, an update modifies only a small part of the sequence, while the remaining mask positions stay fixed\. As a result, consecutive validation queries often contain many unchanged local lexing contexts\. We therefore memoize possible prefix and suffix lexings for segments adjacent to masks\. When the same local context appears again, we reuse the stored lexing result during partial\-language construction\. This avoids repeated lexing of unchanged regions and reduces the cost of constructing the partial\-output automaton or graph\.

### 4\.2DFA\-free Validation with Earley\-style Parsing

The next component is DFA\-free validation\. Prior work represented a lexed partial output as an epsilon\-NFA \(ENFA\) and then determinized and minimized it before CFG validation\. After this conversion, validation checked whether the automaton admitted some path whose sequence of lexemes could be generated by the CFG\. Prior work implemented this search as dynamic programming over CFG nonterminals and automaton pairs, analogous to CYK\-style recognition\(Younger,[1967](https://arxiv.org/html/2606.00722#bib.bib15); Kasami,[1965](https://arxiv.org/html/2606.00722#bib.bib16)\)on a finite automaton\. However, this pipeline incurs the cost of ENFA\-to\-DFA conversion at each decoding step, and this bottom\-up formulation provides limited top\-down guidance during the search\.

In contrast, for exact validation, we avoid ENFA\-to\-DFA conversion by performing witness checking directly on the ENFA using an Earley\-style graph parser\. The parser combines bottom\-up evidence from ENFA transitions with top\-down predictions from the CFG\. Intuitively, the search meets in the middle\. The ENFA provides possible lexeme transitions, while the grammar restricts which partial derivations are relevant to the start symbol\. Although the worst case complexity is not substantially different from that of a CYK\-style implementation, the procedure avoids repeated determinization and minimization and introduces top\-down grammar guidance into the witness search\. Thus, DFA\-free validation preserves the completable output criterion while removing the per\-step overhead of ENFA\-to\-DFA conversion\.

### 4\.3Relaxed Compatible Subset Selection for Parallel Commit

Finally, we introduce relaxed compatible subset selection to address the sequential commit problem, which significantly limits the parallelism of DLMs\. When the number of denoising steps is small, a DLM reveals many tokens in a single step\. This is a key advantage of DLMs over AR language models, but it can also make structured generation more difficult because independently proposed tokens may be mutually incompatible\. Prior work performs sequential compatibility checking to ensure compatibility among independently sampled tokens\. As a result, when the number of denoising steps is small, existing methods incur a larger slowdown relative to unconstrained decoding\. The most direct solution would be to check the validity of multiple tokens simultaneously\. This approach, however, does not reveal which tokens are compatible when the check fails\. Finding the truly compatible subset would require a number of checks exponential in the number of tokens masked at the same time\.

We avoid this cost by first selecting a subset that is compatible under a relaxed condition\. Our implementation builds a regular cover by flattening the production rules of the given CFG\. For a CFGGG, a regular cover is a finite automatonℱG\\mathcal\{F\}\_\{G\}that satisfiesL\(G\)⊆L\(ℱG\)L\(G\)\\subseteq L\(\\mathcal\{F\}\_\{G\}\)\. Given a candidate batchCC, we intersect this regular cover with the completion automaton induced byx⊕Cx\\oplus Cand perform an emptiness check\. This relaxed check is cheaper than exact CFG validation and safely rejects candidate subsets that are incompatible even under the relaxed condition\. If the full batch is not cover\-compatible, we recursively split the candidates by confidence and recover a smaller relaxed\-compatible subset\. A relaxed\-compatible subset is not committed immediately\. It is subsequently verified by the exact CFG completability checker\. If exact verification rejects the selected subset, we recursively shrink it again and then fill the remaining reveal budget with the standard sequential constrained sampler\. Detailed pseudocode for the three main components of EPIC are provided in Appendix[C](https://arxiv.org/html/2606.00722#A3), and Appendix[D](https://arxiv.org/html/2606.00722#A4)shows that these components preserve the same completable\-output criterion used by the baseline decoder\.

### 4\.4Failure recovery

Following prior work, we use the same automatic recovery procedure when decoding reaches a fixed resampling budget\. This budget counts the number of rejected proposals over the entire decoding process\. Even when the budget is exhausted, the current partial output is known to be completable, since every accepted update has been verified with a witness\. Therefore, a valid witness completion always exists for the remaining masks\. Under the recovery procedure of prior work, model sampling is terminated early and the output is completed by extracting a valid completion from a witness\. This recovery step is not sampled from the model distribution, but provides a valid fallback when further rejection sampling is unlikely to succeed\.

## 5Experimental Results and Analysis

We evaluate EPIC to show that it reduces the inference cost of CFG\-constrained decoding for DLMs while maintaining syntactic and functional correctness\. Our experiments follow the evaluation protocol of prior work and cover three structured generation tasks: C\+\+ code generation, JSON generation under instance\-specific schemas, and SMILES generation\. We organize the evaluation around three questions\. First, we compare the relative inference time of EPIC and the prior CFG\-constrained baseline across different denoising\-step settings, using unconstrained decoding as the normalization reference\. Second, we analyze where runtime is spent and how each component of EPIC reduces the runtime cost\. Third, we perform an ablation study over lexing memoization, DFA\-free graph validity checking, and relaxed compatible subset selection to isolate their individual and combined effects\. We provide additional experimental details in Appendix[E](https://arxiv.org/html/2606.00722#A5)and additional results in Appendix[F](https://arxiv.org/html/2606.00722#A6)\.

ModelStepsC\+\+JSONSMILESAverageCon\.EPICCon\.EPICCon\.EPICCon\.EPICDream\(7B\)16143\.86\\cellcolorgray\!1598\.44176\.32\\cellcolorgray\!15109\.36102\.09\\cellcolorgray\!15103\.43140\.76\\cellcolorgray\!15103\.7432147\.89\\cellcolorgray\!15114\.85123\.79\\cellcolorgray\!15101\.22100\.45\\cellcolorgray\!15101\.07124\.04\\cellcolorgray\!15105\.7164114\.41\\cellcolorgray\!15108\.12108\.19\\cellcolorgray\!15101\.36100\.67\\cellcolorgray\!15101\.18107\.76\\cellcolorgray\!15103\.55128106\.03\\cellcolorgray\!15102\.65104\.42\\cellcolorgray\!15100\.92100\.30\\cellcolorgray\!15100\.29103\.58\\cellcolorgray\!15101\.29256102\.12\\cellcolorgray\!15102\.92102\.39\\cellcolorgray\!15100\.8298\.96\\cellcolorgray\!1599\.21101\.16\\cellcolorgray\!15100\.98DreamCoder\(7B\)16388\.60\\cellcolorgray\!15127\.34178\.21\\cellcolorgray\!1589\.19101\.14\\cellcolorgray\!15103\.14222\.65\\cellcolorgray\!15106\.5632269\.76\\cellcolorgray\!15140\.51132\.63\\cellcolorgray\!1589\.52100\.72\\cellcolorgray\!15101\.34167\.70\\cellcolorgray\!15110\.4664182\.52\\cellcolorgray\!15128\.04118\.02\\cellcolorgray\!1596\.74100\.41\\cellcolorgray\!15100\.65133\.65\\cellcolorgray\!15108\.48128133\.57\\cellcolorgray\!15110\.46106\.71\\cellcolorgray\!1598\.21100\.25\\cellcolorgray\!15100\.26113\.51\\cellcolorgray\!15102\.98256117\.23\\cellcolorgray\!15106\.50101\.26\\cellcolorgray\!1598\.84100\.03\\cellcolorgray\!15100\.01106\.17\\cellcolorgray\!15101\.78LLaDA\(8B\)16112\.19\\cellcolorgray\!1574\.84123\.15\\cellcolorgray\!1574\.90112\.40\\cellcolorgray\!1594\.95115\.91\\cellcolorgray\!1581\.563295\.49\\cellcolorgray\!1564\.07111\.99\\cellcolorgray\!1577\.51102\.77\\cellcolorgray\!1592\.22103\.42\\cellcolorgray\!1577\.936493\.26\\cellcolorgray\!1576\.78104\.67\\cellcolorgray\!1587\.5598\.76\\cellcolorgray\!1595\.4998\.90\\cellcolorgray\!1586\.61128100\.52\\cellcolorgray\!1593\.27102\.42\\cellcolorgray\!1595\.2198\.17\\cellcolorgray\!1596\.03100\.37\\cellcolorgray\!1594\.84256102\.30\\cellcolorgray\!15100\.71100\.09\\cellcolorgray\!1598\.2598\.94\\cellcolorgray\!1598\.55100\.44\\cellcolorgray\!1599\.17DiffuCoder\(7B\)16187\.50\\cellcolorgray\!15108\.07165\.58\\cellcolorgray\!15108\.72103\.93\\cellcolorgray\!15104\.96152\.34\\cellcolorgray\!15107\.2532182\.20\\cellcolorgray\!15123\.18125\.93\\cellcolorgray\!1596\.37103\.93\\cellcolorgray\!15102\.58137\.35\\cellcolorgray\!15107\.3864138\.87\\cellcolorgray\!15102\.33111\.23\\cellcolorgray\!1597\.45103\.36\\cellcolorgray\!15101\.01117\.82\\cellcolorgray\!15100\.26128121\.15\\cellcolorgray\!15103\.98102\.38\\cellcolorgray\!1596\.46108\.05\\cellcolorgray\!15104\.62110\.53\\cellcolorgray\!15101\.69256118\.54\\cellcolorgray\!15107\.8095\.17\\cellcolorgray\!1592\.97103\.17\\cellcolorgray\!15102\.10105\.63\\cellcolorgray\!15100\.96Average147\.90\\cellcolorgray\!15104\.74119\.73\\cellcolorgray\!1595\.58101\.93\\cellcolorgray\!15100\.15123\.18\\cellcolorgray\!15100\.16Table 1:Inference time relative to unconstrained decoding\. For each method, we report the relative inference time normalized by unconstrained diffusion decoding, where100%100\\%indicates the same runtime as unconstrained decoding\. Con\. denotes the prior CFG\-constrained decoder, and EPIC denotes our method\. The shaded column highlights EPIC\.### 5\.1Experimental Settings

We evaluate the same four instruction\-tuned DLMs as prior work, namely Dream, DreamCoder, LLaDA, and DiffuCoder\. The models include both general\-purpose DLMs and code\-oriented variants\. Following prior CFG\-constrained decoding work, we use three structured\-output benchmarks, C\+\+ code generation, JSON generation with instance\-specific schemas, and SMILES generation\. The C\+\+ task is based on the HumanEval\-X C\+\+ split\(Zhenget al\.,[2023](https://arxiv.org/html/2606.00722#bib.bib18)\), while the JSON and SMILES tasks use the benchmarks from the prior evaluation suite\(Mündleret al\.,[2026](https://arxiv.org/html/2606.00722#bib.bib4)\)\. We primarily compare EPIC with the state\-of\-the\-art CFG\-constrained baseline, using unconstrained diffusion decoding as the reference for normalizing inference time\.*Con\.*denotes the prior CFG\-constrained decoder, and*EPIC*denotes our constrained decoding framework\. All constrained methods use the same task\-specific grammars and lexical specifications\.

For the main comparison, we measure inference time across denoising steps in\{16,32,64,128,256\}\\\{16,32,64,\\allowbreak 128,256\\\}with a maximum generation length of256256tokens, and report the results as relative inference time normalized by unconstrained decoding\. Smaller denoising step settings reveal more tokens per step and therefore place greater stress on the sequential validity checking bottleneck\. We use greedy decoding and repeat runtime measurements with three seeds,\{42,43,44\}\\\{42,43,44\\\}\. Since EPIC achieves comparable syntactic and functional correctness to the CFG\-constrained baseline, we defer the full correctness results to Appendix[F](https://arxiv.org/html/2606.00722#A6)\.

### 5\.2Main Results

Table[1](https://arxiv.org/html/2606.00722#S5.T1)presents our main experimental results\. We report relative inference time for four models and three datasets, normalized by unconstrained decoding\. Thus, a value of100100indicates the same runtime as unconstrained decoding, while smaller values indicate faster decoding\. Across C\+\+ and JSON, EPIC consistently outperforms the prior CFG\-constrained baseline for all models and denoising steps\. In several cases, EPIC achieves relative inference time below100100\. This does not imply negative constraint checking cost\. Rather, constrained decoding can terminate early once a complete valid output and EOS have been formed, whereas the unconstrained diffusion sampler executes the full configured schedule\. This effect is most visible on JSON and LLaDA\. The improvement of EPIC is especially pronounced on C\+\+, where the grammar is more complex than those used for JSON and SMILES and requires more production rules and automaton states\.

On SMILES, the relative\-time difference between the baseline and EPIC is generally small, except for LLaDA\. This is because the fixed CFG and lexical specification used for SMILES are much smaller than those for the other tasks, so the baseline already stays close to unconstrained decoding\. As a result, there is less room for our optimizations to reduce per\-check cost\. The LLaDA case is different because of its block\-wise decoding schedule\. LLaDA divides the canvas into blocks and fills a fixed number of masks within each block, whereas DREAM, DreamCoder, and DiffuCoder determine the number and positions of unmasked tokens dynamically from a global timestep schedule\. This makes the candidate sets of LLaDA more localized and stable, which can make compatible subset selection more effective\. Thus, even though graph parsing and caching provide limited gains on SMILES, LLaDA can still benefit from a reduction in sequential exact checks\. We provide further analysis in Section[5\.3](https://arxiv.org/html/2606.00722#S5.SS3)\.

The benefit of EPIC is larger when the number of denoising steps is small\. This matches our motivation, since smaller step counts require the decoder to unmask more tokens per step and therefore amplify the sequential validation bottleneck\. The effect is particularly clear for DreamCoder\. For example, on DreamCoder with C\+\+ at1616denoising steps, the prior CFG\-constrained baseline requires388\.60%388\.60\\%of the unconstrained decoding time, whereas EPIC requires only127\.34%127\.34\\%\. Equivalently, the overhead beyond unconstrained decoding is reduced from288\.60%288\.60\\%to27\.34%27\.34\\%\.

ModelDatasetMethodRuntime BreakdownLex\. \(s\)DFA \(s\)Val\. \(s\)DreamC\+\+Con\.50\.847\.249\.8EPIC12\.011\.817\.1JSONCon\.24\.427\.12\.0EPIC5\.95\.62\.1SMILESCon\.0\.31\.00\.7EPIC0\.10\.20\.5DreamCoderC\+\+Con\.96\.9216\.8179\.8EPIC33\.468\.536\.7JSONCon\.17\.331\.01\.7EPIC4\.36\.41\.8SMILESCon\.0\.30\.30\.1EPIC0\.10\.00\.4LLaDAC\+\+Con\.47\.120\.056\.6EPIC21\.63\.711\.6JSONCon\.21\.77\.51\.0EPIC4\.21\.00\.9SMILESCon\.2\.63\.033\.1EPIC1\.50\.47\.1DiffuCoderC\+\+Con\.83\.4102\.8146\.2EPIC18\.318\.125\.0JSONCon\.26\.240\.42\.5EPIC5\.38\.22\.6SMILESCon\.1\.21\.818\.4EPIC0\.20\.21\.6Table 2:Runtime breakdown of the baseline and EPIC\. We report the time spent in lexing, ENFA\-to\-DFA conversion, and validation\. Validation includes both CFG checks and regular cover checks\. For graph parser variants, DFA time comes from regular cover selection\.Although Table[1](https://arxiv.org/html/2606.00722#S5.T1)focuses on runtime, EPIC also preserves the correctness\. Across all models, datasets, and denoising\-step settings, syntactic and functional correctness remain comparable to the baseline\. Detailed correctness results are provided in Appendix[F\.2](https://arxiv.org/html/2606.00722#A6.SS2)\.

ModelDatasetMemoizationParallelism\# HitRate\# Tok\.RateDreamC\+\+300,01967\.23\.497\.4JSON970,45888\.92\.793\.9SMILES31,44243\.32\.297\.1DreamCoderC\+\+141,12875\.23\.793\.5JSON757,25795\.34\.995\.0SMILES5,11644\.51\.991\.6LLaDAC\+\+429,34553\.15\.194\.5JSON1,178,32683\.57\.097\.1SMILES4,59046\.53\.687\.8DiffuCoderC\+\+62,34772\.33\.790\.9JSON269,78290\.93\.995\.2SMILES13,88465\.02\.992\.1Table 3:Component analysis of EPIC\. Memoization reports reused lexing results and hit rate, while parallelism reports the average batch commit size and regular cover selection success rate\.
### 5\.3Further Analysis

We further analyze EPIC to verify that its empirical gains come from the intended sources of efficiency improvement\. All results in this subsection are measured at3232denoising steps\. We first examine the runtime breakdown in Table[2](https://arxiv.org/html/2606.00722#S5.T2)\. The table compares the baseline and EPIC in terms of lexing time, DFA\-construction time, and validation time\. EPIC reduces overhead in almost all components\. Lexing memoization yields roughly a2×2\\timesto4×4\\timesreduction in lexing time, with the largest gains appearing on C\+\+ and JSON, where lexing accounts for a substantial fraction of the baseline overhead\. The DFA\-construction time also decreases substantially\. Although EPIC does not construct a DFA for exact validation of each partial output, it still constructs automata during relaxed compatible subset selection\. This cost remains small because regular\-cover checks are used only for parallel commit, while the sequential fallback runs without DFA construction\. Validation time is also reduced, especially on C\+\+ and LLaDA\. C\+\+ benefits from DFA\-free validation because its grammar is more complex than the grammars used for JSON and SMILES\. LLaDA benefits more strongly from validation reduction since its fixed\-number unmasking policy makes parallel commit more stable than in the other models\.

We next report component\-level statistics of EPIC in Table[3](https://arxiv.org/html/2606.00722#S5.T3)\. Memoization achieves hit rates above50%50\\%in most settings, and C\+\+ and JSON show particularly large numbers of reused lexing results\. These high hit rates indicate that diffusion decoding repeatedly visits similar local lexing contexts and that the memoization table captures meaningful reuse\. The parallelism statistics show that about90%90\\%of regular cover compatible subsets pass the exact CFG validation step\. They also show that EPIC commits more than two tokens at once on average across all models and datasets\. The effect is most pronounced for LLaDA\. Unlike the other models, LLaDA reveals a fixed number of tokens in each decoding block, which provides more stable candidates for relaxed compatible subset selection\. On JSON, LLaDA commits an average of7\.07\.0tokens in parallel, demonstrating that EPIC can recover substantial diffusion parallelism under CFG constraints\.

ModelMethodC\+\+JSONSMILESDreamCon\.90012035161Memo\.84310955412DFA\-free86312375403Parallel9001162545\\cellcolorgray\!15EPIC\\cellcolorgray\!15700\\cellcolorgray\!15983\\cellcolorgray\!15519DreamCoderCon\.158210895231Memo\.13859395402DFA\-free126111095403Parallel1378986543\\cellcolorgray\!15EPIC\\cellcolorgray\!15824\\cellcolorgray\!15735\\cellcolorgray\!15523LLaDACon\.79115546361Memo\.77214506552DFA\-free73816196283Parallel6011179609\\cellcolorgray\!15EPIC\\cellcolorgray\!15531\\cellcolorgray\!151076\\cellcolorgray\!15570DiffuCoderCon\.127813145841Memo\.115211725982DFA\-free108513075823Parallel11191195601\\cellcolorgray\!15EPIC\\cellcolorgray\!15862\\cellcolorgray\!151005\\cellcolorgray\!15576Table 4:Ablation results for EPIC\. Starting from the baseline \(Con\.\), we separately add1lexing memoization,2DFA\-free validation with Earley\-style parsing, and3relaxed compatible subset selection for parallel commit\. EPIC enables all three components\.
### 5\.4Ablation Results

Table[4](https://arxiv.org/html/2606.00722#S5.T4)shows the ablation results at3232denoising steps\. We evaluate each component of EPIC by adding it separately to the baseline\. The results further confirm the observations from the runtime breakdown analysis\. The components of EPIC provide distinct sources of speedup, and their effects become clearer in the ablation study\. The gains are especially pronounced on C\+\+\. All three components reduce inference time for every model, showing that C\+\+ benefits from memoized lexing, DFA\-free validation, and parallel commit\. This consistent improvement indicates that EPIC is particularly effective when the constraint language is complex enough to make repeated lexing, automaton construction, and exact CFG validation costly\. On JSON, DFA\-free validation alone sometimes increases runtime\. This is likely because JSON grammars are simpler than the C\+\+ grammar, so the baseline intersection\-based validation already incurs relatively small overhead\. In this setting, the additional graph\-parser overhead can outweigh the reduction from avoiding DFA construction\. On SMILES, the gains are limited because the baseline constraint overhead is already small\. The full EPIC configuration generally achieves the best inference time by combining the complementary benefits of the three components\. Pairwise combinations further show incremental improvements over single component variants in most settings, as reported in the full ablation results in Appendix[F\.3](https://arxiv.org/html/2606.00722#A6.SS3)\.

## 6Conclusions

We presented EPIC, an efficient CFG\-constrained decoding framework for diffusion language models\. EPIC reduces the main overheads of prior constrained diffusion decoding through lexing memoization, DFA\-free validation with Earley\-style graph parsing, and relaxed compatible subset selection for parallel commit\. These components preserve the completable output criterion while reducing repeated lexing, avoiding per\-step DFA construction for exact validation, and recovering parallel token commitment\. Experiments on C\+\+, JSON, and SMILES with four DLMs show that EPIC substantially reduces inference time while maintaining correctness comparable to the baseline\. At1616denoising steps, where parallelism matters most, EPIC achieves a3\.1×3\.1\\timesspeedup over the prior CFG\-constrained decoder on DreamCoder for C\+\+ and reduces the relative inference time from388\.6%388\.6\\%to127\.3%127\.3\\%of unconstrained decoding\. Component analysis further shows memoization hit rates above50%50\\%in most settings and an average parallel commit size above two tokens across all models and datasets\. Future work could further reduce overhead through incremental graph parsing, tighter modeling of finite token budgets, and extensions beyond CFGs to lightweight context\-sensitive constraints such as type or schema\-dependent semantic checks\.

## Limitations

Our method still over\-approximates masked regions using a broad completion language such asΣ∗\\Sigma^\{\*\}and remains rejection\-based\. Consequently, under finite token budgets and timeouts, it does not by itself guarantee 100% syntactic correctness\. This limitation is shared with prior work, where a partial output can be theoretically completable but still difficult to finish within the remaining mask budget\. Syntactic correctness also does not guarantee functional correctness\. This limitation is not unique to our method, since constrained decoding only enforces formal\-language validity and does not verify task semantics\. Moreover, the proposed optimizations reduce practical overhead but do not improve the worst\-case asymptotic complexity of CFG\-constrained validity checking\.

## References

- Structured denoising diffusion models in discrete state\-spaces\.InAdvances in Neural Information Processing Systems,External Links:[Link](https://proceedings.neurips.cc/paper/2021/hash/958c530554f78bcd8e97125b70e6973d-Abstract.html)Cited by:[§B\.1](https://arxiv.org/html/2606.00722#A2.SS1.p1.1)\.
- Y\. Bar\-Hillel, M\. A\. Perles, and E\. Shamir \(1961\)On formal properties of simple phrase structure grammars\.Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung14,pp\. 143–172\.Cited by:[§A\.2](https://arxiv.org/html/2606.00722#A1.SS2.p2.1)\.
- L\. Beurer\-Kellner, M\. Fischer, and M\. Vechev \(2024\)Guiding llms the right way: fast, non\-invasive constrained generation\.InProceedings of the 41st International Conference on Machine Learning,Cited by:[§B\.2](https://arxiv.org/html/2606.00722#A2.SS2.p2.1)\.
- J\. Earley \(1970\)An efficient context\-free parsing algorithm\.Communications of the ACM13\(2\),pp\. 94–102\.External Links:[Document](https://dx.doi.org/10.1145/362007.362035)Cited by:[§A\.3](https://arxiv.org/html/2606.00722#A1.SS3.p3.1)\.
- S\. Geng, M\. Josifoski, M\. Peyrard, and R\. West \(2023\)Grammar\-constrained decoding for structured NLP tasks without finetuning\.InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing,Singapore,pp\. 10932–10952\.External Links:[Document](https://dx.doi.org/10.18653/v1/2023.emnlp-main.674),[Link](https://aclanthology.org/2023.emnlp-main.674/)Cited by:[§B\.2](https://arxiv.org/html/2606.00722#A2.SS2.p1.1),[§B\.2](https://arxiv.org/html/2606.00722#A2.SS2.p2.1)\.
- S\. Gong, R\. Zhang, H\. Zheng, J\. Gu, N\. Jaitly, L\. Kong, and Y\. Zhang \(2026\)DiffuCoder: understanding and improving masked diffusion models for code generation\.InThe Fourteenth International Conference on Learning Representations,External Links:[Link](https://openreview.net/forum?id=58NA3unZj5)Cited by:[§E\.3](https://arxiv.org/html/2606.00722#A5.SS3.p1.1)\.
- J\. Ho, A\. Jain, and P\. Abbeel \(2020\)Denoising diffusion probabilistic models\.Advances in neural information processing systems33,pp\. 6840–6851\.Cited by:[§1](https://arxiv.org/html/2606.00722#S1.p1.1)\.
- J\. E\. Hopcroft, R\. Motwani, and J\. D\. Ullman \(2006\)Introduction to automata theory, languages, and computation\.3 edition,Pearson\.Cited by:[§A\.1](https://arxiv.org/html/2606.00722#A1.SS1.p1.7)\.
- J\. E\. Hopcroft \(1971\)Annlog⁡nn\\log nalgorithm for minimizing states in a finite automaton\.Theory of Machines and Computations,pp\. 189–196\.Cited by:[§A\.1](https://arxiv.org/html/2606.00722#A1.SS1.p2.3)\.
- W\. Kang, K\. Galim, S\. Oh, M\. Lee, Y\. Zeng, S\. Zhang, C\. R\. C\. Hooper, Y\. Hu, H\. I\. Koo, N\. I\. Cho, and K\. Lee \(2026\)ParallelBench: understanding the trade\-offs of parallel decoding in diffusion llms\.InInternational Conference on Learning Representations,Cited by:[§1](https://arxiv.org/html/2606.00722#S1.p2.1)\.
- T\. Kasami \(1965\)An efficient recognition and syntax\-analysis algorithm for context\-free languages\.Technical reportTechnical ReportAFCRL\-65\-758,Air Force Cambridge Research Laboratory\.Cited by:[§4\.2](https://arxiv.org/html/2606.00722#S4.SS2.p1.1)\.
- B\. Kim, D\. Jeon, M\. Jeon, and A\. No \(2026\)Dependency\-aware parallel decoding via attention for diffusion llms\.External Links:2603\.12996Cited by:[§B\.1](https://arxiv.org/html/2606.00722#A2.SS1.p2.1),[§1](https://arxiv.org/html/2606.00722#S1.p1.1),[§1](https://arxiv.org/html/2606.00722#S1.p2.1)\.
- T\. Koo, F\. Liu, and L\. He \(2024\)Automata\-based constraints for language model decoding\.InConference on Language Modeling,External Links:[Link](https://openreview.net/forum?id=BDBdblmyzY)Cited by:[§B\.2](https://arxiv.org/html/2606.00722#A2.SS2.p2.1)\.
- A\. Lou, C\. Meng, and S\. Ermon \(2024\)Discrete diffusion modeling by estimating the ratios of the data distribution\.InProceedings of the 41st International Conference on Machine Learning,Proceedings of Machine Learning Research, Vol\.235,pp\. 32819–32848\.External Links:[Link](https://proceedings.mlr.press/v235/lou24a.html)Cited by:[§B\.1](https://arxiv.org/html/2606.00722#A2.SS1.p1.1)\.
- M\. Mohri and M\. Nederhof \(2000\)Regular approximation of context\-free grammars through transformation\.InRobustness in Language and Speech Technology,pp\. 251–261\.Cited by:[§A\.2](https://arxiv.org/html/2606.00722#A1.SS2.p3.1)\.
- N\. Mündler, J\. Dekoninck, and M\. Vechev \(2026\)Constrained decoding of diffusion LLMs with context\-free grammars\.InThe Fourteenth International Conference on Learning Representations,External Links:[Link](https://openreview.net/forum?id=7Sph4KyeYO)Cited by:[§B\.3](https://arxiv.org/html/2606.00722#A2.SS3.p2.1),[§E\.1](https://arxiv.org/html/2606.00722#A5.SS1.p1.1),[§E\.2](https://arxiv.org/html/2606.00722#A5.SS2.p3.1),[§E\.2](https://arxiv.org/html/2606.00722#A5.SS2.p4.1),[§1](https://arxiv.org/html/2606.00722#S1.p3.1),[§2](https://arxiv.org/html/2606.00722#S2.p2.1),[§5\.1](https://arxiv.org/html/2606.00722#S5.SS1.p1.1)\.
- M\. Nederhof \(2000\)Practical experiments with regular approximation of context\-free languages\.Computational Linguistics26\(1\),pp\. 17–44\.Cited by:[§A\.2](https://arxiv.org/html/2606.00722#A1.SS2.p3.1),[§D\.3](https://arxiv.org/html/2606.00722#A4.SS3.p2.1)\.
- S\. Nie, F\. Zhu, Z\. You, X\. Zhang, J\. Ou, J\. Hu, J\. Zhou, Y\. Lin, J\. Wen, and C\. Li \(2026\)Large language diffusion models\.Advances in Neural Information Processing Systems38,pp\. 50608–50646\.Cited by:[§B\.1](https://arxiv.org/html/2606.00722#A2.SS1.p1.1),[§B\.1](https://arxiv.org/html/2606.00722#A2.SS1.p2.1),[§E\.3](https://arxiv.org/html/2606.00722#A5.SS3.p1.1),[§1](https://arxiv.org/html/2606.00722#S1.p1.1)\.
- NousResearch \(2024\)JSON mode eval\.Note:[https://huggingface\.co/datasets/NousResearch/json\-mode\-eval](https://huggingface.co/datasets/NousResearch/json-mode-eval)Hugging Face datasetCited by:[§E\.2](https://arxiv.org/html/2606.00722#A5.SS2.p3.1)\.
- K\. Park, T\. Zhou, and L\. D’Antoni \(2025\)Flexible and efficient grammar\-constrained decoding\.InProceedings of the 42nd International Conference on Machine Learning,External Links:[Link](https://openreview.net/forum?id=L6CYAzpO1k)Cited by:[§B\.2](https://arxiv.org/html/2606.00722#A2.SS2.p1.1)\.
- F\. C\. N\. Pereira and R\. N\. Wright \(1997\)Finite\-state approximation of phrase\-structure grammars\.InFinite\-State Language Processing,pp\. 149–173\.Cited by:[§A\.2](https://arxiv.org/html/2606.00722#A1.SS2.p3.1)\.
- F\. C\. N\. Pereira \(1991\)Finite\-state approximation of phrase structure grammars\.InProceedings of the 29th Annual Meeting of the Association for Computational Linguistics,pp\. 246–255\.External Links:[Document](https://dx.doi.org/10.3115/981344.981376)Cited by:[§A\.2](https://arxiv.org/html/2606.00722#A1.SS2.p3.1)\.
- S\. S\. Sahoo, M\. Arriola, Y\. Schiff, A\. Gokaslan, E\. Marroquin, J\. T\. Chiu, A\. Rush, and V\. Kuleshov \(2024\)Simple and effective masked diffusion language models\.InAdvances in Neural Information Processing Systems,External Links:[Link](https://proceedings.neurips.cc/paper_files/paper/2024/hash/eb0b13cc515724ab8015bc978fdde0ad-Abstract-Conference.html)Cited by:[§B\.1](https://arxiv.org/html/2606.00722#A2.SS1.p1.1),[§B\.1](https://arxiv.org/html/2606.00722#A2.SS1.p2.1),[§1](https://arxiv.org/html/2606.00722#S1.p1.1)\.
- T\. Scholak, N\. Schucher, and D\. Bahdanau \(2021\)PICARD: parsing incrementally for constrained auto\-regressive decoding from language models\.InProceedings of the 2021 Conference on Empirical Methods in Natural Language Processing,Online and Punta Cana, Dominican Republic,pp\. 9895–9901\.External Links:[Document](https://dx.doi.org/10.18653/v1/2021.emnlp-main.779),[Link](https://aclanthology.org/2021.emnlp-main.779/)Cited by:[§B\.2](https://arxiv.org/html/2606.00722#A2.SS2.p1.1)\.
- M\. Sipser \(2012\)Introduction to the theory of computation\.3 edition,Cengage Learning\.Cited by:[§A\.1](https://arxiv.org/html/2606.00722#A1.SS1.p1.7),[§A\.2](https://arxiv.org/html/2606.00722#A1.SS2.p2.1)\.
- T\. Suresh, D\. Banerjee, S\. Ugare, S\. Misailovic, and G\. Singh \(2025\)DINGO: constrained inference for diffusion LLMs\.InAdvances in Neural Information Processing Systems,External Links:[Link](https://openreview.net/forum?id=KaYMGsnZ4R)Cited by:[§B\.3](https://arxiv.org/html/2606.00722#A2.SS3.p1.1),[§1](https://arxiv.org/html/2606.00722#S1.p3.1)\.
- S\. Ugare, T\. Suresh, H\. Kang, S\. Misailovic, and G\. Singh \(2024\)SynCode: llm generation with grammar augmentation\.Cited by:[§B\.2](https://arxiv.org/html/2606.00722#A2.SS2.p2.1)\.
- B\. T\. Willard and R\. Louf \(2023\)Efficient guided generation for large language models\.External Links:2307\.09702Cited by:[§B\.2](https://arxiv.org/html/2606.00722#A2.SS2.p2.1)\.
- Z\. Xie, J\. Ye, L\. Zheng, J\. Gao, J\. Dong, Z\. Wu, X\. Zhao, S\. Gong, X\. Jiang, Z\. Li, and L\. Kong \(2025\)Dream\-coder 7b\.Note:[https://hkunlp\.github\.io/blog/2025/dream\-coder](https://hkunlp.github.io/blog/2025/dream-coder)Cited by:[§E\.3](https://arxiv.org/html/2606.00722#A5.SS3.p1.1)\.
- J\. Ye, Z\. Xie, L\. Zheng, J\. Gao, Z\. Wu, X\. Jiang, Z\. Li, and L\. Kong \(2025\)Dream 7b\.Note:[https://hkunlp\.github\.io/blog/2025/dream](https://hkunlp.github.io/blog/2025/dream)Cited by:[§E\.3](https://arxiv.org/html/2606.00722#A5.SS3.p1.1)\.
- D\. H\. Younger \(1967\)Recognition and parsing of context\-free languages in timen3n^\{3\}\.Information and Control10\(2\),pp\. 189–208\.External Links:[Document](https://dx.doi.org/10.1016/S0019-9958%2867%2980007-X)Cited by:[§4\.2](https://arxiv.org/html/2606.00722#S4.SS2.p1.1)\.
- Q\. Zheng, X\. Xia, X\. Zou, Y\. Dong, S\. Wang, Y\. Xue, Z\. Wang, L\. Shen, A\. Wang, Y\. Li, T\. Su, Z\. Yang, and J\. Tang \(2023\)CodeGeeX: a pre\-trained model for code generation with multilingual benchmarking on humaneval\-x\.InProceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining,pp\. 5673–5684\.External Links:[Document](https://dx.doi.org/10.1145/3580305.3599790)Cited by:[§E\.2](https://arxiv.org/html/2606.00722#A5.SS2.p2.1),[§5\.1](https://arxiv.org/html/2606.00722#S5.SS1.p1.1)\.

## Appendix APreliminaries

This section reviews the formal language theory and compiler theory background used throughout the paper\. We focus on the concepts necessary to define CFG\-constrained decoding, analyze the computational bottlenecks, and analyze the correctness of our proposed procedures\.

### A\.1Regular Languages and Finite Automata

A regular language is a set of strings that can be recognized by a finite automaton, or described by a regular expression\(Sipser,[2012](https://arxiv.org/html/2606.00722#bib.bib10); Hopcroftet al\.,[2006](https://arxiv.org/html/2606.00722#bib.bib11)\)\. A DFA is defined as a tuple\(Q,Σ,δ,q0,F\)\(Q,\\Sigma,\\delta,q\_\{0\},F\), whereQQis a finite set of states,Σ\\Sigmais a finite alphabet,δ:Q×Σ→Q\\delta:Q\\times\\Sigma\\rightarrow Qis a transition function,q0∈Qq\_\{0\}\\in Qis the initial state, andF⊆QF\\subseteq Qis the set of accepting states\. An NFA generalizes a DFA by allowing multiple outgoing transitions for the same input symbol and may also includeε\\varepsilon\-transitions\. Every NFA can be converted to an equivalent DFA via subset construction, although the resulting DFA may contain exponentially many states in the worst case\(Sipser,[2012](https://arxiv.org/html/2606.00722#bib.bib10); Hopcroftet al\.,[2006](https://arxiv.org/html/2606.00722#bib.bib11)\)\.

As in the prior work, we use NFAs to represent the partial outputs with masked regions\. In prior CFG\-constrained decoding pipelines, this NFA is determinized and then minimized before CFG compatibility checking\. For a DFA with state setQQover alphabetΣ\\Sigma, Hopcroft’s algorithm runs inO\(\|Σ\|\|Q\|log⁡\|Q\|\)O\(\|\\Sigma\|\|Q\|\\log\|Q\|\)time in the worst case\(Hopcroft,[1971](https://arxiv.org/html/2606.00722#bib.bib12)\)\. Since the automaton is reconstructed from many partial outputs during decoding, repeated determinization and minimization become a non\-negligible source of overhead\.

### A\.2Context\-Free Grammars

A CFG is defined as a tupleG=\(V,Σ,P,S\)G=\(V,\\Sigma,P,S\), whereVVis a finite set of nonterminals,Σ\\Sigmais a finite set of terminals,PPis a set of productions of the formA→αA\\to\\alphawithA∈VA\\in Vandα∈\(V∪Σ\)∗\\alpha\\in\(V\\cup\\Sigma\)^\{\*\}, andS∈VS\\in Vis the start symbol\. The language generated byGGis denoted byL\(G\)L\(G\)\. Context\-free languages \(CFLs\) are more expressive than regular languages, and can express recursive structures such as balanced parentheses, nested expressions, and programming languages\.

Prior work uses a closure property that the intersection of a CFL and a regular language is context\-free\(Bar\-Hillelet al\.,[1961](https://arxiv.org/html/2606.00722#bib.bib13); Sipser,[2012](https://arxiv.org/html/2606.00722#bib.bib10)\)\. This property is used to formulate validity checking as an emptiness test over the intersection between the target CFL and the possible completions of the partial output\. In such a construction, the intersection grammar has nonterminals indexed by CFG nonterminals and pairs of automaton states, which introduces a cubic dependence on the number of automaton states\. This leads to a validity\-checking cost ofO\(\|P\|\|Q\|3\+\|P\|\|Σ\|\|Q\|2\)O\\left\(\|P\|\|Q\|^\{3\}\+\|P\|\|\\Sigma\|\|Q\|^\{2\}\\right\)\.

Our relaxed check follows the classical idea of finite\-state or regular approximation of CFG\(Pereira,[1991](https://arxiv.org/html/2606.00722#bib.bib24); Pereira and Wright,[1997](https://arxiv.org/html/2606.00722#bib.bib25); Nederhof,[2000](https://arxiv.org/html/2606.00722#bib.bib26); Mohri and Nederhof,[2000](https://arxiv.org/html/2606.00722#bib.bib27)\)\. We view the CFG as a recursive transition network, where each nonterminal corresponds to a subautomaton with an entry state and an exit state\. We then flatten this network into a finite automaton by replacing each recursive call to a nonterminal with a transition to that nonterminal’s entry state, and by connecting its exit state to the possible continuation states\. After flattening, the automaton no longer records which call site entered a nonterminal\. Consequently, the automaton may accept additional lexeme sequences, but every CFG derivation is still preserved as a valid path\. The resulting automaton is therefore a regular coverℱG\\mathcal\{F\}\_\{G\}satisfying

L\(G\)⊆L\(ℱG\)\.L\(G\)\\subseteq L\(\\mathcal\{F\}\_\{G\}\)\.

### A\.3Lexers and Parsers

Practical parsers rarely operate on raw character sequences directly\. A typical compiler front end first applies lexing, which maps a character\-level string into a sequence of tokens or lexemes, and then applies parsing to verify whether the token sequence conforms to a grammar\. Additional context\-sensitive checks, such as type checking or name resolution, are often performed after parsing\.

This separation is important for constrained decoding since language models generate tokens at the token level, whereas CFGs are usually defined over lexemes\. Partial outputs with masked regions make lexing ambiguous, since a generated token may belong to a lexeme on its left, a lexeme on its right, or a lexeme spanning a masked region\.

Earley parsing is a general CFG parsing algorithm that combines top\-down prediction with bottom\-up completion\(Earley,[1970](https://arxiv.org/html/2606.00722#bib.bib14)\)\. Unlike purely bottom\-up methods such as CYK, Earley\-style parsing can use top\-down grammar predictions to restrict which partial derivations are considered during parsing\. Our DFA\-free validity checker adopts this principle for witness checking over graph\-structured partial outputs\.

## Appendix BAdditional Related Work

This section provides additional related work that complements Section[2](https://arxiv.org/html/2606.00722#S2)\. We discuss diffusion language models, constrained decoding for autoregressive language models, and constrained decoding methods for diffusion language models\.

### B\.1Diffusion Language Models

Diffusion language models \(DLMs\) provide an alternative to autoregressive text generation\(Austinet al\.,[2021](https://arxiv.org/html/2606.00722#bib.bib8); Louet al\.,[2024](https://arxiv.org/html/2606.00722#bib.bib9); Sahooet al\.,[2024](https://arxiv.org/html/2606.00722#bib.bib1); Nieet al\.,[2026](https://arxiv.org/html/2606.00722#bib.bib2)\)\. Instead of generating tokens from left to right, DLMs iteratively refine a partially observed sequence, often starting from a sequence of masked tokens\. At each denoising step, the model predicts tokens for one or more masked positions conditioned on the current partially filled sequence\.

Masked diffusion language models \(MDLMs\) instantiate this idea using a discrete masking process\(Sahooet al\.,[2024](https://arxiv.org/html/2606.00722#bib.bib1); Nieet al\.,[2026](https://arxiv.org/html/2606.00722#bib.bib2)\)\. During generation, a model repeatedly replaces masked positions with sampled tokens until the sequence is fully specified\. This generation process enables nonsequential and potentially parallel token prediction, since multiple masked positions can be updated within the same denoising step\. The degree of parallelism is typically controlled by the number of denoising steps or the number of tokens committed per step\. Fewer steps allow more tokens to be generated per forward pass, improving speed but often increasing the risk of inconsistent or low\-quality outputs\(Kimet al\.,[2026](https://arxiv.org/html/2606.00722#bib.bib22)\)\.

This parallel generation makes DLMs attractive for efficient decoding\. At the same time, it creates new challenges for structured generation\. Since multiple tokens may be sampled independently before being committed, their joint compatibility with syntactic constraints is not guaranteed\. This issue motivates constrained decoding methods that can preserve the parallelism of DLMs while enforcing formal language constraints\.

### B\.2Constrained Decoding for AR Models

Constrained decoding aims to enforce model outputs to conform to a predefined set of constraints\. In many applications, this set is specified by a formal language, such as a regular expression, a JSON schema, or a context\-free grammar\. Constrained decoding has been widely studied for autoregressive language models, where generation proceeds from left to right and the validity of each next token can be checked against the current prefix\(Scholaket al\.,[2021](https://arxiv.org/html/2606.00722#bib.bib6); Genget al\.,[2023](https://arxiv.org/html/2606.00722#bib.bib5); Parket al\.,[2025](https://arxiv.org/html/2606.00722#bib.bib7)\)\.

For regular constraints, valid next tokens can be computed using finite automata\(Kooet al\.,[2024](https://arxiv.org/html/2606.00722#bib.bib28); Willard and Louf,[2023](https://arxiv.org/html/2606.00722#bib.bib29)\)\. For context\-free constraints, the decoder must account for nested and recursive structures, which requires parsing\-based or grammar\-based methods\(Genget al\.,[2023](https://arxiv.org/html/2606.00722#bib.bib5); Beurer\-Kellneret al\.,[2024](https://arxiv.org/html/2606.00722#bib.bib31); Ugareet al\.,[2024](https://arxiv.org/html/2606.00722#bib.bib32)\)\. Existing approaches differ in whether they mask invalid tokens before sampling, reject invalid samples after sampling, or combine both strategies\(Willard and Louf,[2023](https://arxiv.org/html/2606.00722#bib.bib29); Beurer\-Kellneret al\.,[2024](https://arxiv.org/html/2606.00722#bib.bib31); Ugareet al\.,[2024](https://arxiv.org/html/2606.00722#bib.bib32)\)\. These methods are effective for autoregressive generation because the prefix grows monotonically from left to right\.

The DLM setting breaks this assumption\. A partial output may contain multiple masks, and updates can occur at arbitrary positions\. As a result, prefix\-based constrained decoding algorithms cannot be directly applied\. This motivates validity checking procedures that reason about whether a partially filled sequence can still be completed into a valid string\.

### B\.3Constrained Decoding for DLMs

Several recent works have begun to study constrained decoding for diffusion language models\.Sureshet al\.\([2025](https://arxiv.org/html/2606.00722#bib.bib3)\)proposed DINGO, a constrained inference method for DLMs under regular\-language constraints\. Their method showed that formal constraints can be integrated into non\-autoregressive diffusion decoding\. However, regular constraints cannot capture many structured output settings that require nested or recursive structure, such as nested JSON objects, arithmetic expressions, and programming\-language syntax\.

The closest prior work to ours isMündleret al\.\([2026](https://arxiv.org/html/2606.00722#bib.bib4)\), which introduced CFG\-constrained decoding for diffusion LLMs\. Their framework formulates validity checking as an infilling\-based completion and reduces it to checking the non\-emptiness of the intersection between a CFG and a regular language representing possible completions of the partial output\. This approach enables CFG constraints for diffusion language models and multi\-region infilling, but it requires repeated lexing, automaton construction, DFA minimization, and intersection\-based emptiness checking during decoding\. Their paper explicitly constructs an NFA for the partial output, converts it to a DFA, minimizes it, and then performs CFG intersection\-based checking\.

Algorithm 1EPIC Decoding under CFG Constraints1:Prompt

pp, diffusion LM

MM, CFG

GG, lexical map

Λ\\Lambda, answer length

LL, steps

TT
2:Output tokens

xx
3:Compile

Λ\\Lambdaand initialize the lexing memoization table\.

4:Construct a regular cover

ℱG\\mathcal\{F\}\_\{G\}by flattening the production rules of

GG\.

5:Initialize

xxwith

ppfollowed by

LLcopies of\[MASK\]\.

6:for alldiffusion blocksdo

7:for

t=1,…,Tt=1,\\ldots,Tdo

8:

⊳\\trianglerightParallel proposal from the diffusion model

9:Run

MMon

xxand collect high\-confidence candidates

CCfor masked positions\.

10:Compute the number of tokens to commit

ntn\_\{t\}for this denoising step\.

11:

⊳\\trianglerightSelect a relaxed\-compatible batch and verify it exactly before committing

12:

B←RelaxedSubsetSelect\(x,C,ℱG,G,Λ\)B\\leftarrow\\textsc\{RelaxedSubsetSelect\}\(x,C,\\mathcal\{F\}\_\{G\},G,\\Lambda\)\.

13:Commit all candidates in

BBto

xx\.

14:

⊳\\trianglerightFall back to sequential rejection sampling if verification fails or commits remain

15:for

j=1,…,nt−\|B\|j=1,\\ldots,n\_\{t\}\-\|B\|do

16:Select the next highest\-confidence candidate

ccnot already committed\.

17:while

DFAFreeValidate\(x⊕c,G,Λ\)\\textsc\{DFAFreeValidate\}\(x\\oplus c,G,\\Lambda\)is falsedo

18:Reject

cc, suppress its logit, and select the next candidate\.

19:endwhile

20:Commit the accepted candidate

ccto

xx\.

21:endfor

22:endfor

23:endfor

24:return

xx

## Appendix CDetailed Algorithms

We provide implementation\-level pseudocode for EPIC\. These algorithms clarify how EPIC preserves the completable\-output criterion of prior CFG\-constrained decoding while changing how this criterion is evaluated during diffusion decoding\. Algorithm[1](https://arxiv.org/html/2606.00722#alg1)presents the main decoding loop, including proposal generation, relaxed subset selection, exact verification, and token commitment\. Algorithm[2](https://arxiv.org/html/2606.00722#alg2)describes how lexical representations are stored and reused across partial outputs\. Algorithm[3](https://arxiv.org/html/2606.00722#alg3)specifies the Earley\-style graph parser used for DFA\-free validity checking\. Algorithm[4](https://arxiv.org/html/2606.00722#alg4)gives the compatible subset selection for multiple\-token proposals under the relaxed regular cover\.

### C\.1Overall Decoding Procedure

Algorithm[1](https://arxiv.org/html/2606.00722#alg1)summarizes the overall decoding procedure of EPIC\. At each denoising step, the model scores the currently masked positions and proposes candidate tokens for multiple positions\. The role of the constrained decoder is to decide which of these updates can be committed without losing the possibility of completing the sequence into a valid string inL\(G\)L\(G\)\. The main difference from the baseline lies in how this validity decision is organized\. The baseline effectively checks candidate tokens one at a time using exact CFG validation\. This sequentializes the decoding process and weakens the main advantage of diffusion language models\. In contrast, EPIC first callsRelaxedSubsetSelectto search for a compatible subset that may be committed together\. This subset selection step is intentionally conservative\. It may return a smaller or empty batch, but any nonempty batch must still pass exact verification before commitment\.

Algorithm 2Lexing Memoization for Partial Outputs1:Partial output

xx, lexical map

Λ\\Lambda, lexing cache

𝒞lex\\mathcal\{C\}\_\{\\mathrm\{lex\}\}
2:

ε\\varepsilon\-NFA

𝒜x\\mathcal\{A\}\_\{x\}representing possible lexeme sequences

3:Merge consecutive generated tokens in

xxinto text spans separated by masked regions\.

4:Initialize an empty

ε\\varepsilon\-NFA

𝒜x\\mathcal\{A\}\_\{x\}\.

5:for allfixed text spans

wwin orderdo

6:if

\(w,boundary\)∈𝒞lex\(w,\\mathrm\{boundary\}\)\\in\\mathcal\{C\}\_\{\\mathrm\{lex\}\}then

7:Retrieve the cached lexing result for

ww\.

8:else

9:Compute the lexing result for

wwusing

Λ\\Lambda\.

10:Store the computed result in

𝒞lex\\mathcal\{C\}\_\{\\mathrm\{lex\}\}\.

11:endif

12:Add the retrieved or computed lexing result to add span\-level transitions to

𝒜x\\mathcal\{A\}\_\{x\}\.

13:endfor

14:Complete the construction by adding transitions for masked gaps and lexemes spanning them\.

15:return

𝒜x\\mathcal\{A\}\_\{x\}

For candidates that are not committed through the batch filter, the algorithm falls back to the exact rejection loop\. This fallback is important for two reasons\. First, it preserves the behavior of rejection\-based constrained decoding when parallel commitment is not possible\. Second, it separates efficiency from correctness\. Relaxed compatible subset selection is used only to propose a subset that is likely to pass exact validation, thereby reducing the number of sequential checks\. The final accept\-or\-reject decision is still made byDFAFreeValidate\. Thus, every token committed by Algorithm[1](https://arxiv.org/html/2606.00722#alg1)is still justified by the completable\-output condition\.

### C\.2Lexing Memoization

Algorithm[2](https://arxiv.org/html/2606.00722#alg2)describes the lexing memoization procedure\. Since the CFG is defined over lexemes while the diffusion model operates over a tokenizer vocabulary, validity checking first requires anε\\varepsilon\-NFA representation of lexeme sequences compatible with the current masked string\. The automaton construction follows the partial\-output lexing procedure, which handles fixed text spans, masked gaps, and lexemes that may span masked gaps\.

EPIC applies memoization to the lexing of text spans\. For each span, the procedure first checks whether the same span has already been lexed under the same boundary condition\. If the result is cached, it is reused during the construction of𝒜x\\mathcal\{A\}\_\{x\}\. If the entry is missing, the span is lexed using the lexical mapΛ\\Lambda, and the result is inserted into𝒞lex\\mathcal\{C\}\_\{\\mathrm\{lex\}\}\.

The output of the procedure is anε\\varepsilon\-NFA𝒜x\\mathcal\{A\}\_\{x\}\. Paths in this automaton correspond to lexeme sequences that are compatible with the current partial output\. Lexing memoization does not approximate or change this represented language\. It only changes the construction procedure by avoiding repeated computation for local contexts that have already been processed\. Thus, lexing memoization reduces the cost of constructing𝒜x\\mathcal\{A\}\_\{x\}without changing the completable\-output check\.

Algorithm 3DFA\-Free Validation with Earley\-style Parsing1:Partial output

xx, CFG

G=\(V,Σ,P,S\)G=\(V,\\Sigma,P,S\), lexical map

Λ\\Lambda
2:Whether

xxcan be completed into a string in

L\(G\)L\(G\)
3:

𝒜x←MemoizedLexing\(x,Λ,𝒞lex\)\\mathcal\{A\}\_\{x\}\\leftarrow\\textsc\{MemoizedLexing\}\(x,\\Lambda,\\mathcal\{C\}\_\{\\mathrm\{lex\}\}\)\.

4:Initialize an worklist with Earley\-style items at the start nodes of

𝒜x\\mathcal\{A\}\_\{x\}\.

5:Initialize an empty visited set for parser items\.

6:whilethe worklist is not emptydo

7:Pop an item from the worklist\.

8:ifthe item has already been visitedthen

9:Continue to the next item\.

10:endif

11:Mark the item as visited\.

12:ifthe item expects a nonterminalthen

13:

⊳\\trianglerightPrediction

14:Add predicted items for productions of the expected nonterminal to the worklist\.

15:elseifthe item expects a terminalthen

16:

⊳\\trianglerightScanning over graph edges

17:Follow matching terminal\-labeled edges in

𝒜x\\mathcal\{A\}\_\{x\}and add advanced items to the worklist\.

18:else

19:

⊳\\trianglerightCompletion

20:Advance items that were waiting for the completed nonterminal at the same graph node\.

21:Add the advanced items to the worklist\.

22:endif

23:ifan accepting item for the start symbol reaches an accepting node of

𝒜x\\mathcal\{A\}\_\{x\}then

24:returntrue

25:endif

26:endwhile

27:returnfalse

### C\.3DFA\-Free Validation with Earley\-style Parsing

Algorithm[3](https://arxiv.org/html/2606.00722#alg3)describes the DFA\-free validity checker used by EPIC\. After lexing memoization constructs the graph𝒜x\\mathcal\{A\}\_\{x\}, the goal is to determine whether the graph contains at least one path whose terminal sequence belongs toL\(G\)L\(G\)\. Equivalently, the procedure checks whether the partial output still has a completion satisfying the target CFG\.

The baseline performs this check through an automata\-theoretic pipeline\. It first converts the graph representation of the partial output into a DFA, often applies minimization, and then checks the non\-emptiness of the intersection between the DFA language and the CFG language\. This approach is correct, but it introduces substantial repeated overhead\. In particular, determinization and minimization are performed many times during decoding\.

EPIC avoids repeated DFA construction by running the witness search directly on𝒜x\\mathcal\{A\}\_\{x\}\. The validator generalizes Earley\-style parsing from a single input string to anε\\varepsilon\-NFA whose paths represent possible lexeme sequences\. Each parser item records both a CFG progress state and graph\-node information, so that it represents a partial derivation over a path segment of𝒜x\\mathcal\{A\}\_\{x\}\. Prediction introduces productions that can satisfy the next expected nonterminal, scanning advances items along compatible terminal\-labeled edges, and completion connects a recognized constituent to items that were waiting for it at the corresponding graph node\. The algorithm accepts once it finds a completed start\-symbol item that connects a start node of𝒜x\\mathcal\{A\}\_\{x\}to an accepting node\.

This DFA\-free formulation preserves the exact witness\-search problem\. The algorithm does not ask whether one fixed string is generated byGG\. Instead, it asks whether any path in the graph is generated byGG\. This is precisely the same condition captured by the CFG and DFA intersection test\. This change should be understood as a practical optimization rather than an improvement in the worst\-case asymptotic complexity of validity checking\. It reduces overhead by searching for the witness directly on𝒜x\\mathcal\{A\}\_\{x\}and avoiding construction of an equivalent minimal DFA at every validation\.

Algorithm 4Relaxed Compatible Subset Selection for Parallel Commit1:Partial output

xx, candidates

CC, regular cover

ℱG\\mathcal\{F\}\_\{G\}, CFG

GG, lexical map

Λ\\Lambda
2:Exact\-verified candidate subset

BBwith

\|B\|≥2\|B\|\\geq 2, or

∅\\emptyset
3:functionRelaxedSelect\(

x,Cx,C\)

4:if

\|C\|≤1\|C\|\\leq 1then

5:return

∅\\emptyset
6:endif

7:

xC←x⊕Cx\_\{C\}\\leftarrow x\\oplus C
8:

𝒜xC←MemoizedLexing\(xC,Λ,𝒞lex\)\\mathcal\{A\}\_\{x\_\{C\}\}\\leftarrow\\textsc\{MemoizedLexing\}\(x\_\{C\},\\Lambda,\\mathcal\{C\}\_\{\\mathrm\{lex\}\}\)
9:if

L\(ℱG\)∩L\(𝒜xC\)≠∅L\(\\mathcal\{F\}\_\{G\}\)\\cap L\(\\mathcal\{A\}\_\{x\_\{C\}\}\)\\neq\\emptysetthen

10:return

CC⊳\\trianglerightKeep the full subset as relaxed\-compatible

11:endif

12:Split

CCinto high\-confidence halves

CLC\_\{L\}and

CRC\_\{R\}\.

13:

BL←RelaxedSelect\(x,CL\)B\_\{L\}\\leftarrow\\textsc\{RelaxedSelect\}\(x,C\_\{L\}\)
14:

BR←RelaxedSelect\(x⊕BL,CR\)B\_\{R\}\\leftarrow\\textsc\{RelaxedSelect\}\(x\\oplus B\_\{L\},C\_\{R\}\)
15:return

BL∪BRB\_\{L\}\\cup B\_\{R\}
16:endfunction

17:functionExactShrink\(

x,Bx,B\)

18:if

\|B\|≤1\|B\|\\leq 1then

19:return

∅\\emptyset
20:endif

21:if

DFAFreeValidate\(x⊕B,G,Λ\)\\textsc\{DFAFreeValidate\}\(x\\oplus B,G,\\Lambda\)is truethen

22:return

BB
23:endif

24:Split

BBinto high\-confidence halves

BLB\_\{L\}and

BRB\_\{R\}\.

25:

BL′←ExactShrink\(x,BL\)B^\{\\prime\}\_\{L\}\\leftarrow\\textsc\{ExactShrink\}\(x,B\_\{L\}\)
26:

BR′←RelaxedSelect\(x⊕BL′,BR\)B^\{\\prime\}\_\{R\}\\leftarrow\\textsc\{RelaxedSelect\}\(x\\oplus B^\{\\prime\}\_\{L\},B\_\{R\}\)
27:

BR′′←ExactShrink\(x⊕BL′,BR′\)B^\{\\prime\\prime\}\_\{R\}\\leftarrow\\textsc\{ExactShrink\}\(x\\oplus B^\{\\prime\}\_\{L\},B^\{\\prime\}\_\{R\}\)
28:return

BL′∪BR′′B^\{\\prime\}\_\{L\}\\cup B^\{\\prime\\prime\}\_\{R\}
29:endfunction

30:Sort

CCby descending confidence\.

31:

B←RelaxedSelect\(x,C\)B\\leftarrow\\textsc\{RelaxedSelect\}\(x,C\)
32:return

ExactShrink\(x,B\)\\textsc\{ExactShrink\}\(x,B\)

### C\.4Relaxed Compatible Subset Selection for Parallel Commit

Algorithm[4](https://arxiv.org/html/2606.00722#alg4)describes the candidate subset selection procedure used to recover parallel token commitment\. For a candidate setCC, we writex⊕Cx\\oplus Cfor the partial output obtained by applying all candidate updates inCCtoxx\. Checking and committing every proposed token sequentially weakens the parallelism of DLM decoding\. Although a joint CFG check over all candidates is possible, a failed check does not identify which candidates caused the failure\. Finding the largest compatible subset by exhaustive search would require a number of checks exponential in the number of proposed tokens\.

EPIC avoids this cost by selecting a subset under a cheaper relaxed condition before exact verification\. The relaxed condition is defined by a regular coverℱG\\mathcal\{F\}\_\{G\}of the CFG language\. The automatonℱG\\mathcal\{F\}\_\{G\}can be viewed as a finite\-state relaxation of a recursive transition network representation ofGG, It ignores the return sites of recursive calls, which may introduce additional paths but preserves every CFG derivation\. Therefore,L\(G\)⊆L\(ℱG\)L\(G\)\\subseteq L\(\\mathcal\{F\}\_\{G\}\)\. This inclusion makes rejection under the regular cover sound\. If a candidate batch is rejected byℱG\\mathcal\{F\}\_\{G\}, then applying the whole batch cannot yield a CFG\-completable partial output\. If a candidate batch is accepted byℱG\\mathcal\{F\}\_\{G\}, it may still be invalid under the original CFG, so exact verification is required\.

The recursive structure of Algorithm[4](https://arxiv.org/html/2606.00722#alg4)uses the regular cover as a cheap group test\. The algorithm first tests the whole candidate set under the relaxed condition\. When the whole set fails the cover check, it splits the candidates by confidence and searches for smaller high\-confidence subsets that pass the relaxed check\. This allows the decoder to find a jointly plausible subset without enumerating all subsets\. The procedure is biased toward high\-confidence candidates because those are more likely to be selected by the unconstrained diffusion decoder as well\. The final exact check is essential because the regular cover may introduce false positives\. Before a nonempty subset is committed, the algorithm verifies the partial output obtained after applying the subset using the exact CFG validity checker\. If a relaxed\-compatible subset fails exact checking, the algorithm recursively shrinks it using the exact CFG validator\. Thus, before any nonempty subset is committed, the partial output obtained after applying the subset has been verified by the exact CFG validity checker\. The selector returns only nontrivial batches\. Singleton candidates are left to the standard sequential rejection sampler, which verifies them with the exact CFG checker\. After committing the verified subset, the decoder uses the standard sequential rejection loop to make the remaining commits\. This fallback preserves correctness while allowing EPIC to recover parallel commitment whenever a sufficiently large compatible subset is found\.

## Appendix DCorrectness Analysis of Proposed Methods

This section analyzes why EPIC preserves the same completable\-output criterion as the baseline decoder\. The analysis consists of four parts\. First, lexing memoization reuses local lexing results without changing the partial\-output language\. Second, DFA\-free validation checks the same non\-emptiness condition as the CFL and the regular language intersection test, but searches directly over the partial\-output graph\. Third, the finite automaton used in relaxed subset selection is a regular cover of the CFG language\.

### D\.1Lexing Memoization

*Lexing memoization preserves the partial\-output language\.*For any partial outputxx, Algorithm[2](https://arxiv.org/html/2606.00722#alg2)constructs an NFA𝒜x\\mathcal\{A\}\_\{x\}that recognizes the same set of lexeme sequences as the non\-memoized lexing procedure\.

The reason is that memoization changes only how local lexing alternatives are obtained\. For each local context, the algorithm either retrieves a previously computed result from the cache or computes the result using the same lexical mapΛ\\Lambda\. Since cache entries are keyed by the corresponding local prefix and suffix configuration, a cache hit refers to the same lexing subproblem\. The final graph is assembled from the same local alternatives and the sameϵ\\epsilon\-connections across masked regions\. Thus, memoization reduces construction cost but does not approximate or change the represented lexeme sequence language\.

### D\.2DFA\-Free Validation

*DFA\-free validation checks exact CFG completability over the partial\-output graph\.*For a partial\-output graph𝒜x\\mathcal\{A\}\_\{x\}and CFGGG, Algorithm[3](https://arxiv.org/html/2606.00722#alg3)searches for a path in𝒜x\\mathcal\{A\}\_\{x\}whose label sequence belongs toL\(G\)L\(G\)\. Thus, it targets the same condition

L\(G\)∩L\(𝒜x\)≠∅\.L\(G\)\\cap L\(\\mathcal\{A\}\_\{x\}\)\\neq\\emptyset\.
The validator can be viewed as an Earley\-style recognizer generalized from string positions to graph nodes\. Prediction introduces productions that may satisfy the next expected nonterminal, scanning items along matching graph edges, and completion connects recognized constituents to items waiting for them\. An accepting item therefore corresponds to a CFG derivation along some path of𝒜x\\mathcal\{A\}\_\{x\}\. Conversely, any such derivation can be explored by the same prediction, scanning, and completion operations\. The algorithm therefore changes the search strategy, not the validity criterion\.

### D\.3Regular Cover

*The flattened automaton is a regular cover of the CFG language\.*LetℱG\\mathcal\{F\}\_\{G\}be the finite automaton obtained by flattening the recursive transition network induced by a CFGGG\. Then the automaton over\-approximates the CFG language:

L\(G\)⊆L\(ℱG\)\.L\(G\)\\subseteq L\(\\mathcal\{F\}\_\{G\}\)\.
This construction follows the standard RTN\-based regular superset approximation of CFGs\(Nederhof,[2000](https://arxiv.org/html/2606.00722#bib.bib26)\)\. A CFG can be viewed as a recursive transition network by introducing entry and exit states for each nonterminal and rule\-position states for each production\. Flattening joins these components into a single finite automaton and replaces the recursive call\-return mechanism with ordinary transitions\. This removes return\-site matching and can therefore introduce additional paths, but it does not remove any path corresponding to a valid CFG derivation\.

The inclusion follows by a structural induction over CFG derivation trees\. For each subtree rooted at a nonterminalAA, the corresponding RTN contains a path from the entry state ofAAto the exit state ofAAwhose labels are exactly the yield of that subtree\. For a productionA→X1⋯XmA\\to X\_\{1\}\\cdots X\_\{m\}, terminal symbols are matched by the corresponding labeled rule\-position transitions, and nonterminal children are matched by the induction hypothesis together with the flattened call and return transitions\. Since flattening keeps all such transitions and only forgets which call site a return must match, the yield of the whole derivation tree labels an accepting path inℱG\\mathcal\{F\}\_\{G\}\. Thus every string generated byGGis accepted byℱG\\mathcal\{F\}\_\{G\}\.

## Appendix EExperimental Details

We follow the benchmark suite, constraints, and prompting protocol of prior CFG\-constrained decoding work for DLMs\. All runs use a maximum generation length of256256tokens and temperature0\.00\.0with greedy decoding\. For the main comparison, we evaluate denoising steps in\{16,32,64,128,256\}\\\{16,32,64,128,256\\\}and repeat runtime measurements with seeds\{42,43,44\}\\\{42,43,44\\\}, although greedy decoding makes the generated outputs seed independent\. For the ablation study, we fix the denoising step count to3232and evaluate all subsets of the three acceleration components of EPIC\.

### E\.1Implementation and Decoding Settings

We implement EPIC on top of the public implementation of prior CFG\-constrained decoding for diffusion language models\(Mündleret al\.,[2026](https://arxiv.org/html/2606.00722#bib.bib4)\)\. The evaluation pipeline, dataset wrappers, model wrappers, and profiling scripts are written in Python, while the formal\-language backend uses Rust components for finite automata, CFG operations, regular\-expression compilation, automaton construction, and intersection\-related routines\.

We compare three decoding modes\. Unconstrained decoding disables grammar constraints\. The baseline enables CFG constraints but disables the three acceleration components\. EPIC enables lexing memoization, DFA\-free validation, and relaxed compatible subset selection\. For the main comparison, we evaluate all three modes across the three tasks, four models, five denoising\-step settings, and three runtime seeds\. Smaller step counts reveal more tokens per denoising step and therefore place more pressure on the sequential checking bottleneck\. The ablation study is performed at3232denoising steps and varies only which of the three acceleration components are enabled\. For each run, we record generated outputs, total wall\-clock time, and component\-level profiling statistics used in the runtime analysis\.

### E\.2Datasets, Prompts, and Metrics

We evaluate three structured\-generation tasks: C\+\+ code generation, JSON generation under instance\-specific schemas, and SMILES generation\. Each prompt follows the prior constrained\-diffusion implementation and contains a system instruction, a user input, and an assistant\-side target prefix\. The assistant\-side prefix places the model inside the target format before generation begins, so that the generated continuation aligns with the grammar used by the constrained decoder\.

For C\+\+, we use the C\+\+ split of HumanEval\-X\(Zhenget al\.,[2023](https://arxiv.org/html/2606.00722#bib.bib18)\)\. Each instance provides a programming problem, a target function declaration, and test cases\. We use the C\+\+ CFG and lexical specification from the prior implementation\. Syntactic correctness is measured by CFG\-based syntax checking, and functional correctness is measured by compiling the generated program and running the provided tests\.

For JSON, we use the JSON benchmark from the prior CFG\-constrained diffusion evaluation suite\(Mündleret al\.,[2026](https://arxiv.org/html/2606.00722#bib.bib4)\), which extends JSON\-Mode\-Eval\(NousResearch,[2024](https://arxiv.org/html/2606.00722#bib.bib19)\)\. Each instance contains a natural\-language input and an instance\-specific JSON schema\. The schema is converted into an instance\-specific CFG and lexical map using the prior implementation\. Syntactic correctness is measured by schema\-CFG acceptance, and functional correctness is measured by comparing the normalized generated JSON object with the reference output\.

For SMILES, we use the benchmark introduced by prior CFG\-constrained diffusion work\(Mündleret al\.,[2026](https://arxiv.org/html/2606.00722#bib.bib4)\)\. Each instance asks the model to generate a SMILES string from a molecule description\. We use the SMILES CFG and lexical specification from the prior implementation\. Unlike C\+\+ and JSON, this setting does not allow arbitrary whitespace between lexed tokens\. Syntactic correctness is measured by validity under the SMILES grammar and lexer, and functional correctness is measured by molecular equivalence to the reference SMILES string after canonicalization\.

### E\.3Models

We evaluate four instruction\-tuned diffusion language models following prior CFG\-constrained decoding work: Dream\(Yeet al\.,[2025](https://arxiv.org/html/2606.00722#bib.bib33)\), DreamCoder\(Xieet al\.,[2025](https://arxiv.org/html/2606.00722#bib.bib34)\), LLaDA\(Nieet al\.,[2026](https://arxiv.org/html/2606.00722#bib.bib2)\), and DiffuCoder\(Gonget al\.,[2026](https://arxiv.org/html/2606.00722#bib.bib35)\)\. Dream and LLaDA are general\-purpose DLMs, while DreamCoder and DiffuCoder are code\-oriented DLMs\. The evaluated checkpoints are Dream111Dream\-org/Dream\-v0\-Instruct\-7B\., DreamCoder222Dream\-org/Dream\-Coder\-v0\-Instruct\-7B\., LLaDA333GSAI\-ML/LLaDA\-8B\-Instruct\., and DiffuCoder444apple/DiffuCoder\-7B\-Instruct\.\. All models are evaluated with their corresponding tokenizer and wrapper\. Dream, DreamCoder, and DiffuCoder use the Dream\-style generation interface\. LLaDA uses a separate wrapper because its chat template and generation interface differ\. These wrapper\-level differences affect prompt formatting but not the task instructions, constraints, or evaluation protocol\.

### E\.4Experimental Environments

All experiments were conducted on a workstation with an NVIDIA RTX A6000 GPU and an AMD Ryzen Threadripper 3960X CPU\. The software environment used Linux 8\.10, Python 3\.11\.15, Rust 1\.93\.1, and PyTorch 2\.12\.0 built against CUDA 13\.0\. For model and dataset handling, we usedtransformers4\.52\.2,datasets3\.6\.0, andaccelerate1\.13\.0\. Rust extension modules were built withmaturin1\.13\.3\. For task\-specific evaluation, we usedpartialsmiles2\.0 andrdkit2026\.3\.2\.

ModelStepsC\+\+JSONSMILESUncon\.Con\.\\cellcolorgray\!15EPICUncon\.Con\.\\cellcolorgray\!15EPICUncon\.Con\.\\cellcolorgray\!15EPICDream 7B16571656\\cellcolorgray\!15450549969\\cellcolorgray\!15601232237\\cellcolorgray\!1524032609900\\cellcolorgray\!157009721203\\cellcolorgray\!15983514516\\cellcolorgray\!155196410891245\\cellcolorgray\!15117718221971\\cellcolorgray\!15184710151022\\cellcolorgray\!15102812820442167\\cellcolorgray\!15209834763630\\cellcolorgray\!15350819982004\\cellcolorgray\!15200325639584042\\cellcolorgray\!15407168126975\\cellcolorgray\!15686740434001\\cellcolorgray\!154011DreamCoder 7B165941876\\cellcolorgray\!15609517921\\cellcolorgray\!15461240242\\cellcolorgray\!15247325861582\\cellcolorgray\!158248211089\\cellcolorgray\!15735519523\\cellcolorgray\!155266410681949\\cellcolorgray\!15136815301806\\cellcolorgray\!15148010121016\\cellcolorgray\!15101912820242703\\cellcolorgray\!15223628333023\\cellcolorgray\!15278220002005\\cellcolorgray\!15200525638064462\\cellcolorgray\!15405454795548\\cellcolorgray\!15541539833984\\cellcolorgray\!153984LLaDA 8B16520578\\cellcolorgray\!153869231137\\cellcolorgray\!15690361405\\cellcolorgray\!1534332828791\\cellcolorgray\!1553113881554\\cellcolorgray\!151076619636\\cellcolorgray\!155706413361246\\cellcolorgray\!15102623322441\\cellcolorgray\!15204211561141\\cellcolorgray\!15110412823822394\\cellcolorgray\!15222142334335\\cellcolorgray\!15403022382197\\cellcolorgray\!15214925643594459\\cellcolorgray\!15438980238030\\cellcolorgray\!15788243844337\\cellcolorgray\!154320DiffuCoder 7B166281114\\cellcolorgray\!156576051002\\cellcolorgray\!15657349362\\cellcolorgray\!15366327011278\\cellcolorgray\!1586210441314\\cellcolorgray\!151005562584\\cellcolorgray\!155766411371579\\cellcolorgray\!15116319022116\\cellcolorgray\!15185410101043\\cellcolorgray\!15102012818972298\\cellcolorgray\!15197336013687\\cellcolorgray\!15347419522109\\cellcolorgray\!15204225631083684\\cellcolorgray\!15335170466706\\cellcolorgray\!15655137363854\\cellcolorgray\!153814Table 5:Full inference time results\. We report wall\-clock inference time in seconds for unconstrained decoding \(Uncon\.\), the prior CFG\-constrained baseline \(Con\.\), and EPIC\. These raw timings correspond to the relative inference times reported in Table[1](https://arxiv.org/html/2606.00722#S5.T1)\. The results cover all models, tasks, and denoising\-step settings used in the main comparison\. The shaded column highlights EPIC\.

## Appendix FAdditional Experimental Results

This section provides additional results for inference time, correctness, and ablation analyses\. We first report the full wall\-clock inference times across all models, datasets, and denoising\-step settings, complementing the relative inference\-time results in Table[1](https://arxiv.org/html/2606.00722#S5.T1)\. We then report syntactic and functional correctness, showing that EPIC maintains correctness comparable to the prior CFG\-constrained baseline while substantially reducing inference time\. Finally, we provide the full ablation results over all subsets of lexing memoization, DFA\-free validation, and relaxed compatible subset selection, complementing the main ablation table in Section[5\.4](https://arxiv.org/html/2606.00722#S5.SS4)\.

### F\.1Full results on Inference Time

Table[5](https://arxiv.org/html/2606.00722#A5.T5)provides the raw wall\-clock inference times corresponding to the relative inference times reported in Table[1](https://arxiv.org/html/2606.00722#S5.T1)\. The table includes unconstrained decoding, the prior CFG\-constrained baseline, and EPIC for every model, task, and denoising\-step setting\. These raw values are reported to make the normalization transparent and to show the absolute runtime scale across tasks\. As in the main results, all times are measured in seconds and averaged over three runtime seeds\.

The raw times also show how the absolute decoding cost scales with the number of denoising steps\. Although the increase is not exactly linear, runtime generally grows with the step count for all methods because each additional step requires another model forward pass\. This trend is visible even for unconstrained decoding, indicating that the overall runtime is still dominated by repeated model evaluation at larger step counts\. In this regime, EPIC stays close to unconstrained decoding, whereas the baseline constrained decoder often remains noticeably slower due to additional per\-step validity\-checking overhead\. Thus, the absolute timings complement the relative inference\-time results in the main text\. In many settings, EPIC brings CFG\-constrained decoding much closer to the runtime scale of unconstrained decoding\.

### F\.2Correctness

Tables[6](https://arxiv.org/html/2606.00722#A6.T6)and[7](https://arxiv.org/html/2606.00722#A6.T7)report the full syntactic and functional correctness results\. We use U\. for unconstrained decoding, C\. for the prior CFG\-constrained baseline, and E\. for EPIC\. The superscript−\-denotes the result before applying the final witness\-based recovery procedure, while the version without−\-includes this recovery\.

ModelStepsC\+\+JSONSMILESU\.C\.\-C\.E\.\-\\cellcolorgray\!15E\.U\.C\.\-C\.E\.\-\\cellcolorgray\!15E\.U\.C\.\-C\.E\.\-\\cellcolorgray\!15E\.Dream1635\.454\.3100\.053\.0\\cellcolorgray\!15100\.032\.450\.4100\.050\.0\\cellcolorgray\!15100\.061\.795\.2100\.095\.2\\cellcolorgray\!15100\.03267\.780\.5100\.080\.5\\cellcolorgray\!15100\.077\.687\.1100\.087\.1\\cellcolorgray\!15100\.070\.798\.2100\.098\.2\\cellcolorgray\!15100\.06480\.590\.299\.690\.9\\cellcolorgray\!1599\.692\.395\.6100\.096\.0\\cellcolorgray\!15100\.072\.599\.4100\.098\.8\\cellcolorgray\!15100\.012889\.095\.1100\.095\.1\\cellcolorgray\!1599\.890\.496\.3100\.096\.3\\cellcolorgray\!15100\.073\.1100\.0100\.0100\.0\\cellcolorgray\!15100\.025682\.994\.5100\.094\.5\\cellcolorgray\!15100\.091\.296\.1100\.096\.3\\cellcolorgray\!15100\.072\.5100\.0100\.0100\.0\\cellcolorgray\!15100\.0DreamCoder1617\.733\.599\.839\.0\\cellcolorgray\!1599\.879\.085\.3100\.085\.7\\cellcolorgray\!15100\.078\.497\.0100\.097\.0\\cellcolorgray\!15100\.03254\.366\.599\.465\.9\\cellcolorgray\!1599\.487\.591\.9100\.092\.3\\cellcolorgray\!15100\.079\.097\.0100\.097\.6\\cellcolorgray\!15100\.06472\.677\.499\.278\.7\\cellcolorgray\!15100\.090\.891\.9100\.092\.3\\cellcolorgray\!15100\.074\.998\.8100\.098\.8\\cellcolorgray\!15100\.012873\.880\.5100\.081\.1\\cellcolorgray\!15100\.093\.094\.1100\.094\.1\\cellcolorgray\!15100\.074\.398\.8100\.098\.8\\cellcolorgray\!15100\.025684\.185\.0100\.086\.0\\cellcolorgray\!15100\.094\.194\.9100\.094\.9\\cellcolorgray\!15100\.074\.398\.8100\.098\.8\\cellcolorgray\!15100\.0LLaDA165\.534\.199\.632\.9\\cellcolorgray\!15100\.057\.478\.3100\.078\.3\\cellcolorgray\!15100\.041\.390\.4100\.089\.2\\cellcolorgray\!15100\.0329\.131\.799\.231\.1\\cellcolorgray\!1599\.475\.489\.0100\.088\.6\\cellcolorgray\!15100\.060\.590\.4100\.089\.8\\cellcolorgray\!15100\.06442\.767\.199\.867\.1\\cellcolorgray\!15100\.079\.892\.3100\.092\.6\\cellcolorgray\!15100\.065\.994\.0100\.093\.4\\cellcolorgray\!15100\.012882\.992\.7100\.092\.7\\cellcolorgray\!15100\.087\.996\.3100\.096\.3\\cellcolorgray\!15100\.065\.991\.6100\.091\.6\\cellcolorgray\!15100\.025696\.399\.4100\.099\.4\\cellcolorgray\!15100\.091\.296\.3100\.096\.3\\cellcolorgray\!15100\.065\.395\.8100\.095\.6\\cellcolorgray\!15100\.0DiffuCoder1625\.040\.999\.839\.0\\cellcolorgray\!1599\.876\.883\.5100\.083\.5\\cellcolorgray\!15100\.070\.794\.0100\.093\.4\\cellcolorgray\!15100\.03236\.050\.099\.851\.8\\cellcolorgray\!1599\.887\.990\.8100\.090\.4\\cellcolorgray\!15100\.071\.998\.2100\.098\.2\\cellcolorgray\!15100\.06453\.767\.199\.867\.7\\cellcolorgray\!15100\.084\.686\.8100\.086\.8\\cellcolorgray\!15100\.062\.398\.8100\.098\.8\\cellcolorgray\!15100\.012860\.472\.6100\.074\.4\\cellcolorgray\!1599\.877\.979\.0100\.079\.0\\cellcolorgray\!15100\.049\.799\.2100\.099\.4\\cellcolorgray\!15100\.025693\.991\.3100\.093\.7\\cellcolorgray\!15100\.077\.979\.4100\.079\.4\\cellcolorgray\!15100\.047\.997\.0100\.097\.0\\cellcolorgray\!15100\.0Table 6:Syntactic correctness across all models, denoising steps, and tasks\. U\. denotes unconstrained decoding, C\. denotes the prior CFG\-constrained baseline, and E\. denotes EPIC\. The superscript−\-reports correctness before witness\-based recovery, while the version without−\-includes recovery\. All values are percentages\.#### F\.2\.1Syntactic Correctness

Table[6](https://arxiv.org/html/2606.00722#A6.T6)shows that both constrained methods substantially improve syntactic correctness over unconstrained decoding\. Even before recovery, both C\.\-and E\.\-consistently improve over unconstrained decoding, showing that rejection\-based CFG checking already prevents syntactic errors\. EPIC generally achieves pre\-recovery syntactic correctness comparable to the prior CFG\-constrained baseline\. The numbers are not always identical since EPIC may commit several tokens in parallel, and therefore can fill positions in a different order from the sequential baseline\. This changes the subsequent partial outputs observed by the model, which can lead to differences in either direction\. In some settings, EPIC is slightly higher than the baseline before recovery, while in others it is slightly lower\. These differences are expected because relaxed subset selection changes the commit order for efficiency, but every committed batch is still verified by the exact CFG checker\.

After applying the recovery procedure, both constrained methods achieve near\-perfect syntactic correctness in almost all settings\. This confirms that the efficiency improvements of EPIC do not weaken the final CFG\-based acceptance criterion\. Rather, EPIC mainly changes how compatible candidates are selected and verified\.

ModelStepsC\+\+JSONSMILESU\.C\.\-C\.E\.\-\\cellcolorgray\!15E\.U\.C\.\-C\.E\.\-\\cellcolorgray\!15E\.U\.C\.\-C\.E\.\-\\cellcolorgray\!15E\.Dream1610\.411\.611\.611\.6\\cellcolorgray\!1511\.613\.218\.023\.718\.0\\cellcolorgray\!1523\.70\.60\.60\.60\.6\\cellcolorgray\!150\.63215\.216\.516\.516\.5\\cellcolorgray\!1516\.543\.445\.246\.045\.2\\cellcolorgray\!1546\.03\.03\.03\.03\.0\\cellcolorgray\!153\.06418\.118\.518\.518\.9\\cellcolorgray\!1518\.953\.754\.854\.854\.8\\cellcolorgray\!1554\.83\.03\.63\.63\.6\\cellcolorgray\!153\.612813\.814\.614\.614\.4\\cellcolorgray\!1514\.451\.553\.353\.353\.3\\cellcolorgray\!1553\.33\.64\.24\.24\.2\\cellcolorgray\!154\.225612\.813\.413\.414\.0\\cellcolorgray\!1514\.051\.853\.753\.753\.7\\cellcolorgray\!1553\.73\.04\.24\.24\.2\\cellcolorgray\!154\.2DreamCoder166\.77\.98\.59\.1\\cellcolorgray\!159\.141\.943\.844\.143\.4\\cellcolorgray\!1544\.04\.24\.24\.24\.2\\cellcolorgray\!154\.23219\.520\.120\.720\.7\\cellcolorgray\!1521\.352\.252\.653\.352\.6\\cellcolorgray\!1553\.33\.64\.24\.24\.2\\cellcolorgray\!154\.26426\.826\.827\.226\.2\\cellcolorgray\!1526\.653\.753\.353\.353\.3\\cellcolorgray\!1553\.32\.42\.42\.42\.4\\cellcolorgray\!152\.412828\.028\.728\.728\.7\\cellcolorgray\!1528\.755\.956\.656\.656\.6\\cellcolorgray\!1556\.62\.42\.42\.42\.4\\cellcolorgray\!152\.425628\.027\.427\.427\.4\\cellcolorgray\!1527\.456\.356\.656\.656\.6\\cellcolorgray\!1556\.62\.42\.42\.42\.4\\cellcolorgray\!152\.4LLaDA160\.62\.84\.13\.3\\cellcolorgray\!154\.539\.043\.844\.143\.8\\cellcolorgray\!1544\.11\.21\.81\.81\.8\\cellcolorgray\!151\.8323\.05\.15\.35\.5\\cellcolorgray\!156\.344\.149\.650\.049\.6\\cellcolorgray\!1550\.00\.61\.21\.21\.2\\cellcolorgray\!151\.26410\.411\.011\.011\.0\\cellcolorgray\!1511\.044\.549\.650\.049\.6\\cellcolorgray\!1550\.01\.21\.81\.81\.8\\cellcolorgray\!151\.812815\.915\.915\.915\.9\\cellcolorgray\!1515\.948\.252\.952\.952\.9\\cellcolorgray\!1552\.91\.81\.81\.81\.8\\cellcolorgray\!151\.825625\.625\.025\.025\.0\\cellcolorgray\!1525\.049\.652\.252\.952\.2\\cellcolorgray\!1552\.92\.42\.42\.42\.4\\cellcolorgray\!152\.4DiffuCoder1612\.814\.014\.014\.6\\cellcolorgray\!1514\.642\.343\.844\.143\.8\\cellcolorgray\!1544\.10\.00\.60\.60\.6\\cellcolorgray\!150\.63215\.218\.318\.317\.7\\cellcolorgray\!1517\.747\.848\.949\.648\.5\\cellcolorgray\!1549\.30\.00\.00\.00\.0\\cellcolorgray\!150\.06417\.718\.318\.318\.3\\cellcolorgray\!1518\.345\.247\.147\.447\.1\\cellcolorgray\!1547\.40\.60\.60\.60\.6\\cellcolorgray\!150\.612825\.627\.429\.927\.4\\cellcolorgray\!1529\.742\.343\.443\.843\.4\\cellcolorgray\!1543\.81\.81\.81\.81\.8\\cellcolorgray\!151\.825643\.944\.345\.544\.5\\cellcolorgray\!1545\.741\.943\.443\.443\.4\\cellcolorgray\!1543\.42\.42\.42\.42\.4\\cellcolorgray\!152\.4Table 7:Functional correctness across all models, denoising steps, and tasks\. For C\+\+, functional correctness is measured by passing the test cases\. For JSON and SMILES, it is measured by task\-specific normalized equivalence to the reference output\. U\. denotes unconstrained decoding, C\. denotes the prior CFG\-constrained baseline, and E\. denotes EPIC\. The superscript−\-reports results before witness\-based recovery, while the version without−\-includes recovery\. All values are percentages\.
#### F\.2\.2Functional Correctness

Table[7](https://arxiv.org/html/2606.00722#A6.T7)reports task\-level functional correctness\. Constrained decoding often improves functional correctness over unconstrained decoding, especially on C\+\+ and JSON, because syntactically invalid outputs cannot satisfy the downstream evaluator\. As the number of denoising steps increases, functional correctness also tends to improve in many settings\. This is consistent with the usual diffusion decoding trade\-off\. With more steps, the decoder unmasks fewer tokens at each step, making each update more conditioned on previously committed context and generally improving generation quality\.

However, the improvement in syntactic correctness does not always translate into a proportional improvement in functional correctness\. This behavior is also observed in prior work and reflects a common limitation of grammar\-only constrained decoding\. A grammar can enforce formal syntactic validity, but it does not verify task semantics\. The witness\-based recovery procedure has only a marginal effect on functional correctness in most settings\. Although recovery raises syntactic correctness to nearly 100%, the recovered completion is selected to satisfy the grammar rather than to optimize the task\-level objective\. Therefore, recovery is useful as a syntactic fallback, but it should not be expected to substantially improve semantic or functional quality\.

The gap between syntactic and functional correctness is particularly visible on SMILES\. Although the grammar can enforce the surface syntax of SMILES strings, it does not fully characterize task\-level molecular correctness or equivalence to the target molecule\. In addition, the underlying CFG is only an approximation to the full set of functionally valid molecular strings\. As a result, syntactic correctness can approach 100% while functional correctness remains low\.

### F\.3Full Ablation Results

Table[8](https://arxiv.org/html/2606.00722#A6.T8)reports the full ablation results at3232denoising steps\. The table includes all individual components, all pairwise combinations, and the full EPIC configuration\. Overall, the results show that the three components provide complementary sources of speedup\. Lexing memoization mainly reduces repeated lexical analysis, DFA\-free validation removes the exact\-checking dependence on repeated DFA construction, and relaxed compatible subset selection reduces the number of sequential exact checks by recovering parallel commitment\.

The effect of each component depends on the task and model\. On C\+\+, all three components are consistently useful because the grammar and lexical specification are more complex, making repeated lexing, automaton construction, and CFG validation expensive\. DFA\-free validation is particularly effective for DreamCoder and DiffuCoder on C\+\+, while relaxed subset selection gives large gains for LLaDA, whose block\-wise decoding schedule provides stable candidate batches\. On JSON, lexing memoization and relaxed subset selection are generally more effective than DFA\-free validation alone\. This is consistent with the fact that JSON validation is relatively cheap once the partial\-output representation has been constructed, so avoiding DFA construction alone does not always compensate for the overhead of graph parsing\. On SMILES, the differences are small because the baseline constraint overhead is already low\. In some cases, the added component can slightly increase runtime\.

Pairwise combinations usually improve over the corresponding single\-component variants, indicating that the optimizations address different parts of the pipeline\. For example, combining memoization with DFA\-free validation substantially reduces C\+\+ runtime, while combining memoization with relaxed subset selection is especially helpful on JSON\. The full configuration, which enables all three components, gives the best runtime in most settings and is never substantially worse than the best ablation\. These results support the design of EPIC as a combination of complementary optimizations rather than a speedup from a single dominant component\.

ModelMethodC\+\+JSONSMILESDreamCon\.90012035161843109554128631237540390011625451\+275810715401\+379510435452\+38441160545\\cellcolorgray\!151\+2\+3\\cellcolorgray\!15700\\cellcolorgray\!15983\\cellcolorgray\!15519DreamCoderCon\.1582108952311385939540212611109540313789865431\+210069075401\+310567745432\+31154974543\\cellcolorgray\!151\+2\+3\\cellcolorgray\!15824\\cellcolorgray\!15735\\cellcolorgray\!15523LLaDACon\.79115546361772145065527381619628360111796091\+270414426271\+357711296082\+35711176595\\cellcolorgray\!151\+2\+3\\cellcolorgray\!15531\\cellcolorgray\!151076\\cellcolorgray\!15570DiffuCoderCon\.127813145841115211725982108513075823111911956011\+291811295811\+394710545992\+39961174597\\cellcolorgray\!151\+2\+3\\cellcolorgray\!15862\\cellcolorgray\!151005\\cellcolorgray\!15576Table 8:Full ablation results for EPIC\. Starting from the prior CFG\-constrained decoder \(Con\.\), we separately add1lexing memoization,2DFA\-free validation with Earley\-style parsing, and3relaxed compatible subset selection for parallel commit\. EPIC enables all three components\.
EPIC: Efficient and Parallel Inference under CFG Constraints for Diffusion Language Models

Similar Articles

Prefilling-dLLM: Predictive Prefilling for Long-Context Inference in Diffusion Language Models

Factorization-Error-Free Discrete Diffusion Language Model via Speculative Decoding

Efficient Diffusion LLMs via Temporal-Spatial Parallel Decoding and Confidence Extrapolation

PSD: Pushing the Pareto Frontier of Diffusion LLMs via Parallel Speculative Decoding

Constrained Code Generation with Discrete Diffusion

Submit Feedback

Similar Articles

Prefilling-dLLM: Predictive Prefilling for Long-Context Inference in Diffusion Language Models
Factorization-Error-Free Discrete Diffusion Language Model via Speculative Decoding
Efficient Diffusion LLMs via Temporal-Spatial Parallel Decoding and Confidence Extrapolation
PSD: Pushing the Pareto Frontier of Diffusion LLMs via Parallel Speculative Decoding
Constrained Code Generation with Discrete Diffusion