LatticeBridge: Rare-Event Sequential Inference for Faithful Structured Sequence Synthesis

arXiv cs.CL 06/11/26, 04:00 AM Papers
structured-generation sequential-monte-carlo decoding faithfulness nlp constrained-generation
Summary
LatticeBridge proposes a twisted sequential Monte Carlo decoder for structured sequence generation that improves constraint satisfaction by treating the problem as rare-event inference, outperforming greedy and beam baselines on CommonGen, E2E NLG, and WikiBio.
arXiv:2606.11203v1 Announce Type: new
Cached at: 06/11/26, 01:35 PM
# Rare-Event Sequential Inference for Faithful Structured Sequence Synthesis

Code and benchmark files: https://github.com/farukalpay/latticebridge
Source: [https://arxiv.org/html/2606.11203](https://arxiv.org/html/2606.11203)
Bugra Kilictas, Department of Computer Engineering, Bahcesehir University (bugra.kilictas@bahcesehir.edu.tr)

(April 2026)

###### Abstract

Structured sequence generation often requires a model to satisfy several input-derived constraints in a single output. Standard decoding methods may assign high probability to fluent continuations while placing low mass on continuations that realize all required anchors jointly. We study this regime as a rare-event sequential inference problem. *LatticeBridge* combines a compact prefix language model, instance-compiled surface automata, and a twisted sequential Monte Carlo (SMC) decoder with resampling, multilevel splitting, and a source-support proposal term derived from instance-provided phrases. The constraint representation is compiled from each input instance and does not rely on manually curated lexical classes. On 2,610 attainable validation tasks spanning CommonGen, E2E NLG, and WikiBio, the particle decoder improves exact anchor satisfaction and mean anchor coverage over greedy, beam-filtered, and best-of-$k$ ancestral baselines under a shared proposal model. Since exact anchor satisfaction alone does not rule out unsupported attribute substitutions, the evaluation reports required-anchor coverage, source coverage, source-intrusion diagnostics, overlap, runtime, and particle statistics jointly. The benchmark characterizes the faithfulness–overlap–latency frontier under a fixed proposal model.

## 1\. Introduction

Autoregressive models provide strong local continuation models, but structured generation often imposes conjunctive requirements that are not well represented by local likelihood alone\. In data\-to\-text generation, for example, an output may need to realize several concepts or attribute values while remaining readable\. A high\-likelihood continuation can satisfy only a subset of those anchors, whereas a hard\-constrained decoder can satisfy anchors while producing unstable search behavior\. We treat this mismatch as a distributional problem: the set of fully satisfying continuations may have low probability under the base model even when the individual anchors are familiar\.

Accordingly, we formulate structured decoding as inference over a sequence space with a rare accepting event. Sequential Monte Carlo methods are appropriate for this formulation because they maintain a weighted population of partial hypotheses and provide explicit diagnostics for weight degeneracy and resampling. Recent work has connected twisted SMC to inference in language models (Zhao et al., [2024](https://arxiv.org/html/2606.11203#bib.bib27); Wu et al., [2024](https://arxiv.org/html/2606.11203#bib.bib26)); related 2025–2026 work further studies learned twists and SMC-style decoding for constrained generation and model aggregation (Kim et al., [2025](https://arxiv.org/html/2606.11203#bib.bib12); Chan et al., [2026](https://arxiv.org/html/2606.11203#bib.bib3)). The present study isolates a compact conditional proposal model, surface automata compiled from each input instance, and a particle decoder with directly observable control signals.

The implementation separates data adaptation, constraint compilation, proposal modeling, and evaluation\. This separation prevents benchmark\-specific lexical rules from entering the decoding algorithm unnoticed\. We use the following design constraints:

- constraints must be *instance driven*, extracted from the source example rather than from manually curated topical dictionaries,
- the library layer must stay reusable across tasks, while dataset adapters remain thin and explicit,
- exceptional cases must be represented in logged metrics rather than hidden inside unreported lexical exceptions.

LatticeBridge serializes each structured input into a prefix, trains a compact prefix language model, compiles anchor phrases into surface automata, and runs a twisted SMC bridge whose incremental reward depends on progress toward an accepting automaton state\. The decoder also uses a lightweight source\-support proposal factor derived from source phrases, which improves exact value realization without introducing dataset\-specific control rules\. The experiments evaluate coverage, source coverage, overlap, runtime, effective sample size, and acceptance mass under a shared proposal model, so that the inference layer can be inspected independently of model scale\.

## 2\. Contributions

The paper makes five concrete contributions\.

1. We formulate structured sequence synthesis as a sequential rare-event inference problem with a distance-to-acceptance potential defined by instance-compiled surface automata.
2. We introduce a compact prefix language model plus a support-aware twisted SMC bridge that exposes resampling, effective sample size, and acceptance mass as first-class diagnostics.
3. We present a benchmark construction method for CommonGen, E2E NLG, and WikiBio that uses input-derived phrases only and keeps schema handling in thin dataset adapters.
4. We report multi-dataset validation results against greedy, beam-filtered, and best-of-$k$ ancestral baselines, together with standard errors, runtime measurements, particle-system diagnostics, source coverage, and source-intrusion diagnostics for unsupported value substitutions.
5. We release code and benchmark files at [https://github.com/farukalpay/latticebridge](https://github.com/farukalpay/latticebridge), including the paper source, summary tables, generated figures, and diagnostics artefacts.

## 3\. Problem Formulation

Let $x$ be a structured input, such as a set of concepts or attribute-value pairs, and let $y_{1:T}$ be a target surface sequence. A conditional language model $p_\theta(y_{1:T} \mid x)$ defines the base synthesis distribution. We are given a set of input-grounded anchors

$$
\mathcal{C}(x) = \{c_1, \dots, c_M\},
$$

where each $c_m$ is a surface phrase extracted from the source instance. The target objective is not simply to maximize model likelihood but to synthesize a sequence that is both plausible under $p_\theta$ and faithful to the anchors.

We encode satisfaction through an acceptance indicator

$$
A(y_{1:T}; \mathcal{C}) = \mathbf{1}\{\forall m,\; c_m \text{ appears in } y_{1:T}\}.
$$

The exact conditional target would then be

$$
\pi_T(y_{1:T} \mid x, \mathcal{C}) \propto p_\theta(y_{1:T} \mid x)\, A(y_{1:T}; \mathcal{C}),
$$

but direct sampling from this distribution is typically intractable when $A = 1$ is rare.

To obtain a sequential bridge, we introduce an automaton state $s_t$ after generating prefix $y_{1:t}$. The automaton tracks how much of each anchor phrase has been realized. Denote by $d(s_t) \geq 0$ the remaining distance to full acceptance. We then define a relaxed sequence target

$$
\gamma_T(y_{1:T}) = p_\theta(y_{1:T} \mid x) \exp\{\lambda \Phi_T(y_{1:T})\}, \qquad \Phi_T(y_{1:T}) = -d(s_T),
$$

where $\lambda > 0$ controls how strongly we bias toward exact acceptance. For $\lambda \to \infty$, the distribution concentrates on accepting trajectories when such trajectories exist.

The computational question is how to approximate this bridge under a finite particle budget while keeping the constraint representation task independent\. We compile the surface constraints once, then run a twisted particle system that rewards progress toward acceptance at each generation step\.

## 4\. Constraint Compilation Without Hidden Heuristics

### 4\.1\. Instance\-derived anchors

Anchor extraction is schema specific only in the sense that each dataset exposes a different structured input\. CommonGen provides source concepts, E2E provides attribute values, and WikiBio provides titles plus infobox field values\. After extraction, the inference layer receives only surface phrases and automaton states\. No global topic lexicon, domain inventory, or manually curated attribute map enters the decoder\.

Let $\mathcal{P}(x)$ denote the candidate phrase set exposed by the dataset adapter for source instance $x$. For benchmark construction, we first retain only phrases that are attested in at least one reference surface for the same instance. This attainability criterion prevents impossible exact-match labels in the validation split while leaving the constraint set instance specific. The decoder itself does not use the references.

Among the attested candidates, LatticeBridge selects up to $K$ anchors by empirical source information score rather than adapter order. For an evaluation subset $\mathcal{D}$, define the source-side document frequency

$$
\widehat{\mathrm{df}}(c) = \sum_{x' \in \mathcal{D}} \mathbf{1}\{c \in \mathcal{P}(x')\}, \qquad I(c) = \log \frac{|\mathcal{D}| + 1}{\widehat{\mathrm{df}}(c) + 1}.
$$

Each example uses the $K$ highest-scoring attested phrases. This ranking favors anchors that are empirically specific within the benchmark subset and removes the hidden dependence on schema field order.
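The ranking can be illustrated with a minimal sketch. The function names, the toy subset, and the alphabetical tie-break are assumptions made for illustration; they are not taken from the released code.

```python
import math
from collections import Counter

def information_scores(phrase_sets):
    """Return I(c) = log((|D| + 1) / (df(c) + 1)) for every candidate phrase."""
    n = len(phrase_sets)
    df = Counter()
    for phrases in phrase_sets:
        df.update(set(phrases))  # each instance counts a phrase at most once
    return {c: math.log((n + 1) / (df[c] + 1)) for c in df}

def select_anchors(attested, scores, k=3):
    """Keep the k highest-scoring attested phrases (alphabetical tie-break is an assumption)."""
    return sorted(attested, key=lambda c: (-scores[c], c))[:k]

# Hypothetical evaluation subset with three source instances.
subset = [{"dog", "frisbee", "park"}, {"dog", "ball", "park"}, {"dog", "river"}]
scores = information_scores(subset)
anchors = select_anchors(["dog", "frisbee", "park"], scores, k=2)
print(anchors)
```

In this toy subset, "dog" occurs in every instance and receives score zero, so the two retained anchors are the rarer phrases "frisbee" and "park".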

### 4\.2\. Surface automata

Tokenization granularity can make phrase tracking segmentation-dependent. A token-level automaton may fail when the tokenizer packs useful substrings into a single token. LatticeBridge therefore uses a surface automaton built over the emitted string fragments of tokenizer tokens. For each phrase $c_m$, we build a KMP-style automaton with states $0, \dots, \lvert c_m \rvert$ that track the currently matched character prefix. The product state of all such automata defines the global constraint lattice. This design is complementary to recent automata-based constrained decoding work that also treats tokenization mismatch as a first-class systems problem (Koo et al., [2024](https://arxiv.org/html/2606.11203#bib.bib13)).

This representation yields three operational properties:

1. progress toward acceptance is explicit through $d(s_t)$,
2. constraint logic stays reusable across datasets,
3. phrase tracking is no longer tied to a particular BPE segmentation accident.
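A character-level automaton of this kind can be sketched as follows. The class name `SurfaceLattice`, the absorbing accepting state, and the choice of distance (total number of unmatched characters) are assumptions for illustration; the paper does not spell out the exact form of $d$.

```python
def failure_table(phrase):
    """KMP failure function: fail[i] is the longest proper border of phrase[:i]."""
    fail = [0] * (len(phrase) + 1)
    k = 0
    for i in range(1, len(phrase)):
        while k > 0 and phrase[i] != phrase[k]:
            k = fail[k]
        if phrase[i] == phrase[k]:
            k += 1
        fail[i + 1] = k
    return fail

def step(phrase, fail, state, ch):
    """Advance one character; the final state is absorbing once the phrase has appeared."""
    if state == len(phrase):
        return state
    while state > 0 and phrase[state] != ch:
        state = fail[state]
    return state + 1 if phrase[state] == ch else state

class SurfaceLattice:
    """Product of per-phrase character automata, driven by token string fragments."""

    def __init__(self, phrases):
        self.phrases = list(phrases)
        self.fails = [failure_table(p) for p in self.phrases]

    def initial(self):
        return tuple(0 for _ in self.phrases)

    def advance(self, state, fragment):
        new = list(state)
        for ch in fragment:
            for m, phrase in enumerate(self.phrases):
                new[m] = step(phrase, self.fails[m], new[m], ch)
        return tuple(new)

    def distance(self, state):
        # Assumed distance: total number of characters still unmatched.
        return sum(len(p) - s for p, s in zip(self.phrases, state))

lattice = SurfaceLattice(["dog", "park"])
state = lattice.initial()
for fragment in ["The ", "do", "g ran in the par", "k"]:  # hypothetical token fragments
    state = lattice.advance(state, fragment)
```

The fragments deliberately split both phrases across token boundaries; because matching runs over characters, the final state is still accepting for both anchors.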

## 5\. Twisted Sequential Monte Carlo Bridge

### 5\.1\. Sequential target

Let $y_t$ denote the token chosen at step $t$, and let the base model define one-step conditionals $p_\theta(y_t \mid y_{<t}, x)$. Define a progress signal

$$
\Delta_t = d(s_{t-1}) - d(s_t),
$$

which is positive when the new token moves the prefix closer to acceptance. The sequential potential used by the bridge is

$$
G_t(y_t, s_{t-1}, s_t) = \exp\{\lambda \Delta_t\}.
$$

The unnormalized path density becomes

$$
\gamma_t(y_{1:t}) = p_\theta(y_{1:t} \mid x) \prod_{\tau=1}^{t} G_\tau.
$$

Because the increments telescope, $\prod_{\tau=1}^{t} G_\tau = \exp\{\lambda (d(s_0) - d(s_t))\}$, so $\gamma_T$ agrees with the relaxed target of Section 3 up to the constant factor $\exp\{\lambda d(s_0)\}$.

### 5\.2\. Twisted proposal

Sampling directly from the base model and reweighting by $G_t$ leads to rapid particle collapse. We instead use a twisted proposal

$$
q_t(y_t \mid y_{<t}, x, s_{t-1}) \propto p_\theta(y_t \mid y_{<t}, x) \exp\{\tau \Delta_t + \beta \psi_x(y_t)\},
$$

where $\tau$ is the distance-twist coefficient and $\psi_x$ is a source-support score. Let $\mathcal{P}(x)$ denote the full set of source phrases exposed by the dataset adapter, and let $\mathrm{first}(c)$ be the first tokenizer token of phrase $c$. We define a phrase-initial empirical measure over the vocabulary

$$
\nu_x(v) = \frac{1}{|\mathcal{P}(x)|} \sum_{c \in \mathcal{P}(x)} \mathbf{1}\{\mathrm{first}(c) = v\},
$$

then smooth it toward the uniform distribution and work with the log-density ratio

$$
\psi_x(v) = \log \widetilde{\nu}_x(v) - \log |\mathcal{V}|^{-1}.
$$

The factor $\psi_x$ is inexpensive to compute and biases the proposal toward tokens that can start source-supported phrases, which is particularly useful for exact values, names, and entity mentions. The validation run sets $\tau = \lambda = 2.0$ and $\beta = 0.4$.
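The following sketch computes a source-support score and the resulting twisted proposal for one step. The smoothing rule $\widetilde{\nu}_x = (1 - \epsilon)\nu_x + \epsilon / |\mathcal{V}|$ with $\epsilon = 0.1$ is an assumption, since the text only states that $\nu_x$ is smoothed toward the uniform distribution; the toy vocabulary and progress values are likewise hypothetical.

```python
import math

def source_support(first_tokens, vocab, eps=0.1):
    """psi_x(v) = log nu~_x(v) - log(1/|V|), with nu~ = (1 - eps) * nu + eps / |V| (assumed smoothing)."""
    n, size = len(first_tokens), len(vocab)
    psi = {}
    for v in vocab:
        nu = sum(1 for f in first_tokens if f == v) / n
        nu_tilde = (1.0 - eps) * nu + eps / size
        psi[v] = math.log(nu_tilde) + math.log(size)
    return psi

def twisted_proposal(base_probs, delta, psi, tau=2.0, beta=0.4):
    """Return (q, Z) with q(v) = p(v) * exp(tau * delta[v] + beta * psi[v]) / Z."""
    unnorm = {v: p * math.exp(tau * delta[v] + beta * psi[v]) for v, p in base_probs.items()}
    z = sum(unnorm.values())
    return {v: u / z for v, u in unnorm.items()}, z

vocab = ["the", "dog", "park", "ran"]                      # hypothetical vocabulary
psi = source_support(first_tokens=["dog", "park"], vocab=vocab)
base = {v: 0.25 for v in vocab}                            # uniform base model for illustration
delta = {"the": 0, "dog": 1, "park": 1, "ran": 0}          # hypothetical progress per token
q, z = twisted_proposal(base, delta, psi)
```

Tokens that start a source phrase receive a positive $\psi_x$ and tokens outside the source receive a negative one, so the proposal shifts mass toward "dog" and "park" even before the distance twist is applied.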

### 5\.3\. Importance weights and ESS

If particle $i$ samples $y_t^{(i)} \sim q_t$, its incremental weight update is

$$
\log w_t^{(i)} = \log w_{t-1}^{(i)} + \log p_\theta\!\left(y_t^{(i)} \mid y_{<t}^{(i)}, x\right) + \lambda \Delta_t^{(i)} - \log q_t\!\left(y_t^{(i)} \mid y_{<t}^{(i)}, x, s_{t-1}^{(i)}\right).
$$

Normalized weights are

$$
\bar{w}_t^{(i)} = \frac{w_t^{(i)}}{\sum_j w_t^{(j)}},
$$

and the effective sample size is

$$
\mathrm{ESS}_t = \frac{1}{\sum_i (\bar{w}_t^{(i)})^2}.
$$

We trigger resampling when $\mathrm{ESS}_t < \rho P$, where $P$ is the particle count and $\rho$ is the ESS threshold.
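A minimal sketch of the weight update and the ESS-triggered resampling step is given below. Multinomial resampling and the post-resampling weight reset are assumptions; the text does not state which resampling scheme the implementation uses.

```python
import math
import random

def log_sum_exp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def normalized(log_w):
    z = log_sum_exp(log_w)
    return [math.exp(lw - z) for lw in log_w]

def effective_sample_size(log_w):
    return 1.0 / sum(w * w for w in normalized(log_w))

def update_log_weight(log_w_prev, log_p, log_q, delta, lam=2.0):
    """log w_t = log w_{t-1} + log p + lam * delta - log q."""
    return log_w_prev + log_p + lam * delta - log_q

def maybe_resample(particles, log_w, rho=0.5, rng=random):
    """Multinomial resampling (an assumed scheme) when ESS < rho * P."""
    count = len(particles)
    if effective_sample_size(log_w) >= rho * count:
        return particles, log_w, False
    picks = rng.choices(range(count), weights=normalized(log_w), k=count)
    mean_log_w = log_sum_exp(log_w) - math.log(count)  # keeps the total weight unchanged
    return [particles[i] for i in picks], [mean_log_w] * count, True

# Degenerate toy population: one particle carries essentially all the weight.
log_w = [0.0, -1000.0, -1000.0, -1000.0]
particles, new_log_w, resampled = maybe_resample(["a", "b", "c", "d"], log_w, rng=random.Random(0))
```

With $\rho = 0.5$ and $P = 4$, the toy population has ESS close to 1, so resampling fires and every slot is refilled with a copy of the dominant particle.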

### 5\.4\. Multilevel splitting

In low-acceptance regimes, even twisted proposals can fail to maintain enough mass near acceptance. We use periodic multilevel splitting to reallocate finite particles toward partial trajectories with lower remaining automaton distance. At fixed intervals, particles are ranked by

$$
R_t^{(i)} = \log w_t^{(i)} - \lambda d(s_t^{(i)}),
$$

and a top-fraction elite set is replicated to refill the population. This step is related to adaptive multilevel splitting (Cerou and Guyader, [2007](https://arxiv.org/html/2606.11203#bib.bib2)). Because it changes the estimator under finite compute, we regard it as a search-allocation mechanism rather than as an exact posterior sampler.
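The splitting step can be sketched as below. The cyclic replication of elites and the choice to let each copy keep its parent's log-weight are assumptions; the text specifies only the ranking score and that a top fraction is replicated.

```python
def elite_split(particles, log_w, distances, lam=2.0, elite_fraction=0.2):
    """Rank by R = log w - lam * d and refill the population from the elite set."""
    count = len(particles)
    n_elite = max(1, int(round(elite_fraction * count)))
    scores = [lw - lam * d for lw, d in zip(log_w, distances)]
    order = sorted(range(count), key=lambda i: scores[i], reverse=True)
    elite = order[:n_elite]
    chosen = [elite[j % n_elite] for j in range(count)]  # cyclic replication (assumption)
    return ([particles[i] for i in chosen],
            [log_w[i] for i in chosen],
            [distances[i] for i in chosen])

# Hypothetical population of five particles with equal weights.
new_particles, new_log_w, new_distances = elite_split(
    list("abcde"), [0.0] * 5, [3, 1, 4, 0, 2], elite_fraction=0.4
)
```

With equal weights the ranking reduces to remaining distance, so the two particles closest to acceptance ("d" and "b") repopulate the system.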

Algorithm 1: Twisted SMC bridge.

1. warm-start hidden states with source prefix
2. initialize $P$ particles with automaton state $s_0$, hidden state $h_0$, and weight $w_0 = 1$
3. for $t = 1$ to $T$ do
4. for particle $i = 1, \dots, P$ do
5. compute base logits from the prefix model
6. compute progress scores $\Delta_t^{(i)}$ from automaton transitions
7. sample $y_t^{(i)}$ from twisted proposal $q_t$
8. update particle weight using the importance ratio
9. update automaton state and hidden state
10. end for
11. compute ESS and resample if needed
12. apply elite splitting at scheduled checkpoints
13. end for
14. return best accepting sequence and summary diagnostics

## 6\. Prefix Language Model

The proposal model consists of an embedding layer, a stacked GRU, and an output projection\. This architecture provides a stable conditional proposal that can be trained quickly on local hardware and interrogated step by step during SMC\. It also keeps hidden\-state replication across particles inexpensive, which is important for local particle decoding\.

Each example is serialized as

$$
\langle\texttt{bos}\rangle\, \langle\texttt{src}\rangle\, \text{source serialization}\, \langle\texttt{tgt}\rangle\, \text{target surface}\, \langle\texttt{eos}\rangle.
$$

The loss is a masked next-token cross-entropy in which the source prefix is excluded from supervision. The model therefore learns to predict the target continuation conditioned on the source serialization without having to model the serialization itself as natural text.
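The masking rule can be illustrated with a small framework-free sketch. The marker strings, the helper names, and the convention that `log_probs[i]` scores token `i` given the preceding tokens are assumptions for illustration only.

```python
import math

def target_mask(tokens, tgt_marker="<tgt>"):
    """Supervise only the tokens that follow the target marker."""
    start = tokens.index(tgt_marker) + 1
    return [1 if i >= start else 0 for i in range(len(tokens))]

def masked_nll(log_probs, targets, loss_mask):
    """Mean negative log-likelihood over positions where loss_mask is 1."""
    total, count = 0.0, 0
    for lp, y, m in zip(log_probs, targets, loss_mask):
        if m:
            total -= lp[y]
            count += 1
    return total / max(count, 1)

tokens = ["<bos>", "<src>", "dog", "park", "<tgt>", "a", "dog", "<eos>"]
mask = target_mask(tokens)
# Hypothetical model outputs: probability 0.25 for every observed token.
log_probs = [{tok: math.log(0.25)} for tok in tokens]
log_probs[2] = {"dog": -50.0}  # a badly predicted source token does not affect the loss
loss = masked_nll(log_probs, tokens, mask)
```

Only the three positions after `<tgt>` contribute, so the loss equals $\log 4$ regardless of how the source prefix is scored.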

The recurrent proposal is used as a controlled conditional model for isolating the inference mechanism, and it avoids adding a second learned evaluator to the decoding loop.

## 7\. Datasets and Benchmark Construction

### 7\.1\. Datasets

The reported structured\-generation benchmark uses three public datasets:

- CommonGen (Lin et al., [2020](https://arxiv.org/html/2606.11203#bib.bib16)): concept-to-text generation,
- E2E NLG (Novikova et al., [2017](https://arxiv.org/html/2606.11203#bib.bib21); Dusek et al., [2019](https://arxiv.org/html/2606.11203#bib.bib8); Gehrmann et al., [2021](https://arxiv.org/html/2606.11203#bib.bib9)): restaurant meaning representations to text,
- WikiBio (Lebret et al., [2016](https://arxiv.org/html/2606.11203#bib.bib14)): biography infobox to text generation.

Dataset staging uses public dataset archives together with Hugging Face-hosted assets and tooling (Lhoest et al., [2021](https://arxiv.org/html/2606.11203#bib.bib15)). The quantitative claims in this paper use the three benchmark datasets listed in Table [1](https://arxiv.org/html/2606.11203#S7.T1).

Table 1: Datasets used in the reported validation benchmark.

### 7\.2\. Benchmark protocol

For each validation example, we select up to three anchor phrases from the source that also appear in at least one reference surface\. Reference attestation removes impossible exact\-match labels from the benchmark while preserving an input\-grounded constraint set\. We then compare four inference methods:

1. greedy decoding,
2. beam filtering,
3. best-of-$k$ ancestral sampling,
4. twisted SMC with resampling and splitting.

The reported metrics are:

- success rate: fraction of examples satisfying all anchors,
- coverage: mean fraction of anchors present in the chosen output,
- source coverage: mean fraction of source phrases present in the chosen output,
- ROUGE-L: a reference-overlap proxy for surface quality,
- token-$F_1$: additional overlap signal,
- runtime: mean wall-clock latency per example.

The validation run covers 2,610 attainable constraint tasks: 993 from CommonGen, 996 from E2E NLG, and 621 from WikiBio\. Each summary reports the sample mean and standard error over examples\.
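The success and coverage summaries can be sketched as follows. Case-sensitive substring matching and the use of the sample standard deviation (with $n - 1$) for the standard error are assumptions; the toy records are hypothetical.

```python
import math

def coverage(output, phrases):
    """Fraction of phrases that appear as substrings of the output."""
    return sum(1 for c in phrases if c in output) / len(phrases)

def mean_and_se(values):
    """Sample mean and standard error of the mean (sample variance uses n - 1)."""
    n = len(values)
    mean = sum(values) / n
    if n < 2:
        return mean, 0.0
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(var / n)

# Hypothetical (output, anchors) pairs.
records = [
    ("a dog runs in the park", ["dog", "park"]),
    ("a dog sleeps", ["dog", "park"]),
    ("the river", ["dog", "river"]),
    ("dog by the river", ["dog", "river"]),
]
coverages = [coverage(y, anchors) for y, anchors in records]
successes = [1.0 if c == 1.0 else 0.0 for c in coverages]
success_mean, success_se = mean_and_se(successes)
coverage_mean, coverage_se = mean_and_se(coverages)
```

Success is the indicator that coverage equals one, so the two summaries can diverge: here half of the outputs succeed while mean coverage is 0.75.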

## 8\. Experimental Setup

All reported training and validation runs were executed on Apple Silicon using the Metal backend \(mps\)\. The hardware specification is reported only to contextualize wall\-clock measurements\. The main model was trained for several epochs with a batch size of 48 and a maximum sequence length of 160 tokens\. The validation benchmark used:

- beam size 6,
- best-of-16 ancestral sampling,
- 96 particles for twisted SMC,
- 64 maximum generated tokens,
- $\lambda = 2.0$ and twist scale $\tau = 2.0$,
- source-support scale $\beta = 0.4$,
- ESS resampling threshold $0.5P$, splitting interval 12, and elite fraction 0.2.

## 9\. Training Dynamics and Operating Frontier

Table [2](https://arxiv.org/html/2606.11203#S9.T2) reports the optimization trace for the prefix model. The loss trajectory is monotone over the reported epochs and supports the use of a fixed shared checkpoint for all decoding comparisons.

Table 2: Epoch-wise optimization trace for the compact prefix model.

Across datasets, the shared proposal model often realizes individual anchors without reliably realizing several anchors jointly under standard decoding. The comparison isolates how the inference procedure changes the operating point of a fixed conditional model.

Figure [1](https://arxiv.org/html/2606.11203#S9.F1) gives a budget-oriented summary. On CommonGen, twisted SMC obtains substantially higher coverage and exact success than the tested baselines while remaining below the best-of-16 ancestral runtime. On E2E, twisted SMC improves exact success and coverage over greedy and beam filtering, and it obtains higher coverage than best-of-16 ancestral sampling at lower mean latency. On WikiBio, exact acceptance remains rare for all methods, but twisted SMC still produces the widest coverage margin. The figure should therefore be read as a constrained-inference operating curve rather than as a single-metric ranking.

![Refer to caption](https://arxiv.org/html/2606.11203v1/figures/coverage_runtime_frontier.png)

Figure 1: Coverage-runtime frontier for the validation benchmark. The x-axis is logarithmic to separate low-latency baselines from particle-based decoding regimes.

The frontier plot summarizes the comparison expanded in Table [3](https://arxiv.org/html/2606.11203#S10.T3): particle control shifts the coverage–runtime operating point, while overlap is measured separately to avoid conflating constraint satisfaction with surface similarity.

## 10\. Results on Structured Generation

### 10\.1\. CommonGen

Table [3](https://arxiv.org/html/2606.11203#S10.T3) shows the main validation result. On CommonGen, greedy, beam-filtered, and best-of-16 ancestral decoding do not produce exact three-anchor successes in the reported validation set. Twisted SMC reaches $0.758 \pm 0.014$ exact success, $0.908 \pm 0.006$ required-anchor coverage, and $0.772 \pm 0.007$ source coverage.

### 10\.2\. E2E NLG

On E2E, the base model already copies a nontrivial fraction of attributes under greedy decoding. Twisted SMC improves exact success to $0.473 \pm 0.016$ and required-anchor coverage to $0.774 \pm 0.008$, exceeding greedy, beam filtering, and best-of-16 ancestral sampling while maintaining lower mean latency than best-of-16 ancestral sampling. Source coverage is lower than the best-of-16 ancestral baseline, but source intrusion is also lower ($0.926$ versus $1.362$), which indicates that exact satisfaction, full source preservation, and unsupported-value avoidance are related but distinct criteria.

### 10\.3\. WikiBio

WikiBio introduces longer source serializations and more heterogeneous attribute values than the other two datasets. In this regime, exact satisfaction is difficult for every decoder, but twisted SMC raises exact success to $0.023 \pm 0.006$ and required-anchor coverage to $0.207 \pm 0.011$, compared with near-zero exact success for the baselines.

Table 3: Validation benchmark over 2,610 attainable constraint tasks. Parenthesized values are standard errors. Bold marks the best success, required coverage, and source coverage within each dataset.

### 10\.4\. Interpretation

The validation results support three observations:

1. the proposal model can realize many anchors but does not reliably combine them under standard decoding,
2. SMC reweighting, resampling, source-support proposal shaping, and splitting increase exact satisfaction and mean required coverage under the same checkpoint across three different structured-generation regimes,
3. the bridge changes the operating point rather than dominating every metric: required-anchor coverage rises substantially, while source coverage, source intrusion, and ROUGE-L depend on how much of the structured input the baseline already copies and how often it substitutes unsupported values.

## 11\. Source\-Fidelity Diagnostics

Exact realization of the designated anchor set is a necessary condition for the constrained evaluation target, but it is not a sufficient statistic for instance-level faithfulness. The accepting event is defined over a selected subset of source phrases, whereas the structured input may contain additional values that can be contradicted or replaced by unsupported alternatives. We report a complementary source-coverage statistic over the full phrase inventory $\mathcal{P}(x)$ exposed by the adapter:

$$
\mathrm{SCov}(y; x) = \frac{1}{|\mathcal{P}(x)|} \sum_{c \in \mathcal{P}(x)} \mathbf{1}\{c \text{ appears in } y\}.
$$

This statistic is not used to define attainability, since many legitimate references verbalize only part of a source record. It serves as an auxiliary diagnostic for value preservation under the same generated surface. It is stricter than required-anchor coverage in the sense that it evaluates the complete source phrase set rather than only the subset used to construct the rare accepting event.

Source coverage is paired with a source-intrusion statistic. Let $\mathcal{U}_{\mathcal{D}}$ denote the content-term vocabulary induced by candidate phrases in the evaluation subset, after removing terms whose document frequency exceeds a corpus-level high-frequency threshold, and let $\mathcal{U}(x)$ denote the corresponding term set for the current source instance. The intrusion count is

$$
\mathrm{Intr}(y; x) = \sum_{u \in \mathcal{U}_{\mathcal{D}} \setminus \mathcal{U}(x)} \mathbf{1}\{u \text{ appears in } y\}.
$$

This quantity detects cases in which a decoder realizes the required anchors while importing lexical evidence supported by another source instance. Among continuations with the same exact-acceptance status, the implementation ranks required coverage first, source coverage second, source intrusion third, and model score plus reference overlap afterward. This ordering preserves the constrained evaluation target while penalizing unsupported value substitutions.

In the validation run, twisted SMC has lower mean source intrusion than the tested baselines on all three datasets: $0.226$ on CommonGen, $0.926$ on E2E NLG, and $6.332$ on WikiBio. The E2E result is particularly informative because exact success and source coverage disagree: best-of-16 ancestral sampling has the highest source coverage, whereas twisted SMC has the highest exact success and the lowest unsupported-value intrusion.
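The two diagnostics and the tie-breaking order can be sketched as follows. Whitespace tokenization with lower-casing for term matching, the single scalar `score` standing in for model score plus reference overlap, and the toy restaurant record are all assumptions for illustration.

```python
def source_coverage(output, source_phrases):
    """SCov(y; x): fraction of source phrases that appear in the output surface."""
    return sum(1 for c in source_phrases if c in output) / len(source_phrases)

def intrusion_count(output, subset_terms, instance_terms):
    """Intr(y; x): terms supported only by other instances that appear in the output."""
    output_terms = set(output.lower().split())
    return len((subset_terms - instance_terms) & output_terms)

def rank_key(candidate):
    """Order: required coverage, source coverage, low intrusion, then score (assumed scalar)."""
    return (-candidate["required_cov"], -candidate["source_cov"],
            candidate["intrusion"], -candidate["score"])

phrases = ["Aromi", "coffee shop", "riverside"]            # hypothetical source phrases
instance_terms = {"aromi", "coffee", "shop", "riverside"}
subset_terms = instance_terms | {"cheap", "french", "city", "centre"}

y1 = "Aromi is a coffee shop in the city centre"
y2 = "Aromi is a riverside coffee shop"
candidates = [
    {"text": y, "required_cov": 1.0, "score": s,
     "source_cov": source_coverage(y, phrases),
     "intrusion": intrusion_count(y, subset_terms, instance_terms)}
    for y, s in [(y1, -4.0), (y2, -5.0)]
]
best = min(candidates, key=rank_key)
```

Both candidates realize the required anchors, but the first imports "city" and "centre" from other records and omits "riverside", so the second candidate is preferred despite its lower hypothetical score.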

## 12\. Qualitative Interpretation

The output examples in Appendix [D](https://arxiv.org/html/2606.11203#A4) illustrate the same control–quality relation measured in the aggregate metrics. Incomplete continuations can remain competitive under local likelihood, while the twisted bridge reallocates probability mass toward accepting sequences. The examples are reported with coverage, source coverage, source-intrusion count, acceptance mass, and overlap so that exact acceptance is not interpreted as semantic adequacy by itself.

## 13\. Connection to Doob Transforms and Dynamic Systems

Twisted proposals can be derived from the Doob $h$-transform associated with the probability of future acceptance. For the sequence model considered here, the exact harmonic function is not available because it depends on the probability of reaching an accepting automaton state from every possible recurrent hidden state and partial surface. LatticeBridge therefore uses distance reduction in the surface automaton together with a source-support proposal factor as computable surrogates.

From a dynamic\-systems perspective, the partial sequence and automaton state define a controlled nonlinear state process

$$
z_t = (h_t, s_t),
$$

where $h_t$ is the recurrent hidden state and $s_t$ is the constraint state. The twist then acts as an online control law over the proposal kernel:

$$
q_t(\cdot \mid z_t) = \mathcal{T}_{\tau,\beta}\!\left(p_\theta(\cdot \mid z_t), d(s_t), \psi_x\right).
$$

Under this interpretation, the reported diagnostics follow directly from the controlled particle dynamics. Effective sample size measures degeneracy in the weighted particle approximation, acceptance mass estimates how much probability remains on accepting states, and runtime quantifies the hardware cost of applying the control law. Accelerator-oriented work on automata and trie vectorization (Su et al., [2026](https://arxiv.org/html/2606.11203#bib.bib24)) addresses the systems side of constrained decoding, while language-model SMC work (Zhao et al., [2024](https://arxiv.org/html/2606.11203#bib.bib27); Wu et al., [2024](https://arxiv.org/html/2606.11203#bib.bib26)) provides the probabilistic background.

## 14\. Approximate Doob Control Analysis

### 14\.1\. Exact Doob recursion

For a fixed horizon $T$, the variance-minimizing proposal for conditioning on acceptance is the Doob-transformed kernel induced by the future acceptance probability. Let $z_t = (h_t, s_t)$ denote the recurrent hidden state and automaton state after step $t$. Define the harmonic bridge function

$$
H_t(z_t) = \mathbb{P}_{p_\theta}\!\left(A(y_{1:T}; \mathcal{C}) = 1 \mid z_t\right),
$$

with terminal condition $H_T(z_T) = A(y_{1:T}; \mathcal{C})$. If $F(z_t, y)$ denotes the next state after emitting token $y$, then $H_t$ obeys the backward recursion

$$
H_t(z_t) = \sum_y p_\theta(y \mid z_t)\, H_{t+1}(F(z_t, y)).
$$

The corresponding zero-variance proposal is

$$
q_t^\star(y \mid z_t) = p_\theta(y \mid z_t) \frac{H_{t+1}(F(z_t, y))}{H_t(z_t)}.
$$

Evaluating this proposal would require a backward recursion over the product of the language-model state and the automaton state. The automaton component is finite, but the recurrent hidden state is continuous, so the exact recursion is not tractable in the present implementation.
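When the state space is finite, the recursion can be evaluated exactly. The sketch below assumes a context-free base model over a three-letter vocabulary and a single anchor "ab", so that the state reduces to the automaton state alone; this toy setting is an illustration and not the paper's recurrent model.

```python
def transition(state, y):
    """Automaton for the single phrase 'ab' (state 2 is accepting and absorbing)."""
    if state == 2:
        return 2
    if state == 1 and y == "b":
        return 2
    return 1 if y == "a" else 0

def harmonic_table(p, horizon):
    """H[t][s]: probability of reaching acceptance by the horizon from state s at step t."""
    H = [[0.0] * 3 for _ in range(horizon + 1)]
    H[horizon] = [0.0, 0.0, 1.0]  # terminal condition: accepted or not
    for t in range(horizon - 1, -1, -1):
        for s in range(3):
            H[t][s] = sum(p[y] * H[t + 1][transition(s, y)] for y in p)
    return H

def doob_proposal(p, H, t, s):
    """q*(y | s) = p(y) * H[t+1][F(s, y)] / H[t][s]."""
    return {y: p[y] * H[t + 1][transition(s, y)] / H[t][s] for y in p}

p = {"a": 0.5, "b": 0.3, "c": 0.2}   # hypothetical base model
H = harmonic_table(p, horizon=3)
q0 = doob_proposal(p, H, t=0, s=0)
```

Here $H_0(0) = 0.3$, which matches direct enumeration ("ab" can start at position 1 or 2, each with probability 0.15), and the transformed kernel raises the probability of "a" at the first step from 0.5 to 0.75. Along any path sampled from $q^\star$, the ratios $p / q^\star$ telescope to $H_0(z_0)$, which is why the proposal has zero variance.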

### 14\.2\. Distance\-driven twist with source support

The implemented proposal replaces the unknown harmonic bridge with a surrogate defined by automaton distance and source support. Writing the one-step progress as

$$
\Delta_t(y) = d(s_t) - d(F_s(s_t, y)),
$$

where $F_s$ is the automaton transition, the implementation uses

$$
q_t(y \mid z_t) \propto p_\theta(y \mid z_t) \exp\{\tau \Delta_t(y) + \beta \psi_x(y)\}.
$$

Let

$$
Z_t(z_t) = \sum_y p_\theta(y \mid z_t) \exp\{\tau \Delta_t(y) + \beta \psi_x(y)\}.
$$

Because the sequence target itself uses the potential $\exp\{\lambda \Delta_t\}$, the incremental importance correction simplifies to

$$
\log \frac{G_t(y)\, p_\theta(y \mid z_t)}{q_t(y \mid z_t)} = (\lambda - \tau) \Delta_t(y) - \beta \psi_x(y) + \log Z_t(z_t).
$$

This identity separates two sources of variance:

1. 1\.the mismatch between the target bridge strengthλ\\lambdaand the proposal twist strengthτ\\tau,
2. 2\.the mismatch between local distance reduction plus source support and the true future acceptance value encoded inHtH\_\{t\}\.

In the validation benchmarkτ=λ\\tau=\\lambda, so the first term vanishes and the remaining correction reduces to the source\-support compensation plus the proposal normalizer\. The distance and support potentials are not estimates of the exact harmonic function; they are deterministic control signals over compiled source evidence\.
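The twisted proposal and its incremental weight can be checked numerically for a single step. This is a minimal sketch under stated assumptions: the three-token vocabulary, the values of `p`, `delta`, and `psi`, and the function name `twisted_step` are illustrative and do not come from the released code.

```python
import numpy as np

def twisted_step(p, delta, psi, tau, beta, lam):
    """One decoding step of the distance-driven twist.

    p     : base next-token probabilities p_theta(y | z_t)
    delta : one-step automaton progress Delta_t(y) for each token
    psi   : source-support score psi_x(y) for each token
    Returns the proposal q_t, the normalizer Z_t, and the log incremental weight.
    """
    unnorm = p * np.exp(tau * delta + beta * psi)
    Z = unnorm.sum()
    q = unnorm / Z
    log_w = (lam - tau) * delta - beta * psi + np.log(Z)
    return q, Z, log_w

# Illustrative values (assumptions): token 1 advances the automaton by one unit,
# and tokens 1 and 2 appear in the source-support set.
p = np.array([0.7, 0.2, 0.1])
delta = np.array([0.0, 1.0, 0.0])
psi = np.array([0.0, 1.0, 1.0])
tau = lam = 2.0
beta = 0.5

q, Z, log_w = twisted_step(p, delta, psi, tau, beta, lam)

# Direct evaluation of log(G_t * p / q) with G_t = exp(lam * delta).
direct = lam * delta + np.log(p) - np.log(q)
print("proposal:", q)
print("identity holds:", np.allclose(log_w, direct))
```

With `tau == lam`, the weight depends on the token only through `-beta * psi`, which matches the reduction described above.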

### 14\.3\. Control\-budget coupling

The coverage–latency relation can be formalized as a control problem with resource penalties\. One diagnostic objective for this relation is

$$\mathcal{J}=\mathbb{E}\!\left[\mathrm{cov}(y_{1:T})\right]-\eta_{\mathrm{lat}}\,\mathbb{E}\!\left[\mathrm{time}(y_{1:T})\right]-\eta_{\mathrm{deg}}\sum_{t=1}^{T}\mathbb{E}\!\left[\frac{P}{\mathrm{ESS}_t}-1\right]_{+},$$

where $\mathrm{cov}(y_{1:T})$ is anchor coverage, $P$ is the particle count, and the last term penalizes degeneracy. The implementation does not optimize this objective directly; it is used to interpret coverage, latency, and ESS under a common resource-sensitive view.

This view also distinguishes resampling from splitting\. Resampling redistributes probability mass when normalized weights degenerate\. Splitting reallocates finite compute toward high\-scoring branches when acceptance becomes too rare under the current proposal\. Both operations act on the same finite particle budget, but they address different numerical pathologies\.
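The two operations can be sketched side by side. The code below is an illustration under assumptions, not the released implementation: systematic resampling is one common scheme and the text does not say which scheme is used, the 0.5 ESS threshold is a placeholder, and `split_elite` is a simplified elite-cloning stand-in for multilevel splitting that omits any weight correction.

```python
import numpy as np

def effective_sample_size(log_w):
    """ESS = 1 / sum(w_i^2) for normalized weights w; also returns w."""
    w = np.exp(log_w - np.max(log_w))  # subtract the max for numerical stability
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2), w

def degeneracy_penalty(ess_value, n_particles):
    """Per-step term [P / ESS_t - 1]_+ from the diagnostic objective."""
    return max(n_particles / ess_value - 1.0, 0.0)

def systematic_resample(w, rng):
    """Return ancestor indices drawn by systematic resampling."""
    P = len(w)
    positions = (rng.random() + np.arange(P)) / P
    cum = np.cumsum(w)
    cum[-1] = 1.0  # guard against round-off in the last bin
    return np.searchsorted(cum, positions, side="right")

def split_elite(scores, elite_fraction, rng):
    """Keep the top-scoring particles and clone them to refill the budget."""
    P = len(scores)
    k = max(1, int(np.ceil(elite_fraction * P)))
    elite = np.argsort(scores)[::-1][:k]
    clones = rng.choice(elite, size=P - k, replace=True)
    return np.concatenate([elite, clones])

rng = np.random.default_rng(0)
log_w = np.log(np.array([0.70, 0.10, 0.10, 0.10]))  # illustrative weights
ess_value, w = effective_sample_size(log_w)
P = len(w)

if ess_value < 0.5 * P:  # hypothetical ESS threshold
    ancestors = systematic_resample(w, rng)
else:
    ancestors = np.arange(P)

print("ESS:", ess_value, "penalty:", degeneracy_penalty(ess_value, P))
print("ancestors:", ancestors)
```

Resampling is triggered by the weights alone, whereas the splitting sketch ranks particles by a separate score such as automaton progress. Both reuse the same budget of `P` slots.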

## 15\. Related Work

### 15\.1\. Structured data\-to\-text generation

CommonGen (Lin et al., [2020](https://arxiv.org/html/2606.11203#bib.bib16)), E2E NLG (Novikova et al., [2017](https://arxiv.org/html/2606.11203#bib.bib21); Dusek et al., [2019](https://arxiv.org/html/2606.11203#bib.bib8)), and WikiBio (Lebret et al., [2016](https://arxiv.org/html/2606.11203#bib.bib14)) are canonical benchmarks for testing whether a system can convert structured inputs into faithful text. Much work in this space focuses on architecture and pretraining rather than on the inference problem that appears after a model has already been trained.

### 15\.2\. Constrained decoding

Lexically constrained beam search (Hokamp and Liu, [2017](https://arxiv.org/html/2606.11203#bib.bib11); Post and Vilar, [2018](https://arxiv.org/html/2606.11203#bib.bib22)), automata-based constrained decoding (Koo et al., [2024](https://arxiv.org/html/2606.11203#bib.bib13)), lookahead-based constrained decoding (Lu et al., [2022](https://arxiv.org/html/2606.11203#bib.bib18)), search-oriented inference-time scaling such as A\*-decoding (Chatziveroglou, [2025](https://arxiv.org/html/2606.11203#bib.bib4)), draft-conditioned or mixed natural/structured decoding (Reddy et al., [2026](https://arxiv.org/html/2606.11203#bib.bib23); Nguyen et al., [2026](https://arxiv.org/html/2606.11203#bib.bib20)), retrieval-constrained systems such as trie-based decoders (Su et al., [2026](https://arxiv.org/html/2606.11203#bib.bib24)), and recent MCMC-based constrained sampling (Anaya Gonzalez et al., [2025](https://arxiv.org/html/2606.11203#bib.bib1)) all address subsets of the same issue: how to keep decoding inside a constrained region without destroying throughput. Our work differs in using a particle system that explicitly represents multiple partial trajectories and their weights.

### 15\.3\. Controlled language generation

Methods such as PPLM (Dathathri et al., [2020](https://arxiv.org/html/2606.11203#bib.bib5)) and DExperts (Liu et al., [2021](https://arxiv.org/html/2606.11203#bib.bib17)) alter decoding through gradients, auxiliary models, or expert/anti-expert logits. These methods are powerful but often depend on extra control models or attribute classifiers. LatticeBridge instead relies on instance-provided anchors and automaton geometry.

### 15\.4\. Sequential Monte Carlo and twisting

The theoretical backbone comes from Feynman–Kac particle systems (Del Moral, [2004](https://arxiv.org/html/2606.11203#bib.bib6)), sequential Monte Carlo samplers (Del Moral et al., [2006](https://arxiv.org/html/2606.11203#bib.bib7)), and modern tutorials such as Naesseth et al. ([2019](https://arxiv.org/html/2606.11203#bib.bib19)). Twisted particle filters and related methods (Whiteley et al., [2016](https://arxiv.org/html/2606.11203#bib.bib25); Guarniero et al., [2017](https://arxiv.org/html/2606.11203#bib.bib10)) motivate our use of future-oriented proposal shaping. Recent sequence-level extensions include twisted SMC for language-model inference (Zhao et al., [2024](https://arxiv.org/html/2606.11203#bib.bib27)), reasoning-oriented twisted sampling (Wu et al., [2024](https://arxiv.org/html/2606.11203#bib.bib26)), and 2026 work on language-model ensembling with shared character-space SMC (Chan et al., [2026](https://arxiv.org/html/2606.11203#bib.bib3)). Our implementation is closer in spirit to this language-model inference line than to classical state-space estimation, but the numerical concerns are the same.

## 16\. Control Parameters and Validation Checks

The benchmark record stores the decoding controls used for every reported run\. The validation configuration fixes the split, task count, maximum generation length, beam size, ancestral sample count, particle count, bridge strength, twist strength, source\-support strength, ESS threshold, splitting interval, elite fraction, device, and random seed\.

The same checkpoint and tokenizer are used by all four decoders\. Candidate selection is performed after decoding by the ordered key

$$\left(\mathbf{1}\{\text{all anchors realized}\},\ \mathrm{ReqCov},\ \mathrm{SCov},\ -\mathrm{Intr},\ \log p,\ \mathrm{ROUGE\mbox{-}L}\right),$$

so exact satisfaction is primary, partial required coverage and source coverage are secondary, source-intrusion avoidance is tertiary, and model score plus reference overlap are used only to break remaining ties. This policy is applied uniformly to beam filtering, ancestral sampling, and SMC outputs.
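The ordered key maps naturally onto lexicographic tuple comparison. The sketch below is illustrative only: the `Candidate` fields, the function names, and the three example candidates with their scores are hypothetical and are not taken from the benchmark files.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    text: str
    all_anchors: bool   # indicator that every required anchor is realized
    req_cov: float      # required-anchor coverage (ReqCov)
    src_cov: float      # source coverage (SCov)
    intrusions: int     # source-intrusion count (Intr)
    log_p: float        # model log-probability
    rouge_l: float      # ROUGE-L against the reference

def selection_key(c):
    """Lexicographic key; larger is better in every position."""
    return (int(c.all_anchors), c.req_cov, c.src_cov, -c.intrusions, c.log_p, c.rouge_l)

def select(candidates):
    """Pick the candidate that maximizes the ordered key."""
    return max(candidates, key=selection_key)

# Hypothetical candidates for one input instance.
candidates = [
    Candidate("fluent but incomplete", False, 0.67, 0.67, 0, -8.1, 0.40),
    Candidate("complete, no intrusion", True, 1.00, 1.00, 0, -12.4, 0.35),
    Candidate("complete, one intrusion", True, 1.00, 1.00, 1, -9.0, 0.50),
]

best = select(candidates)
print(best.text)
```

Here the second candidate is selected even though it has the lowest model score, because exact satisfaction and intrusion avoidance are compared before `log_p` and ROUGE-L.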

For the SMC decoder, the validation run also records mean ESS, resampling count, acceptance mass, and realized generation length\. On CommonGen, the mean ESS is 57\.07 for 96 particles and the mean acceptance mass is 0\.782\. On E2E NLG, the mean ESS is 56\.25 and the acceptance mass is 0\.186\. On WikiBio, the mean ESS is 44\.57 and the acceptance mass is 0\.033, indicating the lowest accepting mass among the three reported datasets\. These diagnostics report particle degeneracy directly rather than inferring it from coverage alone\.

## 17\. Conclusion

LatticeBridge treats faithful structured sequence synthesis as sequential rare\-event inference\. The implementation combines a compact prefix model, surface automata, support\-aware twisted SMC, and multilevel splitting in a form that exposes the control mechanism through recorded particle statistics\. In the multi\-dataset validation benchmark, the bridge improves exact satisfaction and required\-anchor coverage relative to standard decoders while reporting the associated source\-coverage, source\-intrusion, overlap, and latency measurements\.

The system provides a reusable automaton interface, particle diagnostics, and benchmark files for studying the coverage–fidelity–runtime operating surface under a fixed proposal model\.

## References

- Anaya Gonzalez et al. (2025) Emmanuel Anaya Gonzalez, Sairam Vaidya, Kanghee Park, Ruyi Ji, Taylor Berg-Kirkpatrick, and Loris D’Antoni. Constrained sampling for language models should be easy: An MCMC perspective. *arXiv preprint arXiv:2506.05754*, 2025.
- Cerou and Guyader (2007) Frederic Cerou and Arnaud Guyader. Adaptive multilevel splitting for rare event analysis. *Stochastic Analysis and Applications*, 25(2):417–443, 2007.
- Chan et al. (2026) Robin Shing Moon Chan, Tianyu Liu, Samuel Kiegeland, Clemente Pasti, Jacob Hoover Vigly, Timothy J. O’Donnell, Ryan Cotterell, and Tim Vieira. Ensembling language models with sequential Monte Carlo. *arXiv preprint arXiv:2603.05432*, 2026.
- Chatziveroglou (2025) Giannis Chatziveroglou. A\*-decoding: Token-efficient inference scaling. *arXiv preprint arXiv:2505.13672*, 2025.
- Dathathri et al. (2020) Sumanth Dathathri, Andrea Madotto, Janice Lan, Jane Hung, Eric Frank, Piero Molino, Jason Yosinski, and Rosanne Liu. Plug and play language models: A simple approach to controlled text generation. In *Proceedings of ICLR*, 2020.
- Del Moral (2004) Pierre Del Moral. *Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications*. Springer, 2004.
- Del Moral et al. (2006) Pierre Del Moral, Arnaud Doucet, and Ajay Jasra. Sequential Monte Carlo samplers. *Journal of the Royal Statistical Society: Series B*, 68(3):411–436, 2006.
- Dusek et al. (2019) Ondrej Dusek, David M. Howcroft, and Verena Rieser. Semantic noise matters for neural natural language generation. In *Proceedings of the 12th International Conference on Natural Language Generation*, pages 421–426, 2019.
- Gehrmann et al. (2021) Sebastian Gehrmann, Tosin Adewumi, Karmanya Aggarwal, Pawan Sasanka Ammanamanchi, Aremu Anuoluwapo, Antoine Bosselut, Khyathi Raghavi Chandu, Miruna Clinciu, Dipanjan Das, Kaustubh Dhole, et al. The GEM benchmark: Natural language generation, its evaluation and metrics. In *Proceedings of the 1st Workshop on Natural Language Generation, Evaluation, and Metrics*, pages 96–120, 2021.
- Guarniero et al. (2017) Peter Guarniero, Adam M. Johansen, and Anthony Lee. The iterated auxiliary particle filter. *Journal of the American Statistical Association*, 112(520):1636–1647, 2017.
- Hokamp and Liu (2017) Chris Hokamp and Qun Liu. Lexically constrained decoding for sequence generation using grid beam search. In *Proceedings of ACL*, pages 1535–1546, 2017.
- Kim et al. (2025) Sooyeon Kim, Giung Nam, Byoungwoo Park, and Juho Lee. Improving constrained language generation via self-distilled twisted sequential Monte Carlo. *arXiv preprint arXiv:2507.02315*, 2025.
- Koo et al. (2024) Terry Koo, Frederick Liu, and Luheng He. Automata-based constraints for language model decoding. *arXiv preprint arXiv:2407.08103*, 2024.
- Lebret et al. (2016) Remi Lebret, David Grangier, and Michael Auli. Generating text from structured data with application to the biography domain. *arXiv preprint arXiv:1603.07771*, 2016.
- Lhoest et al. (2021) Quentin Lhoest, Albert Villanova del Moral, Yacine Jernite, Abhishek Thakur, Patrick von Platen, Suraj Patil, Julien Chaumond, Mariama Drame, Julien Plu, Lewis Tunstall, et al. Datasets: A community library for natural language processing. In *Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations*, pages 175–184, 2021.
- Lin et al. (2020) Bill Yuchen Lin, Wangchunshu Zhou, Ming Shen, Pei Zhou, Chandra Bhagavatula, Yejin Choi, and Xiang Ren. CommonGen: A constrained text generation challenge for generative commonsense reasoning. In *Findings of the Association for Computational Linguistics: EMNLP 2020*, pages 1823–1840, 2020.
- Liu et al. (2021) Alisa Liu, Maarten Sap, Ximing Lu, Swabha Swayamdipta, Chandra Bhagavatula, Noah A. Smith, and Yejin Choi. DExperts: Decoding-time controlled text generation with experts and anti-experts. In *Proceedings of ACL-IJCNLP*, pages 6691–6706, 2021.
- Lu et al. (2022) Ximing Lu, Peter West, Rowan Zellers, Ronan Le Bras, Chandra Bhagavatula, and Yejin Choi. NeuroLogic A\*esque decoding: Constrained text generation with lookahead heuristics. In *Proceedings of NAACL-HLT*, pages 780–799, 2022.
- Naesseth et al. (2019) Christian A. Naesseth, Fredrik Lindsten, and Thomas B. Schon. Elements of sequential Monte Carlo. *Foundations and Trends in Machine Learning*, 12(3):307–392, 2019.
- Nguyen et al. (2026) Ngoc Trinh Hung Nguyen, Alonso Silva, Laith Zumot, Liubov Tupikina, Armen Aghasaryan, and Mehwish Alam. Thinking before constraining: A unified decoding framework for large language models. *arXiv preprint arXiv:2601.07525*, 2026.
- Novikova et al. (2017) Jekaterina Novikova, Ondrej Dusek, and Verena Rieser. The E2E dataset: New challenges for end-to-end generation. In *Proceedings of SIGDIAL*, pages 201–206, 2017.
- Post and Vilar (2018) Matt Post and David Vilar. Fast lexically constrained decoding with dynamic beam allocation for neural machine translation. In *Proceedings of NAACL-HLT*, pages 1314–1324, 2018.
- Reddy et al. (2026) Avinash Reddy, Thayne T. Walker, James S. Ide, and Amrit Singh Bedi. Draft-conditioned constrained decoding for structured generation in LLMs. *arXiv preprint arXiv:2603.03305*, 2026.
- Su et al. (2026) Zhengyang Su, Isay Katsman, Yueqi Wang, Ruining He, Lukasz Heldt, Raghunandan Keshavan, Shao-Chuan Wang, Xinyang Yi, Mingyan Gao, Onkar Dalal, Lichan Hong, Ed Chi, and Ningren Han. Vectorizing the trie: Efficient constrained decoding for LLM-based generative retrieval on accelerators. *arXiv preprint arXiv:2602.22647*, 2026.
- Whiteley et al. (2016) Nick Whiteley, Anthony Lee, and Karolina Heine. Twisted particle filters. *The Annals of Statistics*, 44(2):822–859, 2016.
- Wu et al. (2024) Xuansheng Wu, Wenlin Yao, Jianshu Chen, Xiaoman Pan, Xiaoyang Wang, Ninghao Liu, and Dong Yu. Step-by-step reasoning for math problems via twisted sequential Monte Carlo. *arXiv preprint arXiv:2410.01920*, 2024.
- Zhao et al. (2024) Stephen Zhao, Rob Brekelmans, Alireza Makhzani, and Roger B. Grosse. Probabilistic inference in language models via twisted sequential Monte Carlo. In *Proceedings of the 41st International Conference on Machine Learning*, pages 60704–60748, 2024.

## Appendix A. Implementation Details

### A\.1\. Tokenizer

We use a byte\-level BPE tokenizer with special tokens for source and target delimiters\. Byte\-level modeling keeps the surface automata general enough to handle punctuation and mixed\-case entity names without introducing language\-specific preprocessing\.

### A\.2\. Model

The current checkpoint uses a 256\-dimensional embedding, a 384\-dimensional GRU hidden state, two recurrent layers, and dropout of 0\.15\. The objective is masked next\-token prediction over the target portion only\.
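As a rough size check on the recurrent stack, the parameter count can be worked out from the stated dimensions. This sketch assumes the embedding output feeds the first GRU layer directly and uses the common GRU parameterization with two bias vectors per gate (the convention of PyTorch's `torch.nn.GRU`); whether the checkpoint follows that convention is not stated. Embedding and output-projection parameters are excluded because the vocabulary size is not given here.

```python
def gru_layer_params(input_size, hidden_size):
    """Parameters of one GRU layer: three gates, each with input weights,
    recurrent weights, and two bias vectors (assumed convention)."""
    return 3 * hidden_size * (input_size + hidden_size + 2)

EMBED_DIM = 256    # embedding width from the text
HIDDEN_DIM = 384   # GRU hidden size from the text
NUM_LAYERS = 2     # recurrent layers from the text

# The first layer reads the embedding; later layers read the previous hidden state.
input_sizes = [EMBED_DIM] + [HIDDEN_DIM] * (NUM_LAYERS - 1)
per_layer = [gru_layer_params(i, HIDDEN_DIM) for i in input_sizes]
recurrent_params = sum(per_layer)

print(per_layer, recurrent_params)
```

Under these assumptions the recurrent stack holds about 1.6 million parameters, which is consistent with the description of the prefix model as compact.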

### A\.3\. Inference

Greedy, beam-filtered, best-of-$k$, and twisted SMC all share the same prefix model. The comparison therefore isolates the inference procedure rather than conflating decoding with model scale or pretraining.

## Appendix B. Evaluation Scope

The evaluation is defined with respect to exact realization of input\-derived anchors\. This choice yields an explicit accepting event together with a transition system over compiled surface automata\. The reported results should be interpreted as measurements of anchor\-faithful generation rather than as claims about broader semantic adequacy\.

The abstraction supplies an acceptance lattice that is shared across datasets while keeping schema\-specific processing outside the inference core\.

## Appendix C. Budget-Normalized Diagnostics

Table [4](https://arxiv.org/html/2606.11203#A3.T4) reports coverage and success lift relative to greedy decoding, normalized by additional runtime. On CommonGen, twisted SMC is the only tested method that reaches exact success. On E2E, best-of-16 ancestral sampling and twisted SMC both improve exact success, while twisted SMC gives the larger success lift at lower mean latency. On WikiBio, exact gains remain small, but the coverage lift of twisted SMC is larger than that of the baselines.

Table 4: Budget-normalized lift relative to greedy decoding. Positive values indicate improvement beyond the greedy baseline.

## Appendix D. Example Diagnostics

### CommonGen

| Case | Anchors | Best baseline | LatticeBridge | Particle state |
| --- | --- | --- | --- | --- |
| exact lift | house, table, sit | beam_filter; req 0.67; src 0.67; intr 0; RL 0.40 | req 1.00; src 1.00; intr 0; RL 0.67 | mass 1.00; ESS 54.6 |
| exact lift | smoke, blow, sit | ancestral_best_of_k; req 0.33; src 0.33; intr 0; RL 0.09 | req 1.00; src 1.00; intr 0; RL 0.29 | mass 1.00; ESS 53.6 |
| exact lift | mirror, shave, face | ancestral_best_of_k; req 0.33; src 0.25; intr 0; RL 0.22 | req 1.00; src 0.75; intr 0; RL 0.40 | mass 1.00; ESS 59.7 |
| exact lift | snowball, snow, kid | beam_filter; req 0.00; src 0.00; intr 1; RL 0.23 | req 1.00; src 0.75; intr 0; RL 0.22 | mass 1.00; ESS 56.7 |
| coverage lift | pierce, ear, chair | beam_filter; req 0.00; src 0.00; intr 1; RL 0.32 | req 0.67; src 0.50; intr 0; RL 0.35 | mass 0.00; ESS 43.9 |
| coverage lift | sail, day, boat | ancestral_best_of_k; req 0.33; src 0.33; intr 0; RL 0.21 | req 0.67; src 0.67; intr 0; RL 0.22 | mass 0.00; ESS 48.1 |
| coverage lift | sleigh, pull, dog | ancestral_best_of_k; req 0.33; src 0.33; intr 0; RL 0.24 | req 0.67; src 0.67; intr 0; RL 0.43 | mass 1.00; ESS 54.1 |
| coverage lift | slide, hill, kid | beam_filter; req 0.00; src 0.00; intr 1; RL 0.36 | req 0.67; src 0.67; intr 0; RL 0.50 | mass 0.00; ESS 36.8 |

### E2E NLG

| Case | Anchors | Best baseline | LatticeBridge | Particle state |
| --- | --- | --- | --- | --- |
| low-mass success | Cotto, moderate, The Portland Arms | ancestral_best_of_k; req 1.00; src 1.00; intr 0; RL 0.44 | req 1.00; src 1.00; intr 0; RL 0.65 | mass 0.00; ESS 49.0 |
| exact lift | 5 out of 5, cheap, Fitzbillies | ancestral_best_of_k; req 0.67; src 0.50; intr 4; RL 0.45 | req 1.00; src 0.83; intr 0; RL 0.56 | mass 0.00; ESS 68.7 |
| exact lift | Aromi, 5 out of 5, English | greedy; req 0.67; src 0.50; intr 2; RL 0.67 | req 1.00; src 0.83; intr 0; RL 0.56 | mass 0.00; ESS 62.0 |
| exact lift | Aromi, restaurant, cheap | beam_filter; req 0.67; src 0.75; intr 0; RL 0.57 | req 1.00; src 1.00; intr 0; RL 0.36 | mass 1.00; ESS 84.4 |
| exact lift | Aromi, cheap, coffee shop | beam_filter; req 0.67; src 0.67; intr 1; RL 0.40 | req 1.00; src 1.00; intr 0; RL 0.40 | mass 0.00; ESS 39.1 |
| low-mass success | Cocum, 1 out of 5, high | beam_filter; req 1.00; src 0.83; intr 1; RL 0.60 | req 1.00; src 0.83; intr 0; RL 0.54 | mass 0.00; ESS 73.6 |
| low-mass success | Cotto, The Portland Arms, cheap | ancestral_best_of_k; req 1.00; src 0.80; intr 1; RL 0.42 | req 1.00; src 1.00; intr 0; RL 0.40 | mass 0.00; ESS 44.7 |
| low-mass success | Aromi, average, coffee shop | beam_filter; req 1.00; src 1.00; intr 0; RL 0.53 | req 1.00; src 1.00; intr 0; RL 0.27 | mass 0.00; ESS 31.3 |

### WikiBio

| Case | Anchors | Best baseline | LatticeBridge | Particle state |
| --- | --- | --- | --- | --- |
| coverage lift | 1779, thomas edwards, welsh | beam_filter; req 0.00; src 0.00; intr 2; RL 0.38 | req 0.67; src 0.25; intr 0; RL 0.52 | mass 0.00; ESS 50.0 |
| coverage lift | alberto volpi, road | beam_filter; req 0.00; src 0.00; intr 2; RL 0.25 | req 0.50; src 0.10; intr 0; RL 0.05 | mass 0.00; ESS 25.6 |
| coverage lift | activist, joseph defilippis | ancestral_best_of_k; req 0.00; src 0.10; intr 2; RL 0.27 | req 0.50; src 0.10; intr 0; RL 0.04 | mass 0.00; ESS 34.3 |
| near miss | baltimore orioles, pitcher | ancestral_best_of_k; req 0.50; src 0.07; intr 1; RL 0.33 | req 0.50; src 0.07; intr 0; RL 0.15 | mass 0.00; ESS 22.9 |
| coverage lift | mason cook darling, may 18 , 1801, wisconsin | ancestral_best_of_k; req 0.00; src 0.00; intr 2; RL 0.40 | req 0.33; src 0.08; intr 0; RL 0.35 | mass 0.00; ESS 46.0 |
| near miss | theodore von eltz, actor | ancestral_best_of_k; req 0.50; src 0.10; intr 3; RL 0.26 | req 0.50; src 0.10; intr 0; RL 0.05 | mass 0.00; ESS 50.1 |
| near miss | christine isobel mcgaffey frederick, american | ancestral_best_of_k; req 0.50; src 0.12; intr 3; RL 0.19 | req 0.50; src 0.12; intr 0; RL 0.02 | mass 0.00; ESS 48.7 |
| near miss | barbara coombs lee, president of compassion & choices, american | ancestral_best_of_k; req 0.33; src 0.17; intr 2; RL 0.04 | req 0.33; src 0.17; intr 0; RL 0.01 | mass 0.00; ESS 74.0 |

The examples report the same measurements used in Table [3](https://arxiv.org/html/2606.11203#S10.T3): exact satisfaction, source coverage, source intrusion, overlap, acceptance mass, and particle behavior are interpreted jointly rather than as substitutable quality scores.