LoRe: Adaptive Interaction-Evaluation Routing with Per-Step Interaction Budgets for Iterative Graph Solvers

arXiv cs.LG Papers

Summary

Introduces LoRe, a training-free wrapper that enforces per-step interaction budgets for iterative graph solvers, achieving substantial speedups and memory reductions on combinatorial optimization problems like MIS and TSP.

arXiv:2605.29005v1 Announce Type: new Abstract: Diffusion-based neural solvers for combinatorial optimization repeatedly re-evaluate dense edge/factor interactions, making inference expensive in wall-clock time and often memory-bound at scale. Inspired by the computational methodologies of many-body physics, we introduce LoRe, a training-free, inference-time drop-in wrapper that enforces per-step interaction-evaluation budgeting: at each iteration, it evaluates only a fixed fraction of interactions by dynamically routing computation to high-conflict or high-uncertainty interactions, instead of using a fixed sparsification (e.g., static kNN graphs or static masks). Under fully inclusive end-to-end wall-clock accounting, LoRe substantially improves scalability on the Maximum Independent Set (MIS) problem, extending feasible inference more than $3\times$ beyond the baseline's out-of-memory limit, delivering a $\sim 8\times$ speedup and a $\sim 12\times$ peak-memory reduction, with solution quality preserved in this regime. Demonstrating cross-task generality on the large-scale Traveling Salesperson Problem (TSP) and zero-shot robustness to topology shifts, LoRe achieves a $\sim 15\times$ speedup at $n=1000$ with a $44\times$ memory reduction and competitive tour quality.
Original Article
View Cached Full Text

Cached at: 05/29/26, 09:14 AM

# LoRe: Adaptive Interaction-Evaluation Routing with Per-Step Interaction Budgets for Iterative Graph Solvers
Source: [https://arxiv.org/html/2605.29005](https://arxiv.org/html/2605.29005)
###### Abstract

Diffusion\-based neural solvers for combinatorial optimization repeatedly re\-evaluate dense edge/factor interactions, making inference expensive in wall\-clock time and often memory\-bound at scale\. Inspired by the computational methodologies of many\-body physics, we introduce LoRe, a training\-free, inference\-time drop\-in wrapper that enforces per\-step interaction\-evaluation budgeting: at each iteration, it evaluates only a fixed fraction of interactions by dynamically routing computation to high\-conflict or high\-uncertainty interactions, instead of using a fixed sparsification \(e\.g\., static kNN graphs or static masks\)\. Under fully inclusive end\-to\-end wall\-clock accounting, LoRe substantially improves scalability on the Maximum Independent Set \(MIS\) problem, extending feasible inference more than3×3\\timesbeyond the baseline’s out\-of\-memory limit, delivering a∼8×\{\\sim\}8\\timesspeedup and a∼12×\{\\sim\}12\\timespeak\-memory reduction, with solution quality preserved in this regime\. Demonstrating cross\-task generality on the large\-scale Traveling Salesperson Problem \(TSP\) and zero\-shot robustness to topology shifts, LoRe achieves a∼15×\{\\sim\}15\\timesspeedup atn=1000n=1000with a44×44\\timesmemory reduction and competitive tour quality\.

combinatorial optimization, diffusion\-based neural solvers, Conditional Computation / Dynamic Sparsity, runtime routing, per\-step interaction\-evaluation budgeting,

## 1Introduction

Many real\-world combinatorial decision systems invoke a solver as an inner\-loop primitive, re\-optimizing repeatedly as requests arrive and constraints evolve\. Examples include real\-time dispatch and routing in dynamic vehicle routing, datacenter or cluster scheduling under changing job arrivals, and network resource allocation reacting to time\-varying congestion\(Pillacet al\.,[2013](https://arxiv.org/html/2605.29005#bib.bib15); Vermaet al\.,[2015](https://arxiv.org/html/2605.29005#bib.bib16); Kellyet al\.,[1998](https://arxiv.org/html/2605.29005#bib.bib17)\)\. In these deployments, a feasible solution must be produced quickly and then improved under strict latency and memory budgets, so the binding constraint is often the per\-iteration compute/memory envelope rather than just the total number of refinement steps, a classic “anytime” requirement in deployed decision systems\(Zilberstein,[1996](https://arxiv.org/html/2605.29005#bib.bib18)\)\.

Iterative neural solvers on graphs, particularly diffusion\-based and GNN\-based models, have achieved strong performance on combinatorial optimization \(CO\) problems\(Bengioet al\.,[2021](https://arxiv.org/html/2605.29005#bib.bib19); Cappartet al\.,[2023](https://arxiv.org/html/2605.29005#bib.bib20); Mazyavkinaet al\.,[2021](https://arxiv.org/html/2605.29005#bib.bib21); Sun and Yang,[2023](https://arxiv.org/html/2605.29005#bib.bib1); Sanokowskiet al\.,[2024](https://arxiv.org/html/2605.29005#bib.bib5); Zhaoet al\.,[2025](https://arxiv.org/html/2605.29005#bib.bib7)\), including canonical tasks like Maximum Independent Set \(MIS\) and the Traveling Salesperson Problem \(TSP\)\. However, their scalability is bottlenecked by a recurring cost pattern: each refinement step requires a dense evaluation of the entire interaction topology \(e\.g\., all edges or factor pairs\) to resolve conflicts\(Scarselliet al\.,[2009](https://arxiv.org/html/2605.29005#bib.bib27); Gilmeret al\.,[2017](https://arxiv.org/html/2605.29005#bib.bib28); Hamiltonet al\.,[2017](https://arxiv.org/html/2605.29005#bib.bib29)\)\. With a fixed horizonTTand dense message passing, the computational cost scales asO​\(T​\|𝒜\|\)O\(T\|\\mathcal\{A\}\|\), and peak memory grows linearly with the interaction set size\|𝒜\|\|\\mathcal\{A\}\|\. On large graphs, this full\-support sweep often breaches the hardware envelope, leading to Out\-Of\-Memory \(OOM\) failures or unacceptable latency\.

This bottleneck presents a dilemma\. Reducing the number of stepsTT\(e\.g\., Distillation\) does not lower the per\-step peak memory\. On the other hand, static spatial sparsification \(e\.g\., fixed kNN graphs\) reduces the per\-step footprint but fails to capture the state\-dependent nature of combinatorial conflicts\. In iterative refinement, the “hotspots”, regions with high conflict or uncertainty, drift over time\. A fixed sparse support inevitably misses newly critical interactions, causing error accumulation and trajectory drift\.

We observe that this challenge conceptually mirrors the Many\-Body Problem in condensed matter physics\. In strongly correlated systems \(e\.g\., the Hubbard Model\), computing exact interactions across an infinite lattice is intractable\. Instead, methods like Cluster Dynamical Mean\-Field Theory \(C\-DMFT\) decompose the system into aCluster\(a local region where strong correlations are resolved exactly\) and aBath\(a background field that approximates the global environment\)\. Crucially, the coupling between the cluster and the bath ensures that local accuracy is maintained within a global context\.

Inspired by this physical insight, we proposeLoRe\(LocalRecompute\), a*training\-free*,*inference\-time*protocol that operationalizes this Cluster\-Bath decomposition for graph solvers\. LoRe enforces a hard per\-step operator budget by dynamically routing computation:

- •The Cluster:At each step, LoRe identifies a time\-varying subset of high\-conflict interactions \(MtM\_\{t\}\) to evaluate exactly\.
- •The Bath:The influence of the omitted interactions is approximated via a lightweight global recall signal, preventing the Cluster from becoming disconnected from the global state\.

Unlike static sparsification, LoRe tracks the “drifting hotspots” of the solver trajectory\. It functions as a drop\-in wrapper that transforms a standard dense solver into a budget\-aware dynamical system without retraining\. Figure[1](https://arxiv.org/html/2605.29005#S1.F1)summarizes the LoRe protocol\.

![Refer to caption](https://arxiv.org/html/2605.29005v1/x1.png)Figure 1:LoRe enforces*temporal operator sparsity*by routing per\-step interaction evaluations to the interaction subsetMtM\_\{t\}covered by time\-varying high\-conflict clusters under a hard budget\|Mt\|≤ρ​\|𝒜\|\|M\_\{t\}\|\\leq\\rho\|\\mathcal\{A\}\|, optionally stabilized by lightweight recall and projection/repair\.We evaluate LoRe as a training\-free, inference\-time drop\-in wrapper using fully inclusive end\-to\-end wall\-clock measurements under an explicit and auditable accounting protocol \(described in Section[3](https://arxiv.org/html/2605.29005#S3)\)\. Unless stated otherwise, all comparisons are against DIFUSCO\(Sun and Yang,[2023](https://arxiv.org/html/2605.29005#bib.bib1)\)using the same codebase and pretrained checkpoints; LoRe only changes inference\-time routing under the per\-step budget\.

The main contributions of this work are summarized as follows:

- •Conceptual Formulation:We formalize per\-step operator budgeting for iterative graph solvers, establishing a rigorous framework to constrain computational and memory envelopes within each refinement step\.
- •The LoRe Protocol:We introduce LoRe, a training\-free runtime wrapper that operationalizes a physics\-inspired Cluster\-Bath decomposition, inducing temporal operator sparsity via dynamic, state\-dependent routing\.
- •Auditable Accounting:We establish a fully inclusive, end\-to\-end accounting protocol, providing a transparent benchmark for resource\-constrained inference\.
- •Empirical Validation:We demonstrate that dynamic routing dominates strong static alternatives under matched budgets\. LoRe extends feasible inference3×3\\timesbeyond baseline out\-of\-memory limits on MIS, and delivers up to a15×15\\timesspeedup with a44×44\\timesmemory reduction on TSP\.

We conclude by outlining the paper organization: Section[2](https://arxiv.org/html/2605.29005#S2)reviews related work; Section[3](https://arxiv.org/html/2605.29005#S3)formalizes per\-step budgeting and the auditable accounting protocol, and then instantiates the budgeted interface withLoRe; Section[4](https://arxiv.org/html/2605.29005#S4)reports experiments; and Section[5](https://arxiv.org/html/2605.29005#S5)concludes\.

## 2Related Work

Neural solvers for CO increasingly treat inference as an iterative refinement process on graphs, spanning learned constructive policies and improvement heuristics as well as dynamical solvers that repeatedly re\-evaluate interactions across steps\(Bengioet al\.,[2021](https://arxiv.org/html/2605.29005#bib.bib19); Cappartet al\.,[2023](https://arxiv.org/html/2605.29005#bib.bib20); Mazyavkinaet al\.,[2021](https://arxiv.org/html/2605.29005#bib.bib21); Vinyalset al\.,[2015](https://arxiv.org/html/2605.29005#bib.bib22); Belloet al\.,[2017](https://arxiv.org/html/2605.29005#bib.bib23); Koolet al\.,[2019](https://arxiv.org/html/2605.29005#bib.bib24); Nazariet al\.,[2018](https://arxiv.org/html/2605.29005#bib.bib25); Khalilet al\.,[2017](https://arxiv.org/html/2605.29005#bib.bib26)\)\. A key practical consequence of this paradigm is that latency and memory can be dominated by repeated interaction evaluation rather than a single forward pass, particularly as graph size grows\. To make the main efficiency levers explicit, Table[1](https://arxiv.org/html/2605.29005#S2.T1)summarizes representative lines by their horizon, support pattern, and whether a hard per\-step budget is explicitly enforced\.

Diffusion\-style backbones exemplify this trade\-off\. In CO, diffusion\-based refinement has been instantiated in solvers such as DIFUSCO and DiffUCO, and further supported by scalable discrete diffusion samplers\(Sun and Yang,[2023](https://arxiv.org/html/2605.29005#bib.bib1); Sanokowskiet al\.,[2024](https://arxiv.org/html/2605.29005#bib.bib5),[2025](https://arxiv.org/html/2605.29005#bib.bib6)\)\. These methods can exhibit strong anytime behavior, but their step\-wise update often involves dense message passing over the interaction set, making step\-level footprint a binding constraint in large\-scale deployments\.

LoRe is designed to operate at this step level without retraining: it keeps the backbone parameters and refinement horizon fixed, and instead allocates expensive interaction evaluation under a hard per\-step budget through state\-dependent routing\. Closely related neural\-CO lines combine iterative refinement with alternative paradigms—adaptive solution expansion \(COExpander\(Maet al\.,[2025](https://arxiv.org/html/2605.29005#bib.bib4)\)\), structured denoising with step\-wise variable selection \(StruDiCO\(Wanget al\.,[2025](https://arxiv.org/html/2605.29005#bib.bib14)\)\), and generation\-as\-search test\-time scaling \(GenSCO\(Liet al\.,[2025](https://arxiv.org/html/2605.29005#bib.bib13)\)\)—which similarly re\-evaluate interactions across steps; LoRe’s step\-level budgeting is paradigm\-agnostic, and we demonstrate transfer to T2TCO and COExpander wrappers in Section[4](https://arxiv.org/html/2605.29005#S4)\.

A complementary efficiency thread focuses on reducing total sampling cost, including training–test alignment, step compression, and faster samplers or distillation for diffusion models\(Liet al\.,[2023](https://arxiv.org/html/2605.29005#bib.bib2),[2024](https://arxiv.org/html/2605.29005#bib.bib3); Hoet al\.,[2020](https://arxiv.org/html/2605.29005#bib.bib36); Songet al\.,[2021b](https://arxiv.org/html/2605.29005#bib.bib37),[a](https://arxiv.org/html/2605.29005#bib.bib38); Luet al\.,[2022](https://arxiv.org/html/2605.29005#bib.bib39); Salimans and Ho,[2022](https://arxiv.org/html/2605.29005#bib.bib40)\)\. These techniques primarily decrease the number \(or cost\) of refinement steps, and are therefore orthogonal to mechanisms that enforce a strict compute/memory envelope inside each step\. LoRe targets this within\-step constraint and can, in principle, be composed with step\-reduction or sampler\-acceleration methods\.

Another line reduces per\-step workload by restricting the interaction structure via static spatial sparsification, e\.g\., candidate graphs or fixed masks\. Classic TSP pipelines use candidate edges and Lin–Kernighan style neighborhoods\(Reinelt,[1991](https://arxiv.org/html/2605.29005#bib.bib43); Lin and Kernighan,[1973](https://arxiv.org/html/2605.29005#bib.bib41); Helsgaun,[2000](https://arxiv.org/html/2605.29005#bib.bib42)\), and analogous fixed supports also appear in learned solvers\. Static supports are effective when high\-impact interactions remain stable, but in iterative refinement the impact of interactions can shift along the trajectory; truncation error is therefore state\-dependent and can accumulate if critical interactions are systematically omitted\. This motivates allowing the expensive\-evaluation support to vary over steps under an explicit budget\.

A further related family of methods consists of outer\-loop improvement paradigms, including destroy–repair and large neighborhood search \(LNS\), where a neighborhood is selected and re\-optimized as a separate procedure\(Shaw,[1998](https://arxiv.org/html/2605.29005#bib.bib9); Ropke and Pisinger,[2006](https://arxiv.org/html/2605.29005#bib.bib44); Pisinger and Ropke,[2010](https://arxiv.org/html/2605.29005#bib.bib45)\)\. Recent work also explores neural or diffusion\-guided variants that leverage learned priors to propose neighborhoods or guide the improvement process\(Hottung and Tierney,[2020](https://arxiv.org/html/2605.29005#bib.bib46); Fenget al\.,[2024](https://arxiv.org/html/2605.29005#bib.bib8)\)\.

LoRe adopts a different integration point: rather than wrapping refinement with an outer loop, it budgets expensive operator evaluations*inside every refinement step*of the solver dynamics, allocating interaction evaluations under a hard per\-step envelope via state\-dependent routing\. Crucially, the budgeted quantity here is per\-step operator evaluation rather than neighborhood re\-optimization; as a result, LoRe does not introduce an additional outer solver or change the backbone parameters or the refinement horizon\.

There is a rich history of mapping CO problems to physical models, particularly Ising models and Hopfield networks, where energy minimization corresponds to optimization\(Hopfield and Tank,[1985](https://arxiv.org/html/2605.29005#bib.bib55); Lucas,[2014](https://arxiv.org/html/2605.29005#bib.bib56)\)\. Graph Neural Networks themselves have been analyzed through the lens of mean\-field inference in graphical models\(Daiet al\.,[2016](https://arxiv.org/html/2605.29005#bib.bib57)\)\.

However, most prior works focus on the mapping formulation \(e\.g\., constructing the Hamiltonian\)\. Our work draws inspiration from the computational methodology of condensed matter physics—specifically Cluster Dynamical Mean\-Field Theory \(C\-DMFT\)\(Maieret al\.,[2005](https://arxiv.org/html/2605.29005#bib.bib62)\)\. Instead of solving the full system or a purely local approximation, C\-DMFT couples an exact local cluster with a mean\-field bath\. LoRe operationalizes this “Cluster\-Bath” decomposition as a runtime routing protocol for neural solvers\.

Table 1:Solver\-oriented positioning by efficiency lever\. Representative references: DIFUSCO\(Sun and Yang,[2023](https://arxiv.org/html/2605.29005#bib.bib1)\), COExpander\(Maet al\.,[2025](https://arxiv.org/html/2605.29005#bib.bib4)\), T2TCO\(Liet al\.,[2023](https://arxiv.org/html/2605.29005#bib.bib2)\), Fast\-T2T\(Liet al\.,[2024](https://arxiv.org/html/2605.29005#bib.bib3)\)\.Work/lineHorizonSupportBudgetPermanentDIFUSCOfixedfullno–T2TCO / Fast\-T2Tfewerfullno–Static spatial sparsityfixedstaticimplicityesOuter\-loop refinement /LNSvarieslocalimplicitvariesCOExpandervariesvariesnot explicitvariesLoRe \(ours\)fixedtime\-varyingyesno
## 3LoRe: Budgeted Runtime Routing for Iterative Graph Solvers

This section details the inference\-time protocol\. We first formalize the iterative solver dynamics and the per\-step budget constraint\. We then introduce a conceptual analogy to the Cluster\-Bath decomposition in physical systems to motivate our operator design\. Finally, we present the LoRe algorithm, which realizes this decomposition via dynamic routing\.

### 3\.1Formulation: Iterative Refinement as Operator Evaluation

We view combinatorial optimization solvers as discrete dynamical systems\. Letxt∈ℝn×dx^\{t\}\\in\\mathbb\{R\}^\{n\\times d\}denote the latent state ofnnvariables at steptt\. The refinement dynamics \(e\.g\., diffusion denoising or GNN message passing\) are governed by an operator𝒯t\\mathcal\{T\}\_\{t\}acting over the interaction graph𝒢=\(𝒱,𝒜\)\\mathcal\{G\}=\(\\mathcal\{V\},\\mathcal\{A\}\):

xt\+1=Πt​\(𝒯t​\(xt;𝒜\)\),t=0,…,T−1\.x^\{t\+1\}\\;=\\;\\Pi\_\{t\}\\\!\\big\(\\mathcal\{T\}\_\{t\}\(x^\{t\};\\mathcal\{A\}\)\\big\),\\qquad t=0,\\ldots,T\-1\.\(1\)where𝒯t\\mathcal\{T\}\_\{t\}aggregates information primarily through the dense interaction set𝒜\\mathcal\{A\}\(e\.g\., edges in MIS or candidate moves in TSP\), andΠt\\Pi\_\{t\}represents low\-cost stabilization and feasibility handling \(e\.g\., bounding logits, projection, decoding/repair\)\. The computational bottleneck lies in𝒯t\\mathcal\{T\}\_\{t\}, which can be written in a full interaction form:

𝒯t​\(x;𝒜\)=ℬt​\(x\)\+∑a∈𝒜Δt,a​\(x\),\\mathcal\{T\}\_\{t\}\(x;\\mathcal\{A\}\)=\\mathcal\{B\}\_\{t\}\(x\)\+\\sum\_\{a\\in\\mathcal\{A\}\}\\Delta\_\{t,a\}\(x\),\(2\)whereℬt\\mathcal\{B\}\_\{t\}captures node\-wise constraints \(independent evolution\), whileΔt,a\\Delta\_\{t,a\}represents conflict interactions \(e\.g\., adjacent nodes in MIS\)\. In dense graphs, evaluating interactions over the full set𝒜\\mathcal\{A\}scales asO​\(\|𝒜\|\)O\(\|\\mathcal\{A\}\|\), which becomes prohibitive for large instances\. We enforce a per\-step operator budgetρ∈\(0,1\]\\rho\\in\(0,1\], constraining the solver to evaluate only a subsetMt⊆𝒜M\_\{t\}\\subseteq\\mathcal\{A\}with\|Mt\|≤ρ​\|𝒜\|\|M\_\{t\}\|\\leq\\rho\|\\mathcal\{A\}\|\.

### 3\.2Conceptual Intuition: The Cluster\-Bath Decomposition

To approximate the dense operator in Eq\. \([2](https://arxiv.org/html/2605.29005#S3.E2)\) under a strict per\-step budget without discarding global consistency, we draw a high\-level conceptual analogy to the computational methodologies developed for strongly correlated systems in condensed matter physics, specifically C\-DMFT\(Kotliaret al\.,[2001](https://arxiv.org/html/2605.29005#bib.bib60); Potthoffet al\.,[2003](https://arxiv.org/html/2605.29005#bib.bib61); Biroliet al\.,[2004](https://arxiv.org/html/2605.29005#bib.bib58); Maieret al\.,[2005](https://arxiv.org/html/2605.29005#bib.bib62)\)\.

In condensed matter physics, many\-body interactions arise when particles strongly influence one another across a lattice, making it computationally intractable to treat them independently or evaluate exact global forces\. Conceptually, CO exhibits a structurally identical bottleneck\. The variables in CO are strongly coupled by task\-specific constraints, such as adjacency exclusions in MIS or degree constraints in TSP\. These constraints act precisely like physical many\-body interactions \(Δt,a\\Delta\_\{t,a\}in Eq\. \([2](https://arxiv.org/html/2605.29005#S3.E2)\)\): they dynamically form highly correlated “hotspots” where conflicts must be resolved exactly, while the rest of the graph settles into a stable background state\.

C\-DMFT sidesteps this bottleneck by partitioning the system into aCluster\(where intense short\-range interactions are resolved exactly\) and a surroundingBath\(which approximates the long\-range global environment as a simplified background field\)\.

We explicitly emphasize that LoRe does not establish a formal mathematical equivalence to quantum physical models, nor does it simulate real electron dynamics\. Instead, we adapt C\-DMFT’s local\-exact/global\-approximate decomposition as an algorithmic blueprint\. This physical intuition motivates our runtime protocol to dynamically isolate high\-conflict interaction hotspots \(the Cluster\) from the stable background relations \(the Bath\), a structural pattern that LoRe operationalizes for graph solvers \(Fig\.[2](https://arxiv.org/html/2605.29005#S3.F2)\)\.

![Refer to caption](https://arxiv.org/html/2605.29005v1/x2.png)Figure 2:Inspired by C\-DMFT, LoRe partitions the interaction graph at each stepttinto two components\. \(Left\) A high\-conflict Cluster \(MtM\_\{t\}\) selected via dynamic routing, and a low\-conflict Bath \(𝒜∖Mt\\mathcal\{A\}\\setminus M\_\{t\}\)\. \(Right\) The influence of the omitted Bath is captured via a lightweight global signalgtg\_\{t\}and coupled to the Cluster through a global recall termℛt​\(x;gt\)\\mathcal\{R\}\_\{t\}\(x;g\_\{t\}\)\.ℬt​\(x\)\\mathcal\{B\}\_\{t\}\(x\)denotes node\-wise updates\.
### 3\.3LoRe: Budgeted Operator with Dynamic Routing

Inspired by theCluster\-Bathdecomposition, LoRe partitions the interaction graph𝒜\\mathcal\{A\}at each stepttinto a high\-conflict subsetMtM\_\{t\}and a low\-conflict complement𝒜∖Mt\\mathcal\{A\}\\setminus M\_\{t\}\. We replace the full operator in Eq\.[2](https://arxiv.org/html/2605.29005#S3.E2)with a budgeted operator:

𝒯~t​\(x;Mt,gt\)=ℬt​\(x\)\+∑a∈MtΔt,a​\(x\)\+ℛt​\(x;gt\),\\tilde\{\\mathcal\{T\}\}\_\{t\}\(x;M\_\{t\},g\_\{t\}\)=\\mathcal\{B\}\_\{t\}\(x\)\+\\sum\_\{a\\in M\_\{t\}\}\\Delta\_\{t,a\}\(x\)\+\\mathcal\{R\}\_\{t\}\(x;g\_\{t\}\),\(3\)where explicit, heavy computation is strictly restricted to the routed interaction subsetMtM\_\{t\}\. This operator design directly instantiates theCluster\-Bathblueprint through a standard graph\-neural lens:

- •ℬt​\(x\)\\mathcal\{B\}\_\{t\}\(x\)represents the independent node\-wise evolution\. It handles local state updates, conceptually mirroring isolated vertex dynamics rather than complex multi\-node interactions\.
- •∑a∈MtΔt,a​\(x\)\\sum\_\{a\\in M\_\{t\}\}\\Delta\_\{t,a\}\(x\)performs exact interaction evaluations exclusively over the dynamically selected high\-conflict hotspots\. This corresponds to theCluster, where severe structural conflicts must be resolved precisely\.
- •ℛt​\(x;gt\)\\mathcal\{R\}\_\{t\}\(x;g\_\{t\}\)introduces a lightweight global recall signal over the omitted interaction regions\(𝒜∖Mt\)\(\\mathcal\{A\}\\setminus M\_\{t\}\)\. Acting as amean\-field coupling, it approximates theBathto connect the localClusterto the global environment, ensuring the subgraph does not become structurally isolated\.

Since the critical conflicts and computational hotspots drift continuously during the iterative refinement process, a static edge selection strategy would lead to catastrophic truncation error accumulation\. LoRe mitigates this via a dynamic routing protocol, ensuring thatMtM\_\{t\}adaptively tracks state\-dependent importance across the solver trajectory\. The overall operational protocol consists of three components:

#### i\. Dynamic Routing \(Cluster Selection\):

We employ a proxy scorest,a=St​\(a;xt,xprev\)s\_\{t,a\}=S\_\{t\}\(a;x^\{t\},x\_\{\\text\{prev\}\}\)to identify critical interactions\. The active setMtM\_\{t\}combines a small static skeleton of structurally important interactions \(fixed acrosstt\) with a dynamic hotspot set selected by the proxy score, under the strict constraint\|Mt\|=B=⌊ρ​\|𝒜\|⌋\|M\_\{t\}\|=B=\\lfloor\\rho\|\\mathcal\{A\}\|\\rfloor\. The scoring function prioritizes interactions with \(a\) high endpoint uncertainty \(i\.e\., edges whose adjacency conflicts are not yet resolved\) and \(b\) high temporal instability, ensuring that the dynamic allocation targets the largest residual contributors\. To amortize routing overhead,MtM\_\{t\}is refreshed everyRRsteps\.

#### ii\. Optional Global Recall:

To prevent the exact evaluation subgraphs from becoming disconnected from the global context, we compute a low\-cost global background signalgtg\_\{t\}from the omitted interaction regions:

gt=Poolt​\(xt;𝒜∖Mt\),ℛt​\(xt;gt\)=Ut​\(\[xt,gt\]\)\.g\_\{t\}=\\text\{Pool\}\_\{t\}\(x^\{t\};\\mathcal\{A\}\\setminus M\_\{t\}\),\\quad\\mathcal\{R\}\_\{t\}\(x^\{t\};g\_\{t\}\)=U\_\{t\}\(\[x^\{t\},g\_\{t\}\]\)\.\(4\)We implementUtU\_\{t\}as a parameter\-free coverage\-weighted interpolation:Ut​\(\[xt,gt\]\)i=αi​xit\+\(1−αi\)​gt,iU\_\{t\}\(\[x^\{t\},g\_\{t\}\]\)\_\{i\}=\\alpha\_\{i\}x\_\{i\}^\{t\}\+\(1\-\\alpha\_\{i\}\)g\_\{t,i\}, whereαi=di​\(Mt\)/di​\(𝒜\)\\alpha\_\{i\}=d\_\{i\}\(M\_\{t\}\)/d\_\{i\}\(\\mathcal\{A\}\)represents the fraction of nodeii’s interactions evaluated exactly\. This term \(with𝒪​\(N\)\\mathcal\{O\}\(N\)complexity\) acts as a contextual correction that stabilizes the optimization trajectory under tight budgets\. The main experiments evaluate the pure\-LoRe configuration \(with recall disabled\) to isolate the direct routing contribution; however, this parameter\-free, optional stabilizer \(requiring only one cached tensor and no retraining\) can be seamlessly enabled to further smooth trajectories in ultra\-low budget regimes\.

#### iii\. Fixed Projection/Repair and Greedy Decoding:

We keep the post\-processing operatorΠt\\Pi\_\{t\}\(including decoding and feasibility repair\) completely identical across all compared methods to ensure a rigorous and fair evaluation\. In our target tasks, decoding and repair are designed to be standard and lightweight \(consisting of a greedy pass followed by task\-specific validity checks\)\. Consequently, the recorded end\-to\-end acceleration stems entirely from alleviating the expensive interaction\-evaluation bottleneck at the runtime level, rather than altering post\-processing complexity\.

Algorithm 1LoRe Inference \(MIS\)0:graph

G=\(V,E\)G\{=\}\(V,E\), pretrained model

ℳθ\\mathcal\{M\}\_\{\\theta\}, budget ratio

ρ\\rho, skeleton ratio

γ\\gamma, refresh interval

RR, total steps

TT
1:Precompute \(run once\):

2:

xT∼𝒩​\(0,In\);xprev←xTx^\{T\}\\sim\\mathcal\{N\}\(0,I\_\{n\}\);\\quad x\_\{\\mathrm\{prev\}\}\\leftarrow x^\{T\}
3:

B←⌊ρ​\|E\|⌋B\\leftarrow\\lfloor\\rho\|E\|\\rfloor
4:

Eskel←Top⌊γ​B⌋​\{deg⁡\(i\)\+deg⁡\(j\):\(i,j\)∈E\}E\_\{\\mathrm\{skel\}\}\\leftarrow\\mathrm\{Top\}\_\{\\lfloor\\gamma B\\rfloor\}\\bigl\\\{\\deg\(i\)\{\+\}\\deg\(j\)\\;:\\;\(i,j\)\\in E\\bigr\\\}
5:Sampling \(per steptt\):

6:for

t=T,T−1,…,1t=T,T\-1,\\ldots,1do

7:if

t=Tt=Tor

tmodR=0t\\bmod R=0then

8:

Ehot←TopB−\|Eskel\|​\{St​\(e;xt,xprev\):e∈E∖Eskel\}E\_\{\\mathrm\{hot\}\}\\leftarrow\\mathrm\{Top\}\_\{B\-\|E\_\{\\mathrm\{skel\}\}\|\}\\bigl\\\{S\_\{t\}\(e;x^\{t\},x\_\{\\mathrm\{prev\}\}\)\\;:\\;e\\in E\\setminus E\_\{\\mathrm\{skel\}\}\\bigr\\\}⊳\\trianglerightStS\_\{t\}: see §[3\.4](https://arxiv.org/html/2605.29005#S3.SS4)

9:

Mt←Eskel∪EhotM\_\{t\}\\leftarrow E\_\{\\mathrm\{skel\}\}\\cup E\_\{\\mathrm\{hot\}\}
10:else

11:

Mt←Mt\+1M\_\{t\}\\leftarrow M\_\{t\+1\}
12:endif

13:

ϵt←ℳθ​\(xt,t,Mt\)\\epsilon^\{t\}\\leftarrow\\mathcal\{M\}\_\{\\theta\}\(x^\{t\},t,M\_\{t\}\)
14:

xprev←xtx\_\{\\mathrm\{prev\}\}\\leftarrow x^\{t\}
15:

xt−1∼q​\(xt−1∣xt,ϵt\)x^\{t\-1\}\\sim q\(x^\{t\-1\}\\mid x^\{t\},\\epsilon^\{t\}\)
16:endfor

17:return

GreedyDecode​\(x0,G\)\\mathrm\{GreedyDecode\}\(x^\{0\},G\)

Algorithm[1](https://arxiv.org/html/2605.29005#alg1)summarizes the MIS instantiation; the TSP variant is described in Section[3\.4](https://arxiv.org/html/2605.29005#S3.SS4)below\.

#### Error bound \(informal\)\.

Under a local Lipschitz assumption on the composed mapΠt∘𝒯t\\Pi\_\{t\}\\circ\\mathcal\{T\}\_\{t\}, the trajectory erroret=‖x~t−xt‖e\_\{t\}=\\\|\\tilde\{x\}^\{t\}\-x^\{t\}\\\|between the budgeted and full updates satisfies

et\+1≤Lt⋅et\+‖δt‖,e\_\{t\+1\}\\leq L\_\{t\}\\cdot e\_\{t\}\+\\\|\\delta\_\{t\}\\\|,\(5\)where the per\-layer Lipschitz factor satisfiesLt≤‖A‖2​‖Wt‖2L\_\{t\}\\leq\\\|A\\\|\_\{2\}\\\|W\_\{t\}\\\|\_\{2\}, depending on graph structure through the adjacency spectral norm \(bounded byΔmax\\Delta\_\{\\max\}for unweighted graphs\), and the per\-step residual decomposes via the triangle inequality as‖δt‖≤ϵt​\(ρ\)\+‖rt‖\\\|\\delta\_\{t\}\\\|\\leq\\epsilon\_\{t\}\(\\rho\)\+\\\|r\_\{t\}\\\|withϵt​\(ρ\)\\epsilon\_\{t\}\(\\rho\)the omitted\-message mass and‖rt‖\\\|r\_\{t\}\\\|the recall approximation error\. LoRe’s routing keepsϵt​\(ρ\)\\epsilon\_\{t\}\(\\rho\)tight by sending high\-impact interactions to the Cluster; empirically, the omitted \(Bath\) interactions remain near\-certain throughout the sampling trajectory, soϵt​\(ρ\)\\epsilon\_\{t\}\(\\rho\)stays negligible andete\_\{t\}does not accumulate appreciably \(full derivation and measurements in Appendix[A](https://arxiv.org/html/2605.29005#A1)\)\.

### 3\.4Task instantiations

We instantiate𝒜\\mathcal\{A\}, score proxiesStS\_\{t\}, and recall pooling for the tasks studied\.

#### MIS \(primary\)\.

GivenG=\(V,E\)G=\(V,E\), set𝒜=E\\mathcal\{A\}=Eand representxt∈\[0,1\]\|V\|x^\{t\}\\in\[0,1\]^\{\|V\|\}as relaxed vertex indicators\. We instantiate edge scores combining \(i\) endpoint uncertainty and \(ii\) temporal instability; one canonical form is

St​\(\(i,j\);xt,xprev\)=ui​uj\\displaystyle S\_\{t\}\\big\(\(i,j\);x^\{t\},x\_\{\\mathrm\{prev\}\}\\big\)=u\_\{i\}\\,u\_\{j\}\(6\)\+λstab​\(\|xit−xprev,i\|\+\|xjt−xprev,j\|\),\\displaystyle\\quad\+\\lambda\_\{\\mathrm\{stab\}\}\\left\(\\lvert x^\{t\}\_\{i\}\-x\_\{\\mathrm\{prev\},i\}\\rvert\+\\lvert x^\{t\}\_\{j\}\-x\_\{\\mathrm\{prev\},j\}\\rvert\\right\),where the node uncertaintyui=1−\|2​xit−1\|u\_\{i\}=1\-\\lvert 2x^\{t\}\_\{i\}\-1\\rvertis maximal atxit=12x^\{t\}\_\{i\}=\\tfrac\{1\}\{2\}\(an undecided vertex\), andxprevx\_\{\\mathrm\{prev\}\}is the state from the previous refinement step\. This score isO​\(\|E\|\)O\(\|E\|\)to compute at refresh steps and highlights interactions that would otherwise cause truncation spikes\. Framework\-specific instantiations may use closely related node\-level proxies based on the same two factors\. Projection/repair uses the same greedy decode routine across all compared variants\.

#### TSP \(secondary\)\.

Let𝒜\\mathcal\{A\}be the interaction set used by the backbone \(dense candidates or a pre\-constructed candidate graph\)\. LoRe routes expensive evaluation to a time\-varying subsetMt⊆𝒜M\_\{t\}\\subseteq\\mathcal\{A\}without modifying𝒜\\mathcal\{A\}itself, highlighting the distinction between temporal operator sparsity \(runtime routing\) and static spatial sparsification \(candidate graphs\)\. Decoding/repair is kept identical across variants \(greedy decode plus standard tour repair, with 2\-opt where applicable\)\.

## 4Experiments

We evaluateLoReas a training\-free inference\-time wrapper that enforces a hard per\-step budget on interaction evaluations\. Unless stated otherwise, all methods use the same DIFUSCO implementation and pretrained checkpoints, with identical refinement horizon and decoding/repair; LoRe modifies only the per\-step active interaction set via runtime routing\. All reported wall\-clock time and peak GPU memory follow a fully inclusive end\-to\-end accounting protocol, covering diffusion steps and post\-processing\.

Experiments run on Ubuntu 24\.04 with dual AMD EPYC 9654 CPUs and NVIDIA RTX PRO 6000 GPU \(96GB\)\. When randomness is present, we repeat runs with different seeds and report mean±\\pmstd\.

#### Roadmap and central hypothesis\.

Our central hypothesis is that enforcing a hard per\-step interaction budget can improve end\-to\-end efficiency*at scale*while preserving stable inference behavior, provided that the evaluated support is re\-routed as the state evolves\. Table[2](https://arxiv.org/html/2605.29005#S4.T2)summarizes the results across tasks\. We build evidence for this claim in three stages:

- •Scalability and Feasibility \(Secs\.[4\.1](https://arxiv.org/html/2605.29005#S4.SS1)and[4\.2](https://arxiv.org/html/2605.29005#S4.SS2)\)\.We quantify scaling on large MIS instances, explicitly probing the baseline out\-of\-memory \(OOM\) boundary, and verify cross\-task generalization on TSP\.
- •Mechanism and Necessity \(Sec\.[4\.3](https://arxiv.org/html/2605.29005#S4.SS3)\)\.We isolate routing under matched per\-step budgets to test the necessity of state\-dependent re\-routing, contrasting dynamic versus static supports and greedy variants\.
- •Deployment Behavior \(Secs\.[4\.4](https://arxiv.org/html/2605.29005#S4.SS4)and[4\.5](https://arxiv.org/html/2605.29005#S4.SS5)\)\.We characterize sensitivity to LoRe’s hyperparameters under a single untuned configuration and evaluate robustness to topology shift without retraining\.

Table 2:Main Results: Scalability, Robustness, and Cross\-Framework Generality\.Fully inclusive end\-to\-end wall\-clock time and peak GPU memory\.NN: number of evaluation runs\. Speedup==mean of per\-instance time ratios \(base/LoRe\); Retention==ratio of per\-instance solution\-quality scores \(LoRe/base for MIS set size; base/LoRe for TSP tour length;≥1\\geq 1means LoRe matches or beats the baseline in either case\); both reported as mean±\\pmstd over theNNinstances\. Mem↓\\downarrow: base/LoRe peak\-memory ratio\.oom: baseline exceeds the 96 GB GPU \(relative metrics undefined\)\.Fw\.TaskSettingTime \(s\) LoRe/baseMem \(GB\) LoRe/baseMem↓\\downarrowNSpeedup \(×\\times\)RetentionDIFUSCO — MIS: size scaling and OOM boundary \(ER,p=0\.05p\{=\}0\.05\)DIFUSCOMISn=1n\{=\}1k7\.9 / 17\.30\.07 / 0\.425\.7×\\times52\.19±\\pm0\.030\.815±\\pm0\.048DIFUSCOMISn=2n\{=\}2k10\.8 / 69\.30\.18 / 1\.588\.6×\\times56\.42±\\pm0\.040\.804±\\pm0\.034DIFUSCOMISn=3n\{=\}3k18\.6 / 1490\.35 / 3\.5110\.0×\\times58\.03±\\pm0\.030\.835±\\pm0\.017DIFUSCOMISn=5n\{=\}5k52 / 4060\.88 / 9\.6811\.0×\\times57\.79±\\pm0\.071\.029±\\pm0\.031DIFUSCOMISn=8n\{=\}8k124 / 10302\.15 / 24\.711\.5×\\times58\.28±\\pm0\.121\.019±\\pm0\.014DIFUSCOMISn=10n\{=\}10k196 / 16013\.31 / 38\.611\.7×\\times58\.16±\\pm0\.140\.991±\\pm0\.044DIFUSCOMISn=15n\{=\}15k442 / 36047\.32 / 86\.711\.9×\\times38\.16±\\pm0\.041\.010±\\pm0\.013DIFUSCOMISn=20n\{=\}20k767 /oom12\.9 /oom–3––DIFUSCOMISn=30n\{=\}30k1782 /oom28\.8 /oom–3––DIFUSCOMISn=50n\{=\}50k4949 /oom79\.5 /oom–3––DIFUSCO — TSP \(dense\-graph mode; dense\-trainedn=100n\{=\}100checkpoint\): scaling to largernn\(merge \+ 2\-opt; end\-to\-end\)DIFUSCOTSPn=100n\{=\}1000\.61 / 0\.340\.03 / 0\.082\.7×\\times200\.55±\\pm0\.050\.936±\\pm0\.021DIFUSCOTSPn=500n\{=\}5000\.72 / 3\.610\.05 / 1\.2324\.6×\\times205\.10±\\pm0\.390\.953±\\pm0\.014DIFUSCOTSPn=1n\{=\}1k0\.94 / 13\.60\.11 / 4\.8343\.9×\\times2014\.61±\\pm1\.070\.960±\\pm0\.006DIFUSCOTSPn=2n\{=\}2k2\.14 / 56\.70\.35 / 19\.555\.6×\\times1526\.66±\\pm2\.120\.974±\\pm0\.008DIFUSCOTSPn=3n\{=\}3k4\.42 / 1250\.73 / 43\.459\.4×\\times1528\.46±\\pm1\.950\.983±\\pm0\.005DIFUSCOTSPn=4n\{=\}4k7\.92 / 2281\.28 / 77\.060\.1×\\times528\.84±\\pm0\.311\.008±\\pm0\.004DIFUSCOTSPn=5n\{=\}5k12\.7 /oom1\.99 /oom–5––DIFUSCOTSPn=10n\{=\}10k50\.6 /oom7\.85 /oom–5––DIFUSCOTSPn=30n\{=\}30k492 /oom70\.4 /oom–3––DIFUSCO — MIS: topology\-OOD generalization \(n=2n\{=\}2k, 100 graph–budget evaluations/family\)DIFUSCOMISER→\\toER1\.08 / 8\.580\.39 / 4\.0412\.4×\\times1008\.67±\\pm2\.440\.991±\\pm0\.053DIFUSCOMISER→\\toBA1\.05 / 8\.070\.37 / 3\.7812\.4×\\times1008\.25±\\pm2\.160\.756±\\pm0\.051DIFUSCOMISER→\\toWS1\.12 / 8\.570\.39 / 4\.0412\.4×\\times1008\.31±\\pm2\.241\.002±\\pm0\.048Cross\-framework: LoRe with framework\-specific routing adaptations, no backbone retrainingT2TCOMISn=5n\{=\}5k \(size\-OOD\)13\.0 / 60\.12\.49 / 29\.211\.7×\\times304\.62±\\pm0\.020\.989±\\pm0\.083T2TCOTSPn=1n\{=\}1k1\.20 / 13\.770\.136 / 4\.8335\.6×\\times3011\.49±\\pm0\.680\.994±\\pm0\.011COExpanderMISV=2\.4V\{=\}2\.4k0\.60 / 3\.920\.49 / 5\.1510\.4×\\times306\.54±\\pm0\.560\.872±\\pm0\.033COExpanderMVCV=2\.4V\{=\}2\.4k0\.44 / 2\.930\.49 / 5\.1310\.4×\\times306\.60±\\pm0\.721\.001±\\pm0\.001

![Refer to caption](https://arxiv.org/html/2605.29005v1/x3.png)Figure 3:Scaling and cross\-framework performance\.\(a\) MIS: runtime and peak GPU memory vs\. graph size\. \(b\) TSP: same axes\. \(c\) Speedup vs\. graph size across multiple framework×\\timestask combinations\. Red region marks baseline OOM\.
### 4\.1Scalability and Memory Feasibility on Large MIS Instances

We investigate whether enforcing a hard per\-step interaction budget effectively improves end\-to\-end scaling and extends memory feasibility as graph size increases\. Following the accounting protocol in Sec\.[4](https://arxiv.org/html/2605.29005#S4), we sweepnnto explicitly probe the baseline out\-of\-memory \(OOM\) boundary, reporting both wall\-clock time and peak GPU memory \(Fig\.[3](https://arxiv.org/html/2605.29005#S4.F3)\(a\)\)\. Since aggregate metrics are provided in Table[2](https://arxiv.org/html/2605.29005#S4.T2), we focus here on the scaling trends and feasibility limits\.

Figure[3](https://arxiv.org/html/2605.29005#S4.F3)\(a\) illustrates divergent scaling trends in both memory and runtime\. The full\-support baseline rapidly saturates memory, encountering OOM atn≈20n\{\\approx\}20k nodes \(its largest feasible scale isn=15n\{=\}15k\), whereas LoRe remains feasible up ton=50n\{=\}50k \(Table[2](https://arxiv.org/html/2605.29005#S4.T2)\)\. Runtime performance exhibits a similar divergence: speedup increases with instance size, rising from∼2×\{\\sim\}2\\times\(1k\) to∼8×\{\\sim\}8\\times\(10k–15k\)\. This confirms that the routing overhead is effectively amortized as the dominant interaction\-evaluation workload is reduced by the per\-step budget\. Notably, at 15k nodes \(the largest feasible baseline scale\), LoRe achieves an8\.16×8\.16\\timesend\-to\-end speedup with1\.011\.01retention while reducing peak memory by∼12×\{\\sim\}12\\times\(86\.786\.7\\,GB→7\.32\\to 7\.32\\,GB, see Table[2](https://arxiv.org/html/2605.29005#S4.T2)\)\.

In summary, these results validate the scalability hypothesis: bounding expensive interaction evaluations transforms the scaling profile, improving practical feasibility and efficiency at scale rather than merely yielding a constant\-factor acceleration\.

### 4\.2Cross\-Task Scaling on TSP

We further validate that the proposed budgeting mechanism generalizes to a distinct task \(TSP\) without retraining\. Following the same DIFUSCO codebase and identical merge \+ 2\-opt post\-processing, we evaluate TSP in DIFUSCO’s dense\-graph mode \(the full\-support interaction set,sparse\_factor=−1\{=\}\{\-\}1\) using the dense\-trainedn=100n\{=\}100checkpoint\. We scale the instance size from the in\-distribution point \(n=100n\{=\}100\) up ton=4000n\{=\}4000, and additionally probe an OOM\-extension regime \(up ton=30n\{=\}30k\) where the dense baseline is infeasible\. As shown in Table[2](https://arxiv.org/html/2605.29005#S4.T2)and Figure[3](https://arxiv.org/html/2605.29005#S4.F3)\(b\), the efficiency gains become increasingly pronounced with graph size, directly aligning with our design goal for large\-scale inference\.

While LoRe introduces minor overhead on small instances \(n=100n\{=\}100\) where the baseline runtime is already negligible, it delivers substantial speedups and near\-saturating memory reductions as the computational burden increases\. Atn=1000n\{=\}1000, LoRe reduces end\-to\-end time from 13\.6s to 0\.94s and peak memory from4\.834\.83\\,GB to0\.110\.11\\,GB, keeping the tour length within a few percent of the baseline \(retention0\.940\.94–1\.011\.01across scales, with the gap narrowing asnngrows\)\. A runtime breakdown confirms the source of these gains: atn=1000n\{=\}1000the baseline spends13\.313\.3s \(98%98\\%of its13\.613\.6s total\) on the diffusion interaction\-evaluation stage, whereas LoRe compresses this stage to0\.570\.57s\. Because the merge \+ 2\-opt post\-processing is shared and identical, the speedup stems directly from alleviating the interaction bottleneck rather than from post\-processing\.

### 4\.3Necessity of State\-Dependent Routing

Controlled Routing Ablation\.We explicitly investigate the routing mechanism to determine whether*state\-dependent re\-routing*is essential under a hard per\-step budget, or if a static sparse support suffices\. To eliminate confounders arising from learning dynamics or checkpoint variance, we conduct a controlled ablation using a training\-free iterative relaxation \(conflict\-descent style\) on MIS instances, varying solely the edge selection rule\.

We evaluate on 40 graphs: 20 Erdős–Rényi \(ER,p=0\.05p\{=\}0\.05\) and 20 Barabási–Albert \(BA,m=3m\{=\}3\) withn∈\{500,1000\}n\\in\\\{500,1000\\\}\. To ensure a rigorous comparison, all strategies share the same horizon \(T=100T\{=\}100\), per\-step budget ratio \(ρ=0\.08\\rho\{=\}0\.08\), and decoding/repair procedures\. Furthermore, each instance uses an identical initializationx\(0\)∼Uniform​\(0\.25,0\.75\)x^\{\(0\)\}\\\!\\sim\\\!\\mathrm\{Uniform\}\(0\.25,0\.75\)shared across all variants\.

Table 3:Controlled Routing Strategies\.All variants enforce a strict per\-step budget of⌊ρ​\|E\|⌋\\lfloor\\rho\|E\|\\rfloor; dynamic variants refresh the support everyRRsteps\.VariantSelection RuleRefreshLoRestatic skeleton \+ dynamic hotspotseveryRRGreedy\-Conflicttop edges by conflictxi​xjx\_\{i\}x\_\{j\}everyRRGreedy\-Deg\-Dyntop edges by\(degi\+degj\)​\(xi\+xj\)\(\\deg\_\{i\}\{\+\}\\deg\_\{j\}\)\(x\_\{i\}\{\+\}x\_\{j\}\)everyRRLoRe\-StaticLoReatt=0t\{=\}0, then fixedneverGreedy\-Degreetop edges by degree sumdegi\+degj\\deg\_\{i\}\{\+\}\\deg\_\{j\}neverRandomuniform edge samplingeveryRRFigure[4](https://arxiv.org/html/2605.29005#S4.F4)demonstrates that dynamic re\-routing consistently outperforms static supports under matched budgets\.LoReachieves rapid convergence in full\-graph soft conflict energy𝒞​\(x\)\\mathcal\{C\}\(x\)and identifies repairable solutions early\. In contrast,LoRe\-Staticexhibits early saturation, often failing to yield feasible solutions\. Crucially, among dynamic strategies,LoReis more stable than purely dynamic greedy edge\-wise selection\. This underscores the advantage of*combining a static structural backbone with dynamic hotspots*, rather than relying on either component alone\.

To further quantify why static masks fail, we track the temporal evolution of the selected edge setMtM\_\{t\}\. Under the default refresh intervalR=10R=10, the overlap fraction\|Mt∩Mt\+R\|/\|Mt\|\|M\_\{t\}\\cap M\_\{t\+R\}\|/\|M\_\{t\}\|between selected sets at consecutive refreshes is only25\.7%25\.7\\%during the early diffusion stage \(steps 1–10\), reflecting a highly non\-stationary conflict frontier as the global solution structure forms\. This overlap increases to56\.4%56\.4\\%in later refinement steps \(steps 31–50\), while remaining far from complete stability\. This persistent temporal variation is consistent with the severe quality degradation of static support relative to LoRe’s dynamic adaptation\.

Collectively, these results confirm that the performance gains stem principally from the dynamic adaptation of the evaluation support as the state evolves\.

![Refer to caption](https://arxiv.org/html/2605.29005v1/x4.png)Figure 4:Dynamic vs\. Static Supports under Matched Budgets\.\(a\) Full\-graph soft conflict energy𝒞​\(x\)\\mathcal\{C\}\(x\)\. \(b\) Anytime legal solution size after repair\. Curves are averaged over 40 graphs; routing variants are detailed in Table[3](https://arxiv.org/html/2605.29005#S4.T3)\.
### 4\.4Hyperparameter Sensitivity

Table 4:Hyperparameter Sensitivity\.A single default \(ρ=0\.08\\rho\{=\}0\.08,R=10R\{=\}10,γ=0\.05\\gamma\{=\}0\.05,λstab=0\.5\\lambda\_\{\\mathrm\{stab\}\}\{=\}0\.5\) is used*without tuning*across all frameworks, tasks, and configs\. We sweep the proxy\-score weightλstab∈\[0,5\.0\]\\lambda\_\{\\mathrm\{stab\}\}\\\!\\in\\\!\[0,5\.0\]across graph densities \(DIFUSCO MIS,n=5000n\{=\}5000, 3 graphs/cell,ρ=0\.08\\rho\{=\}0\.08\); each cell is mean quality retention \(%\)\.λstab\\lambda\_\{\\mathrm\{stab\}\}p=0\.05p\{=\}0\.05p=0\.10p\{=\}0\.10p=0\.15p\{=\}0\.15099\.799\.7105\.1105\.196\.896\.80\.250\.2599\.199\.1104\.0104\.097\.697\.60\.50\.599\.199\.1105\.1105\.197\.697\.61\.01\.098\.898\.8103\.4103\.497\.697\.62\.02\.098\.898\.8105\.7105\.798\.498\.45\.05\.099\.799\.7102\.9102\.996\.896\.8Range0\.90\.9pp2\.82\.8pp1\.61\.6ppSpeedup6\.3×6\.3\\times6\.6×6\.6\\times6\.6×6\.6\\timesMem↓\\downarrow11×11\\times12×12\\times12×12\\timesWe evaluate whether LoRe’s default hyperparameters require per\-setting tuning\. Across all frameworks, tasks, graph densities, and problem scales considered in this paper, we use a single configuration \(ρ=0\.08\\rho\{=\}0\.08,R=10R\{=\}10,γ=0\.05\\gamma\{=\}0\.05,λstab=0\.5\\lambda\_\{\\mathrm\{stab\}\}\{=\}0\.5\) without task\-specific tuning\. To characterize the effect of each hyperparameter, we sweep them independently; Table[4](https://arxiv.org/html/2605.29005#S4.T4)reports the sweep overλstab∈\[0,5\.0\]\\lambda\_\{\\mathrm\{stab\}\}\\in\[0,5\.0\]across graph densities atn=5000n\{=\}5000\.

Retention varies by at most2\.82\.8pp acrossλstab\\lambda\_\{\\mathrm\{stab\}\}at every density, while the speedup remains6–7×6\\text\{\-\-\}7\\timesand the peak\-memory reduction remains11–12×11\\text\{\-\-\}12\\times\. Independent sweeps of the interaction budgetρ\\rhoand refresh intervalRRshow similar robustness, with retention varying by less than33pp overρ∈\[0\.05,0\.50\]\\rho\\in\[0\.05,0\.50\]and less than1\.51\.5pp overR∈\[1,50\]R\\in\[1,50\]\.

The few deviations are task\- or backbone\-dependent and consistent with the role of each hyperparameter\. In particular, categorical\-diffusion backbones can be more sensitive to infrequent refreshes\.

Overall, LoRe is largely insensitive to its hyperparameters\.

The default configuration transfers across settings without tuning, and when adjustment is needed the guideline is simple: a larger budgetρ\\rhofor constraint\-dense tasks, and a shorter refresh intervalRRfor categorical\-diffusion backbones\.

### 4\.5Zero\-Shot Robustness to Topology Shifts

We finally assess thezero\-shot generalizationof LoRe: can the efficiency gains persist under topology shifts without retraining the backbone? Using the ER\-pretrained model, we evaluate Erdős–Rényi \(ER\), Barabási–Albert \(BA\), and Watts–Strogatz \(WS\) families over 100 graph–budget evaluations per family \(Table[2](https://arxiv.org/html/2605.29005#S4.T2), topology\-OOD rows\)\. LoRe delivers consistent∼8×\{\\sim\}8\\timesacceleration and∼12×\{\\sim\}12\\timespeak\-memory reduction across all three families, with near\-perfect retention on ER and WS \(0\.990\.99and1\.001\.00\); the only notable degradation is on the scale\-free BA family \(retention0\.760\.76\), where heavy\-tailed degree distributions concentrate conflicts on hub nodes and make sparse budget allocation more challenging\.

### Experimental Takeaways

Our experimental results validate LoRe as a scalable, training\-free inference accelerator\.

- •Scalability:LoRe breaks the memory bottleneck of full\-graph diffusion, extending feasibility to large instances where the baseline fails \(OOM\), while delivering increasing speedups\.
- •Mechanism:Controlled ablations confirm that dynamic, state\-dependent re\-routing is the key driver of performance, consistently outperforming static or greedy baselines\.
- •Robustness:The method generalizes zero\-shot to new tasks \(TSP\) and topologies \(OOD\) and offers a controllable speed–quality trade\-off via the budget parameter\.

## 5Conclusion

We formalize*per\-step operator budgeting*for iterative neural solvers and cast it as a*runtime routing*problem\. We presentLoRe, a training\-free, inference\-time drop\-in wrapper that enforces*temporal operator sparsity*via a time\-varying active interaction set, leaving backbone checkpoints and decoding unchanged\. Experiments show improved practicality at scale: LoRe reduces peak memory, shifts the MIS feasibility boundary beyond baseline OOM, and transfers to TSP with a scale\-dependent crossover and bounded quality impact\. Future work includes tighter stability/accuracy characterizations of budget\-induced truncation and extending budgeted routing to broader solver families and tasks\.

## Acknowledgments

This work was supported by National Natural Science Foundation of China \( U25A6009, 92265207, 12247168\), MOST of China \(2025YFE0217600\), China Postdoctoral Science Foundation \(2022TQ0036\)\. We also thank the anonymous reviewers for their insightful comments, and Chang Liu for valuable discussions\.

## Impact Statement

This paper presents work whose goal is to advance the field of machine learning\. There are many potential societal consequences of our work, none of which we feel must be specifically highlighted here\.

## References

- I\. Bello, H\. Pham, Q\. V\. Le, M\. Norouzi, and S\. Bengio \(2017\)Neural combinatorial optimization with reinforcement learning\.In5th International Conference on Learning Representations, ICLR 2017 Workshop Track,Cited by:[§2](https://arxiv.org/html/2605.29005#S2.p1.1)\.
- Y\. Bengio, A\. Lodi, and A\. Prouvost \(2021\)Machine learning for combinatorial optimization: a methodological tour d’horizon\.European Journal of Operational Research290\(2\),pp\. 405–421\.External Links:ISSN 0377\-2217,[Document](https://dx.doi.org/10.1016/j.ejor.2020.07.063)Cited by:[§1](https://arxiv.org/html/2605.29005#S1.p2.3),[§2](https://arxiv.org/html/2605.29005#S2.p1.1)\.
- G\. Biroli, O\. Parcollet, and G\. Kotliar \(2004\)Cluster dynamical mean\-field theories: causality and classical limit\.Physical Review B69\(20\),pp\. 205108\.External Links:ISSN 1550\-235X,[Document](https://dx.doi.org/10.1103/physrevb.69.205108)Cited by:[§3\.2](https://arxiv.org/html/2605.29005#S3.SS2.p1.1)\.
- Q\. Cappart, D\. Chételat, E\. B\. Khalil, A\. Lodi, C\. Morris, and P\. Veličković \(2023\)Combinatorial optimization and reasoning with graph neural networks\.Journal of Machine Learning Research24\(130\),pp\. 1–61\.External Links:[Link](http://jmlr.org/papers/v24/21-0449.html)Cited by:[§1](https://arxiv.org/html/2605.29005#S1.p2.3),[§2](https://arxiv.org/html/2605.29005#S2.p1.1)\.
- H\. Dai, B\. Dai, and L\. Song \(2016\)Discriminative embeddings of latent variable models for structured data\.InProceedings of the 33rd International Conference on Machine Learning,M\. Balcan and K\. Q\. Weinberger \(Eds\.\),Proceedings of Machine Learning Research, Vol\.48,pp\. 2702–2711\.External Links:[Link](http://proceedings.mlr.press/v48/daib16.html)Cited by:[§2](https://arxiv.org/html/2605.29005#S2.p8.1)\.
- S\. Feng, Z\. Sun, and Y\. Yang \(2024\)DIFUSCO\-LNS: diffusion\-guided large neighbourhood search for integer linear programming\.Note:OpenReview preprintSubmitted to ICLR 2024External Links:[Link](https://openreview.net/forum?id=9QV7Q9gKl9)Cited by:[§2](https://arxiv.org/html/2605.29005#S2.p6.1)\.
- J\. Gilmer, S\. S\. Schoenholz, P\. F\. Riley, O\. Vinyals, and G\. E\. Dahl \(2017\)Neural message passing for quantum chemistry\.InProceedings of the 34th International Conference on Machine Learning,D\. Precup and Y\. W\. Teh \(Eds\.\),Proceedings of Machine Learning Research, Vol\.70,pp\. 1263–1272\.External Links:[Link](https://proceedings.mlr.press/v70/gilmer17a.html)Cited by:[§1](https://arxiv.org/html/2605.29005#S1.p2.3)\.
- W\. Hamilton, Z\. Ying, and J\. Leskovec \(2017\)Inductive representation learning on large graphs\.InAdvances in Neural Information Processing Systems,I\. Guyon, U\. V\. Luxburg, S\. Bengio, H\. Wallach, R\. Fergus, S\. Vishwanathan, and R\. Garnett \(Eds\.\),Vol\.30,pp\.\.External Links:[Link](https://proceedings.neurips.cc/paper_files/paper/2017/file/5dd9db5e033da9c6fb5ba83c7a7ebea9-Paper.pdf)Cited by:[§1](https://arxiv.org/html/2605.29005#S1.p2.3)\.
- K\. Helsgaun \(2000\)An effective implementation of the Lin–Kernighan traveling salesman heuristic\.European Journal of Operational Research126\(1\),pp\. 106–130\.External Links:ISSN 0377\-2217,[Document](https://dx.doi.org/10.1016/s0377-2217%2899%2900284-2)Cited by:[§2](https://arxiv.org/html/2605.29005#S2.p5.1)\.
- J\. Ho, A\. N\. Jain, and P\. Abbeel \(2020\)Denoising diffusion probabilistic models\.InAdvances in Neural Information Processing Systems,H\. Larochelle, M\. Ranzato, R\. Hadsell, M\.F\. Balcan, and H\. Lin \(Eds\.\),Vol\.33,pp\. 6840–6851\.External Links:[Link](https://proceedings.neurips.cc/paper_files/paper/2020/file/4c5bcfec8584af0d967f1ab10179ca4b-Paper.pdf)Cited by:[§2](https://arxiv.org/html/2605.29005#S2.p4.1)\.
- J\. J\. Hopfield and D\. W\. Tank \(1985\)“Neural” computation of decisions in optimization problems\.Biological Cybernetics52\(3\),pp\. 141–152\.External Links:ISSN 1432\-0770,[Document](https://dx.doi.org/10.1007/BF00339943),[Link](https://doi.org/10.1007/BF00339943)Cited by:[§2](https://arxiv.org/html/2605.29005#S2.p8.1)\.
- A\. Hottung and K\. Tierney \(2020\)Neural large neighborhood search for the capacitated vehicle routing problem\.InECAI 2020,pp\. 443–450\.External Links:[Document](https://dx.doi.org/10.3233/faia200124),ISSN 0922\-6389,[Link](http://dx.doi.org/10.3233/FAIA200124)Cited by:[§2](https://arxiv.org/html/2605.29005#S2.p6.1)\.
- F\. P\. Kelly, A\. K\. Maulloo, and D\. K\. H\. Tan \(1998\)Rate control for communication networks: shadow prices, proportional fairness and stability\.Journal of the Operational Research Society49\(3\),pp\. 237–252\.External Links:ISSN 1476\-9360,[Document](https://dx.doi.org/10.1057/palgrave.jors.2600523)Cited by:[§1](https://arxiv.org/html/2605.29005#S1.p1.1)\.
- E\. Khalil, H\. Dai, Y\. Zhang, B\. Dilkina, and L\. Song \(2017\)Learning combinatorial optimization algorithms over graphs\.InAdvances in Neural Information Processing Systems,I\. Guyon, U\. V\. Luxburg, S\. Bengio, H\. Wallach, R\. Fergus, S\. Vishwanathan, and R\. Garnett \(Eds\.\),Vol\.30\.External Links:[Link](https://proceedings.neurips.cc/paper_files/paper/2017/file/d9896106ca98d3d05b8cbdf4fd8b13a1-Paper.pdf)Cited by:[§2](https://arxiv.org/html/2605.29005#S2.p1.1)\.
- W\. Kool, H\. van Hoof, and M\. Welling \(2019\)Attention, learn to solve routing problems\!\.InInternational Conference on Learning Representations,External Links:[Link](https://openreview.net/forum?id=ByxBFsRqYm)Cited by:[§2](https://arxiv.org/html/2605.29005#S2.p1.1)\.
- G\. Kotliar, S\. Y\. Savrasov, G\. Pálsson, and G\. Biroli \(2001\)Cellular dynamical mean field approach to strongly correlated systems\.Physical Review Letters87\(18\),pp\. 186401\.External Links:[Document](https://dx.doi.org/10.1103/PhysRevLett.87.186401)Cited by:[§3\.2](https://arxiv.org/html/2605.29005#S3.SS2.p1.1)\.
- Y\. Li, L\. Chen, H\. Wang, R\. Wang, and J\. Yan \(2025\)Generation as search operator for test\-time scaling of diffusion\-based combinatorial optimization\.InAdvances in Neural Information Processing Systems,External Links:[Link](https://openreview.net/forum?id=9JM03CQwzC)Cited by:[§2](https://arxiv.org/html/2605.29005#S2.p3.1)\.
- Y\. Li, J\. Guo, R\. Wang, and J\. Yan \(2023\)T2T: from distribution learning in training to gradient search in testing for combinatorial optimization\.InAdvances in Neural Information Processing Systems,A\. Oh, T\. Naumann, A\. Globerson, K\. Saenko, M\. Hardt, and S\. Levine \(Eds\.\),Vol\.36,pp\. 50020–50040\.External Links:[Link](https://proceedings.neurips.cc/paper_files/paper/2023/hash/9c93b3cd3bc60c0fe7b0c2d74a2da966-Abstract-Conference.html)Cited by:[Table 1](https://arxiv.org/html/2605.29005#S2.T1),[§2](https://arxiv.org/html/2605.29005#S2.p4.1)\.
- Y\. Li, J\. Guo, R\. Wang, H\. Zha, and J\. Yan \(2024\)Fast T2T: optimization consistency speeds up diffusion\-based training\-to\-testing solving for combinatorial optimization\.InAdvances in Neural Information Processing Systems,A\. Globerson, L\. Mackey, D\. Belgrave, A\. Fan, U\. Paquet, J\. Tomczak, and C\. Zhang \(Eds\.\),Vol\.37,pp\. 30179–30206\.External Links:[Document](https://dx.doi.org/10.52202/079017-0950),[Link](https://proceedings.neurips.cc/paper_files/paper/2024/hash/352b13f01566ae34affacc60e98c16af-Abstract-Conference.html)Cited by:[Table 1](https://arxiv.org/html/2605.29005#S2.T1),[§2](https://arxiv.org/html/2605.29005#S2.p4.1)\.
- S\. Lin and B\. W\. Kernighan \(1973\)An effective heuristic algorithm for the traveling\-salesman problem\.Operations Research21\(2\),pp\. 498–516\.External Links:ISSN 1526\-5463,[Document](https://dx.doi.org/10.1287/opre.21.2.498)Cited by:[§2](https://arxiv.org/html/2605.29005#S2.p5.1)\.
- C\. Lu, Y\. Zhou, F\. Bao, J\. Chen, C\. LI, and J\. Zhu \(2022\)DPM\-Solver: a fast ODE solver for diffusion probabilistic model sampling in around 10 steps\.InAdvances in Neural Information Processing Systems,S\. Koyejo, S\. Mohamed, A\. Agarwal, D\. Belgrave, K\. Cho, and A\. Oh \(Eds\.\),Vol\.35,pp\. 5775–5787\.External Links:[Link](https://proceedings.neurips.cc/paper_files/paper/2022/file/260a14acce2a89dad36adc8eefe7c59e-Paper-Conference.pdf)Cited by:[§2](https://arxiv.org/html/2605.29005#S2.p4.1)\.
- A\. Lucas \(2014\)Ising formulations of many NP problems\.Frontiers in Physics2,pp\. 5\.External Links:ISSN 2296\-424X,[Document](https://dx.doi.org/10.3389/fphy.2014.00005)Cited by:[§2](https://arxiv.org/html/2605.29005#S2.p8.1)\.
- J\. Ma, W\. Pan, Y\. Li, and J\. Yan \(2025\)COExpander: adaptive solution expansion for combinatorial optimization\.InProceedings of the 42nd International Conference on Machine Learning,A\. Singh, M\. Fazel, D\. Hsu, S\. Lacoste\-Julien, F\. Berkenkamp, T\. Maharaj, K\. Wagstaff, and J\. Zhu \(Eds\.\),Proceedings of Machine Learning Research, Vol\.267,pp\. 42130–42164\.External Links:[Link](https://proceedings.mlr.press/v267/ma25r.html)Cited by:[Table 1](https://arxiv.org/html/2605.29005#S2.T1),[§2](https://arxiv.org/html/2605.29005#S2.p3.1)\.
- T\. Maier, M\. Jarrell, T\. Pruschke, and M\. H\. Hettler \(2005\)Quantum cluster theories\.Reviews of Modern Physics77\(3\),pp\. 1027–1080\.External Links:[Document](https://dx.doi.org/10.1103/RevModPhys.77.1027)Cited by:[§2](https://arxiv.org/html/2605.29005#S2.p9.1),[§3\.2](https://arxiv.org/html/2605.29005#S3.SS2.p1.1)\.
- N\. Mazyavkina, S\. Sviridov, S\. Ivanov, and E\. Burnaev \(2021\)Reinforcement learning for combinatorial optimization: a survey\.Computers & Operations Research134,pp\. 105400\.External Links:ISSN 0305\-0548,[Document](https://dx.doi.org/10.1016/j.cor.2021.105400)Cited by:[§1](https://arxiv.org/html/2605.29005#S1.p2.3),[§2](https://arxiv.org/html/2605.29005#S2.p1.1)\.
- M\. Nazari, A\. Oroojlooy, L\. Snyder, and M\. Takac \(2018\)Reinforcement learning for solving the vehicle routing problem\.InAdvances in Neural Information Processing Systems,S\. Bengio, H\. Wallach, H\. Larochelle, K\. Grauman, N\. Cesa\-Bianchi, and R\. Garnett \(Eds\.\),Vol\.31,pp\.\.External Links:[Link](https://proceedings.neurips.cc/paper_files/paper/2018/file/9fb4651c05b2ed70fba5afe0b039a550-Paper.pdf)Cited by:[§2](https://arxiv.org/html/2605.29005#S2.p1.1)\.
- V\. Pillac, M\. Gendreau, C\. Guéret, and A\. L\. Medaglia \(2013\)A review of dynamic vehicle routing problems\.European Journal of Operational Research225\(1\),pp\. 1–11\.External Links:ISSN 0377\-2217,[Document](https://dx.doi.org/10.1016/j.ejor.2012.08.015)Cited by:[§1](https://arxiv.org/html/2605.29005#S1.p1.1)\.
- D\. Pisinger and S\. Ropke \(2010\)Large neighborhood search\.InHandbook of Metaheuristics,M\. Gendreau and J\. Potvin \(Eds\.\),pp\. 399–419\.External Links:ISBN 9781441916655,[Document](https://dx.doi.org/10.1007/978-1-4419-1665-5%5F13),ISSN 0884\-8289Cited by:[§2](https://arxiv.org/html/2605.29005#S2.p6.1)\.
- M\. Potthoff, M\. Aichhorn, and C\. Dahnken \(2003\)Variational cluster approach to correlated electron systems in low dimensions\.Phys\. Rev\. Lett\.91,pp\. 206402\.External Links:[Document](https://dx.doi.org/10.1103/PhysRevLett.91.206402),[Link](https://link.aps.org/doi/10.1103/PhysRevLett.91.206402)Cited by:[§3\.2](https://arxiv.org/html/2605.29005#S3.SS2.p1.1)\.
- G\. Reinelt \(1991\)TSPLIB—a traveling salesman problem library\.ORSA Journal on Computing3\(4\),pp\. 376–384\.External Links:ISSN 0899\-1499,[Document](https://dx.doi.org/10.1287/ijoc.3.4.376)Cited by:[§2](https://arxiv.org/html/2605.29005#S2.p5.1)\.
- S\. Ropke and D\. Pisinger \(2006\)An adaptive large neighborhood search heuristic for the pickup and delivery problem with time windows\.Transportation Science40\(4\),pp\. 455–472\.External Links:ISSN 1526\-5447,[Document](https://dx.doi.org/10.1287/trsc.1050.0135)Cited by:[§2](https://arxiv.org/html/2605.29005#S2.p6.1)\.
- T\. Salimans and J\. Ho \(2022\)Progressive distillation for fast sampling of diffusion models\.InInternational Conference on Learning Representations,External Links:[Link](https://openreview.net/forum?id=TIdIXIpzhoI)Cited by:[§2](https://arxiv.org/html/2605.29005#S2.p4.1)\.
- S\. Sanokowski, W\. F\. Berghammer, H\. P\. Wang, M\. Ennemoser, S\. Hochreiter, and S\. Lehner \(2025\)Scalable discrete diffusion samplers: combinatorial optimization and statistical physics\.InThe Thirteenth International Conference on Learning Representations,External Links:[Link](https://openreview.net/forum?id=peNgxpbdxB)Cited by:[§2](https://arxiv.org/html/2605.29005#S2.p2.1)\.
- S\. Sanokowski, S\. Hochreiter, and S\. Lehner \(2024\)A diffusion model framework for unsupervised neural combinatorial optimization\.InProceedings of the 41st International Conference on Machine Learning,R\. Salakhutdinov, Z\. Kolter, K\. Heller, A\. Weller, N\. Oliver, J\. Scarlett, and F\. Berkenkamp \(Eds\.\),Proceedings of Machine Learning Research, Vol\.235,pp\. 43346–43367\.External Links:[Link](https://proceedings.mlr.press/v235/sanokowski24a.html)Cited by:[§1](https://arxiv.org/html/2605.29005#S1.p2.3),[§2](https://arxiv.org/html/2605.29005#S2.p2.1)\.
- F\. Scarselli, M\. Gori, A\. C\. Tsoi, M\. Hagenbuchner, and G\. Monfardini \(2009\)The graph neural network model\.IEEE Transactions on Neural Networks20\(1\),pp\. 61–80\.External Links:ISSN 1941\-0093,[Document](https://dx.doi.org/10.1109/tnn.2008.2005605)Cited by:[§1](https://arxiv.org/html/2605.29005#S1.p2.3)\.
- P\. Shaw \(1998\)Using constraint programming and local search methods to solve vehicle routing problems\.InPrinciples and Practice of Constraint Programming – CP98,M\. Maher and J\. Puget \(Eds\.\),Lecture Notes in Computer Science, Vol\.1520,pp\. 417–431\.External Links:[Document](https://dx.doi.org/10.1007/3-540-49481-2%5F30)Cited by:[§2](https://arxiv.org/html/2605.29005#S2.p6.1)\.
- J\. Song, C\. Meng, and S\. Ermon \(2021a\)Denoising diffusion implicit models\.InInternational Conference on Learning Representations,External Links:[Link](https://openreview.net/forum?id=St1giarCHLP)Cited by:[§2](https://arxiv.org/html/2605.29005#S2.p4.1)\.
- Y\. Song, J\. Sohl\-Dickstein, D\. P\. Kingma, A\. Kumar, S\. Ermon, and B\. Poole \(2021b\)Score\-based generative modeling through stochastic differential equations\.InInternational Conference on Learning Representations,External Links:[Link](https://openreview.net/forum?id=PxTIG12RRHS)Cited by:[§2](https://arxiv.org/html/2605.29005#S2.p4.1)\.
- Z\. Sun and Y\. Yang \(2023\)DIFUSCO: graph\-based diffusion solvers for combinatorial optimization\.InThirty\-seventh Conference on Neural Information Processing Systems,External Links:[Link](https://openreview.net/forum?id=JV8Ff0lgVV)Cited by:[§1](https://arxiv.org/html/2605.29005#S1.p2.3),[§1](https://arxiv.org/html/2605.29005#S1.p7.1),[Table 1](https://arxiv.org/html/2605.29005#S2.T1),[§2](https://arxiv.org/html/2605.29005#S2.p2.1)\.
- A\. Verma, L\. Pedrosa, M\. Korupolu, D\. Oppenheimer, E\. Tune, and J\. Wilkes \(2015\)Large\-scale cluster management at google with borg\.InProceedings of the Tenth European Conference on Computer Systems,EuroSys ’15,pp\. 1–17\.External Links:[Document](https://dx.doi.org/10.1145/2741948.2741964)Cited by:[§1](https://arxiv.org/html/2605.29005#S1.p1.1)\.
- O\. Vinyals, M\. Fortunato, and N\. Jaitly \(2015\)Pointer networks\.InAdvances in Neural Information Processing Systems,C\. Cortes, N\. Lawrence, D\. Lee, M\. Sugiyama, and R\. Garnett \(Eds\.\),Vol\.28\.External Links:[Link](https://proceedings.neurips.cc/paper_files/paper/2015/file/29921001f2f04bd3baee84a12e98098f-Paper.pdf)Cited by:[§2](https://arxiv.org/html/2605.29005#S2.p1.1)\.
- Y\. Wang, Y\. Li, J\. Yan, and Y\. Chang \(2025\)StruDiCO: structured denoising diffusion with gradient\-free inference\-stage boosting for memory and time efficient combinatorial optimization\.InAdvances in Neural Information Processing Systems,Cited by:[§2](https://arxiv.org/html/2605.29005#S2.p3.1)\.
- H\. Zhao, K\. Yu, Y\. Huang, R\. Yi, C\. Zhu, and K\. Xu \(2025\)DISCO: efficient diffusion solver for large\-scale combinatorial optimization problems\.Graphical Models141,pp\. 101284\.External Links:ISSN 1524\-0703,[Document](https://dx.doi.org/10.1016/j.gmod.2025.101284),[Link](https://www.sciencedirect.com/science/article/pii/S1524070325000311)Cited by:[§1](https://arxiv.org/html/2605.29005#S1.p2.3)\.
- S\. Zilberstein \(1996\)Using anytime algorithms in intelligent systems\.AI Magazine17\(3\),pp\. 73\.External Links:[Document](https://dx.doi.org/10.1609/aimag.v17i3.1232),[Link](https://ojs.aaai.org/aimagazine/index.php/aimagazine/article/view/1232)Cited by:[§1](https://arxiv.org/html/2605.29005#S1.p1.1)\.

## Appendix APer\-Step Error Bound: Full Derivation and Empirical Validation

This appendix gives the full derivation of the per\-step error bound stated informally in the main text \(Eq\.[5](https://arxiv.org/html/2605.29005#S3.E5)\), together with the trajectory measurements that validate its key assumption\.

### A\.1Full derivation

The error recursion follows from the triangle inequality: at each steptt, inserting the exact operator evaluated at the approximate state,

et\+1\\displaystyle e\_\{t\+1\}=‖F~t​\(x~t;ρ\)−Ft​\(xt\)‖\\displaystyle=\\\|\\tilde\{F\}\_\{t\}\(\\tilde\{x\}^\{t\};\\rho\)\-F\_\{t\}\(x^\{t\}\)\\\|\(7\)≤‖Ft​\(x~t\)−Ft​\(xt\)‖\+‖F~t​\(x~t;ρ\)−Ft​\(x~t\)‖\.\\displaystyle\\leq\\\|F\_\{t\}\(\\tilde\{x\}^\{t\}\)\-F\_\{t\}\(x^\{t\}\)\\\|\+\\\|\\tilde\{F\}\_\{t\}\(\\tilde\{x\}^\{t\};\\rho\)\-F\_\{t\}\(\\tilde\{x\}^\{t\}\)\\\|\.The first term captures propagated historical error; the second is the current\-step truncation residualδt\\delta\_\{t\}\.

#### Dependence on graph structure \(LtL\_\{t\}\)\.

Assuming a11\-Lipschitz activation such as ReLU, a single GNN aggregation layer \(instantiating𝒯t\\mathcal\{T\}\_\{t\}\) computesFt​\(x\)=σ​\(A​x​Wt\)F\_\{t\}\(x\)=\\sigma\(AxW\_\{t\}\)\. By the mean value inequality and matrix norm submultiplicativity,

‖Ft​\(x~t\)−Ft​\(xt\)‖\\displaystyle\\\|F\_\{t\}\(\\tilde\{x\}^\{t\}\)\-F\_\{t\}\(x^\{t\}\)\\\|=‖σ​\(A​x~t​Wt\)−σ​\(A​xt​Wt\)‖\\displaystyle=\\\|\\sigma\(A\\tilde\{x\}^\{t\}W\_\{t\}\)\-\\sigma\(Ax^\{t\}W\_\{t\}\)\\\|\(8\)≤‖A‖2​‖Wt‖2​‖x~t−xt‖=Lt⋅et\.\\displaystyle\\leq\\\|A\\\|\_\{2\}\\\|W\_\{t\}\\\|\_\{2\}\\\|\\tilde\{x\}^\{t\}\-x^\{t\}\\\|=L\_\{t\}\\cdot e\_\{t\}\.ThusLt≤‖Wt‖2​‖A‖2≤‖Wt‖2​ΔmaxL\_\{t\}\\leq\\\|W\_\{t\}\\\|\_\{2\}\\\|A\\\|\_\{2\}\\leq\\\|W\_\{t\}\\\|\_\{2\}\\,\\Delta\_\{\\max\}, whereΔmax\\Delta\_\{\\max\}is the maximum node degree\. This makes the dependence on graph structure explicit: denser regions with higherΔmax\\Delta\_\{\\max\}have a larger Lipschitz constant, leading to stronger error amplification per step\.

#### Dependence on edge budgetρ\\rho\.

LoRe evaluates the ClusterMtM\_\{t\}\(with\|Mt\|=B=⌊ρ​\|𝒜\|⌋\|M\_\{t\}\|=B=\\lfloor\\rho\|\\mathcal\{A\}\|\\rfloor\) exactly\. The truncation residual decomposes via the triangle inequality as

‖δt‖≤∑a∈𝒜∖Mt‖MSGt,a​\(x~t\)−Poolt‖\+‖rt‖=ϵt​\(ρ\)\+‖rt‖,\\\|\\delta\_\{t\}\\\|\\leq\\sum\_\{a\\in\\mathcal\{A\}\\setminus M\_\{t\}\}\\big\\\|\\mathrm\{MSG\}\_\{t,a\}\(\\tilde\{x\}^\{t\}\)\-\\mathrm\{Pool\}\_\{t\}\\big\\\|\+\\\|r\_\{t\}\\\|=\\epsilon\_\{t\}\(\\rho\)\+\\\|r\_\{t\}\\\|,\(9\)whereϵt​\(ρ\)\\epsilon\_\{t\}\(\\rho\)is the omitted\-message mass over𝒜∖Mt\\mathcal\{A\}\\setminus M\_\{t\}and‖rt‖\\\|r\_\{t\}\\\|is the residual contributed by the optional recall mechanism of Eq\.[4](https://arxiv.org/html/2605.29005#S3.E4)\. Asρ\\rhoincreases,\|𝒜∖Mt\|\|\\mathcal\{A\}\\setminus M\_\{t\}\|shrinks, strictly decreasingϵt​\(ρ\)\\epsilon\_\{t\}\(\\rho\)\. As verified empirically in Sec\.[A\.2](https://arxiv.org/html/2605.29005#A1.SS2), the omitted interactions are highly deterministic, keeping‖rt‖\\\|r\_\{t\}\\\|tightly bounded\.

#### Grönwall bound\.

Combining the above yields the recursionet\+1≤Lt​et\+‖δt‖e\_\{t\+1\}\\leq L\_\{t\}e\_\{t\}\+\\\|\\delta\_\{t\}\\\|\. Since both trajectories share the same initialization,e0=0e\_\{0\}=0\. Unrolling withL=maxt⁡LtL=\\max\_\{t\}L\_\{t\}and applying the geometric series,

eT≤∑k=0T−1LT−1−k​‖δk‖≤LT−1L−1​\(ϵmax​\(ρ\)\+rmax\)\.e\_\{T\}\\leq\\sum\_\{k=0\}^\{T\-1\}L^\{\\,T\-1\-k\}\\,\\\|\\delta\_\{k\}\\\|\\leq\\frac\{L^\{T\}\-1\}\{L\-1\}\\big\(\\epsilon\_\{\\max\}\(\\rho\)\+r\_\{\\max\}\\big\)\.\(10\)

### A\.2Empirical validation

We measure the activity of omitted \(Bath\) versus retained \(Cluster\) interactions across the diffusion trajectory on TSP\-100/500/1000 withρ=0\.08\\rho\{=\}0\.08and5050DDIM steps, using three metrics at different levels of the output pipeline:

- •Entropy:Shannon entropy of the edge probability\.
- •Logit var:variance of the pre\-softmax log\-odds, reflecting raw GNN output activity before softmax compression\.
- •Prob var:variance of the post\-softmax probability, the final output used for decoding\.

Table 5:Bath vs\. Cluster interaction activity across the diffusion trajectory \(DIFUSCO TSP,ρ=0\.08\\rho\{=\}0\.08,5050DDIM steps\)\. C==Cluster, B==Bath; each cell reports the range over all5050steps\.ScaleEntropy \(C / B\)Logit var \(C / B\)Prob var \(C / B\)TSP\-1000\.0170\.017–0\.200\.20/2\.9×10−72\.9\{\\times\}10^\{\-7\}42\.942\.9–106106/4\.94\.9–5\.75\.73\.6×10−23\.6\{\\times\}10^\{\-2\}–1\.1×10−11\.1\{\\times\}10^\{\-1\}/1\.1×10−151\.1\{\\times\}10^\{\-15\}TSP\-5000\.0380\.038–0\.130\.13/3\.3×10−73\.3\{\\times\}10^\{\-7\}26\.426\.4–31\.931\.9/3\.93\.9–9\.39\.31\.9×10−21\.9\{\\times\}10^\{\-2\}–4\.3×10−24\.3\{\\times\}10^\{\-2\}/6\.3×10−156\.3\{\\times\}10^\{\-15\}TSP\-10000\.0340\.034–0\.0370\.037/3\.5×10−73\.5\{\\times\}10^\{\-7\}16\.316\.3–16\.916\.9/5\.05\.0–5\.35\.36\.0×10−36\.0\{\\times\}10^\{\-3\}–7\.6×10−37\.6\{\\times\}10^\{\-3\}/9\.5×10−169\.5\{\\times\}10^\{\-16\}Bath interactions maintain near\-zero entropy throughout all5050steps across all scales—the model has already reached a near\-certain decision about each Bath interaction\. Pre\-softmax logit variance confirms this is genuine, not a softmax artifact\. Cluster interactions carry substantial entropy throughout, reflecting unresolved uncertainty that demands exact computation\.

Together, the bound shows the total error depends on the omitted\-message massϵmax​\(ρ\)\\epsilon\_\{\\max\}\(\\rho\)and the recall errorrmaxr\_\{\\max\}, amplified by the structural factorLL; the measurements confirm the omitted interactions are near\-certain throughout the trajectory, which keeps bothϵt​\(ρ\)\\epsilon\_\{t\}\(\\rho\)and‖rt‖\\\|r\_\{t\}\\\|small\.

## Appendix BAbsolute Quality Benchmarks and Classical Solvers

In the main text, we report quality retention relative to the dense backbone to strictly isolate the effect of budgeted inference\. To properly contextualize the absolute performance of the neural solver, we provide supplementary benchmarks against classical strong heuristics \(KaMIS for MIS and Concorde for TSP\)\.

Table 6:Absolute performance benchmarks on large\-scale MIS instances\. The baseline \(BL\) represents the dense DIFUSCO solver\. LoRe successfully extends feasible inference ton=20​kn=20kwhere the baseline triggers Out\-Of\-Memory \(OOM\) on a 96 GB GPU\.Graph Size \(nn\)KaMISBLLoReSpeedupMem \(BL / LoRe\)5​k5k1571081117\.79×7\.79\\times9\.7 GB / 0\.88 GB10​k10k1691261198\.05×8\.05\\times38\.6 GB / 3\.31 GB20​k20k–OOM140–OOM / 12\.9 GBAs demonstrated in Table[6](https://arxiv.org/html/2605.29005#A2.T6), the neural baseline exhibits an out\-of\-distribution \(OOD\) quality gap relative to KaMIS in this large\-scale regime\. These absolute\-quality results are provided for context: LoRe targets the scalability bottleneck of neural inference rather than replacing specialized classical solvers\.

Crucially, at massive scales \(n≥20​kn\\geq 20k\), the dense baseline entirely exhausts the 96 GB memory limit\. LoRe, however, remains fully operational within12\.912\.9GB, delivering a valid independent set of size140140\. Furthermore, on the TSP\-500500task, under a quality\-prioritized configuration, LoRe attains a5\.76%5\.76\\%optimality gap relative to Concorde, within∼1\.8\{\\sim\}1\.8pp of the sparse baseline’s3\.97%3\.97\\%\.

Similar Articles

Beyond LoRA vs. Full Fine-Tuning: Gradient-Guided Optimizer Routing for LLM Adaptation

arXiv cs.CL

This paper proposes a Mixture of LoRA and Full (MoLF) fine-tuning framework that uses gradient-guided optimizer routing to adaptively switch between LoRA and full fine-tuning. It aims to overcome the structural limitations of relying solely on static adaptation methods by combining the plasticity of full tuning with the regularization of LoRA.

Interactive Inverse Reinforcement Learning of Interaction Scenarios via Bi-level Optimization

arXiv cs.LG

This paper introduces Interactive Inverse Reinforcement Learning (IIRL), a framework where a learner actively interacts with an expert to infer reward functions, formulated as a stochastic bi-level optimization problem. The authors propose the BISIRL algorithm, providing convergence guarantees and experimental validation for this interactive learning paradigm.