GLENS: Global Search via Learning from Solver Iterates with Diffusion Models

arXiv cs.LG Papers

Summary

GLENS is a data-efficient global search method that uses diffusion models to generate diverse, high-quality initial guesses for local minima in non-convex optimization problems by leveraging intermediate solver iterates as free data augmentation.

arXiv:2606.00366v1 Announce Type: new Abstract: We consider the problem of generating a large collection of initial guesses for local minima of multimodal non-convex continuous optimization problems. The goal is for these initial guesses to be high-quality (i.e., a numerical solver converges quickly) and diverse (i.e., represent many different local minima). Identifying multiple locally optimal solutions enables flexible downstream decision-making, but typically requires expensive global search. Existing data-driven methods predict initial guesses using only the final converged optima from offline solver runs, which discards information about the local neighborhoods of solutions and limits the available training data. We propose GLENS (Global Search via Learning from Solver Iterates), a data-efficient global search method that leverages intermediate solver iterates as free data augmentation. GLENS consists of two components: a neighborhood structure model that uses diffusion models to learn the local geometry around optima conditioned on problem parameters, and a solver behavior model that learns refinement directions to further guide samples towards nearby optima during diffusion sampling. Experiments on modified non-convex benchmark problems and a two-robot obstacle-avoidance navigation problem show that GLENS generates high-quality initial guesses while preserving the multimodal distribution of diverse local optima. The resulting initial guesses lead to faster solver convergence across different problem settings and solvers. We also analyze how key hyperparameter choices affect the performance.
Original Article
View Cached Full Text

Cached at: 06/02/26, 03:41 PM

# GLENS: Global Search via Learning from Solver Iterates with Diffusion Models
Source: [https://arxiv.org/html/2606.00366](https://arxiv.org/html/2606.00366)
\[1\]\\fnmAnjian\\surLi

\[1\]\\orgdivDepartment of Electrical and Computer Engineering,\\orgnamePrinceton University,\\orgaddress\\cityPrinceton,\\postcode08544,\\stateNJ,\\countryUSA

2\]\\orgdivDepartment of Operations Research and Financial Engineering,\\orgnamePrinceton University,\\orgaddress\\cityPrinceton,\\postcode08544,\\stateNJ,\\countryUSA

3\]\\orgdivDepartment of Mechanical and Aerospace Engineering,\\orgnamePrinceton University,\\orgaddress\\cityPrinceton,\\postcode08544,\\stateNJ,\\countryUSA

###### Abstract

We consider the problem of generating a large collection of initial guesses for local minima of multimodal non\-convex continuous optimization problems\. The goal is for these initial guesses to be high\-quality \(i\.e\., a numerical solver converges quickly\) and diverse \(i\.e\., represent many different local minima\)\. Identifying multiple locally optimal solutions enables flexible downstream decision\-making, but typically requires expensive global search\. Existing data\-driven methods predict initial guesses using only the final converged optima from offline solver runs, which discards information about the local neighborhoods of solutions and limits the available training data\. We propose GLENS \(Global Search viaLearning fromSolver Iterates\), a data\-efficient global search method that leverages intermediate solver iterates as free data augmentation\. GLENS consists of two components: a neighborhood structure model that uses diffusion models to learn the local geometry around optima conditioned on problem parameters, and a solver behavior model that learns refinement directions to further guide samples towards nearby optima during diffusion sampling\. Experiments on modified non\-convex benchmark problems and a two\-robot obstacle\-avoidance navigation problem show that GLENS generates high\-quality initial guesses while preserving the multimodal distribution of diverse local optima\. The resulting initial guesses lead to faster solver convergence across different problem settings and solvers\. We also analyze how key hyperparameter choices affect the performance\.

###### keywords:

Global Search, Data\-driven Optimization, Non\-convex Optimization, Diffusion Models, Solver Iterates

## 1Introduction

Non\-convex continuous optimization problems arising in real\-world applications often exhibit multiple local optima\. Beyond seeking a single global optimum, it is often desirable to identify a diverse set of locally optimal solutions, enabling trade\-offs among different objectives and requirements\. Ideally, this diverse set of solutions is also high\-quality, meaning that their objective values are reasonably close to the global optimum, which is likely unknowable\. For example, in autonomous robot navigation, multiple feasible trajectories may differ in safety, time efficiency, or comfort\[[1](https://arxiv.org/html/2606.00366#bib.bib1)\]; in spaceflight trajectory design, qualitatively different trajectories may differ in time\-of\-flight, fuel consumption, or flyby sequences\[[2](https://arxiv.org/html/2606.00366#bib.bib2)\]\. Access to such diverse candidate solutions provides greater flexibility for downstream decision\-making\.

However, finding a diverse collection of high\-quality local optima for non\-convex problems typically requires an expensive global search\. A widely used approach is the hybrid method, e\.g\., multi\-start\[[3](https://arxiv.org/html/2606.00366#bib.bib3)\]or Monotonic Basin Hopping \(MBH\)\[[4](https://arxiv.org/html/2606.00366#bib.bib4)\], which combines global sampling with local search\. Multiple initial guesses are sampled from a heuristic distribution over the solution space, and a gradient\-based numerical solver refines each initial guess through a sequence of iterates that converges to a nearby optimum\[[3](https://arxiv.org/html/2606.00366#bib.bib3)\]\. For high\-dimensional and highly non\-convex problems, identifying suitable initial guesses for the numerical solver is increasingly difficult, and the global search quickly becomes computationally expensive\.

An alternative paradigm is amortized optimization\[[5](https://arxiv.org/html/2606.00366#bib.bib5)\], which solves many instances of a parametric optimization problem offline and learns how solutions\[[6](https://arxiv.org/html/2606.00366#bib.bib6)\]or solving strategies\[[7](https://arxiv.org/html/2606.00366#bib.bib7),[8](https://arxiv.org/html/2606.00366#bib.bib8)\]vary with problem parameters, so that high\-quality solutions can be predicted efficiently for new instances at test time\. In this data\-driven setting, multilayer perceptrons \(MLPs\) have been adopted for predicting the optimal solution of convex problems\[[9](https://arxiv.org/html/2606.00366#bib.bib9)\]\. For problems with multiple local optima, generative models such as diffusion models\[[10](https://arxiv.org/html/2606.00366#bib.bib10),[11](https://arxiv.org/html/2606.00366#bib.bib11)\]are used for sampling solution distributions, with applications to mixed\-integer programming\[[12](https://arxiv.org/html/2606.00366#bib.bib12)\]and non\-convex trajectory optimization\[[13](https://arxiv.org/html/2606.00366#bib.bib13),[14](https://arxiv.org/html/2606.00366#bib.bib14)\]\.

Despite their promise, learning\-based methods for non\-convex optimization face a significant challenge: data scarcity\. Generative models require large amounts of diverse, high\-quality training data to learn the relationship between problem parameters and solution distributions\[[15](https://arxiv.org/html/2606.00366#bib.bib15)\]\. Curating such datasets requires running expensive global searches across many problem instances, introducing a substantial computational bottleneck\.

In addition, existing learning\-based amortized optimization methods often collect only the final converged solutions produced by the numerical solvers\[[6](https://arxiv.org/html/2606.00366#bib.bib6),[9](https://arxiv.org/html/2606.00366#bib.bib9),[13](https://arxiv.org/html/2606.00366#bib.bib13)\], while discarding the intermediate iterates generated during the solving process\. Although this practice maintains solution quality, including feasibility and local optimality, it ignores valuable information about the neighborhood structure surrounding optimal solutions\. For example, prior work has shown that in trajectory optimization, locally optimal solutions can lie very close to infeasible regions\[[16](https://arxiv.org/html/2606.00366#bib.bib16),[17](https://arxiv.org/html/2606.00366#bib.bib17)\]\. Without access to information about the solution’s neighborhood structure, data\-driven models may struggle to generalize robustly to variations in constraints or other problem parameters\.

Motivated by these limitations, our key insight is to use intermediate iterates produced by numerical solvers as training data for learning\-based amortized optimization\. Although these iterates are suboptimal, they encode rich information about the local neighborhood surrounding optimal solutions\. Importantly, solver\-generated iterates provide free data augmentation, requiring no additional computational cost beyond the original optimization runs\. However, since intermediate iterates are suboptimal, directly incorporating them without proper modeling can degrade performance\.

![Refer to caption](https://arxiv.org/html/2606.00366v1/x1.png)Figure 1:An illustration of theNeighborhood Structure Model\(left\) andSolver Behavior Model\(right\) in GLENS\. TheNeighborhood Structure Modeluses a diffusion model to learn the local geometry of the locally optimal solutions, conditioned on the scalar radiusrrthat measures the distance from each intermediate iterate to its converged iterate\. TheSolver Behavior Modellearns the refinement direction−∇xE\-\\nabla\_\{x\}Efrom the solver and guides theNeighborhood Structure Modelsamples toward the optima during the reverse diffusion process\.In this paper, we propose GLENS \(Global Search viaLearning fromSolver Iterates\), a data\-driven approach that exploits intermediate iterates to enable robust and data\-efficient global search for non\-convex continuous optimization\. The method consists of two learning\-based components\. TheNeighborhood Structure Modelis a conditional diffusion model trained on the last few iterates prior to convergence, learning the local geometry of optimal solutions and how it varies with problem parameters and relative distance\. TheSolver Behavior Modelis a neural network that predicts solver refinement directions from intermediate solver iterates, mimicking the solver’s refinement behavior with the potential to fast\-forward the solver at runtime\. It provides additional guidance that steers diffusion samples toward high\-quality initial guesses; that is, samples that are close to locally optimal solutions and therefore require fewer solver iterations to reach convergence\.

We illustrate the proposed framework through a sequence of problems with increasing complexity\. We begin with a soft\-constrained quadratic program \(QP\) solved by gradient descent\. This unimodal example provides a controlled setting in which to examine whether GLENS accurately captures local neighborhood structures and their variation with problem parameters\. This choice is motivated by the observation that, in many non\-convex problems, the neighborhood of a local optimum can be well approximated by a constrained quadratic problem\. We then consider several multimodal non\-convex benchmark problems, including modified Himmelblau, Rosenbrock, and Levy functions, using the L\-BFGS\-B solver\[[18](https://arxiv.org/html/2606.00366#bib.bib18)\]\. These examples evaluate the ability of GLENS to learn and sample diverse locally optimal solutions\. We also analyze key hyperparameter choices\. Finally, we apply GLENS to a two\-robot navigation problem with nonlinear dynamics in a multi\-obstacle environment using SNOPT\[[19](https://arxiv.org/html/2606.00366#bib.bib19)\]\.

The paper is organized as follows\. In Section[2](https://arxiv.org/html/2606.00366#S2), we introduce parametric optimization problems and the amortized global search \(AmorGS\) framework\. We also present the diffusion modeling setup\. In Section[3](https://arxiv.org/html/2606.00366#S3), we introduce thekk\-neighborhood dataset definition and present the two\-component learning architecture in GLENS\. It consists of aNeighborhood Structure Modelbased on conditional diffusion models and aSolver Behavior Modelwith a U\-Net architecture\. In Section[4](https://arxiv.org/html/2606.00366#S4), we evaluate the proposed method on several numerical examples: a soft\-constrained QP and several non\-convex benchmark functions, including the modified Himmelblau, Rosenbrock, and Levy functions\. In Section[5](https://arxiv.org/html/2606.00366#S5), we analyze how different hyperparameter choices affect performance\. In Section[6](https://arxiv.org/html/2606.00366#S6), we demonstrate the effectiveness of GLENS on a two\-robot navigation problem\. Finally, Section[7](https://arxiv.org/html/2606.00366#S7)summarizes the contributions and discusses future directions\.

### 1\.1Related work

Global search\.For non\-convex optimization problems arising in real\-world applications, global search is often required to uncover a diverse set of high\-quality local optima\. Evolutionary algorithms, including genetic algorithms\[[20](https://arxiv.org/html/2606.00366#bib.bib20)\]and Differential Evolution\[[21](https://arxiv.org/html/2606.00366#bib.bib21)\], as well as particle swarm optimization\[[22](https://arxiv.org/html/2606.00366#bib.bib22)\], represent a class of population\-based global search methods\. These methods have been successfully applied in space trajectory design\[[23](https://arxiv.org/html/2606.00366#bib.bib23),[24](https://arxiv.org/html/2606.00366#bib.bib24),[25](https://arxiv.org/html/2606.00366#bib.bib25),[26](https://arxiv.org/html/2606.00366#bib.bib26)\], circuit design\[[27](https://arxiv.org/html/2606.00366#bib.bib27),[28](https://arxiv.org/html/2606.00366#bib.bib28)\], and job scheduling\[[29](https://arxiv.org/html/2606.00366#bib.bib29),[30](https://arxiv.org/html/2606.00366#bib.bib30)\]\. Hybrid optimization methods, such as multi\-start strategies\[[31](https://arxiv.org/html/2606.00366#bib.bib31)\], first perform global sampling over the solution space to generate multiple initial guesses and then apply gradient\-based numerical solvers to converge to nearby local optima\[[3](https://arxiv.org/html/2606.00366#bib.bib3)\]\. Building on this idea, MBH\[[32](https://arxiv.org/html/2606.00366#bib.bib32),[4](https://arxiv.org/html/2606.00366#bib.bib4)\]adds random perturbations to local optimization based on the observation that nearby local optima are often connected through funnel\-like energy landscapes\. MBH has been successfully applied in chemical structure optimization\[[33](https://arxiv.org/html/2606.00366#bib.bib33),[34](https://arxiv.org/html/2606.00366#bib.bib34)\], bioinformatics\[[35](https://arxiv.org/html/2606.00366#bib.bib35),[36](https://arxiv.org/html/2606.00366#bib.bib36)\], and space trajectory design\[[2](https://arxiv.org/html/2606.00366#bib.bib2),[37](https://arxiv.org/html/2606.00366#bib.bib37)\], outperforming evolutionary algorithms and particle swarm optimization in certain benchmark settings\[[38](https://arxiv.org/html/2606.00366#bib.bib38)\]\.

Despite their effectiveness, these classical global search methods are often computationally expensive, as identifying suitable initial guesses in high\-dimensional and highly nonlinear spaces remains challenging\. Moreover, such methods typically do not exploit data from previously solved problem instances to learn or reuse structural information about the solution landscape, and therefore must perform global search from scratch when presented with new problems\. This limitation motivates learning\-based approaches that exploit solver\-generated data to amortize the cost of global search across problem instances\.

Machine learning for optimization\.Amortized optimization\[[5](https://arxiv.org/html/2606.00366#bib.bib5)\]learns solution structure from a family of parametric problems offline to accelerate solving new instances online\. For convex problems, this paradigm has proven effective for predicting warm\-starts for QPs\[[9](https://arxiv.org/html/2606.00366#bib.bib9)\]and model predictive control problems with linear dynamics\[[39](https://arxiv.org/html/2606.00366#bib.bib39),[6](https://arxiv.org/html/2606.00366#bib.bib6)\]using MLPs\. For non\-convex optimization, traditional discriminative models, such as decision trees or MLP\-based regressors, have been used to learn solution strategies, for example, by predicting active constraint sets or other structural properties that transform the original problem into a simpler formulation\[[8](https://arxiv.org/html/2606.00366#bib.bib8),[7](https://arxiv.org/html/2606.00366#bib.bib7),[40](https://arxiv.org/html/2606.00366#bib.bib40)\]\. However, learning a single deterministic mapping from problem parameters to solutions is often insufficient for highly nonlinear problems with multiple local optima; thus, recent work has explored generative models for learning conditional distributions over locally optimal solutions\. Graph\-based diffusion models have been proposed for combinatorial optimization problems\[[12](https://arxiv.org/html/2606.00366#bib.bib12),[41](https://arxiv.org/html/2606.00366#bib.bib41)\]\. In the context of non\-convex trajectory optimization, support vector machines\[[42](https://arxiv.org/html/2606.00366#bib.bib42)\], variational autoencoders \(VAEs\)\[[43](https://arxiv.org/html/2606.00366#bib.bib43)\], transformers\[[44](https://arxiv.org/html/2606.00366#bib.bib44)\], and diffusion models\[[13](https://arxiv.org/html/2606.00366#bib.bib13),[17](https://arxiv.org/html/2606.00366#bib.bib17),[45](https://arxiv.org/html/2606.00366#bib.bib45),[14](https://arxiv.org/html/2606.00366#bib.bib14)\]have been used to generate diverse and high\-quality initial guesses which can then be refined by numerical solvers to converge to nearby optima\.

Different from prior work\[[42](https://arxiv.org/html/2606.00366#bib.bib42),[8](https://arxiv.org/html/2606.00366#bib.bib8),[7](https://arxiv.org/html/2606.00366#bib.bib7),[40](https://arxiv.org/html/2606.00366#bib.bib40)\], which focuses on predicting the globally optimal solution or improving the quality of a single solution over existing solvers, generative model\-based approaches\[[44](https://arxiv.org/html/2606.00366#bib.bib44),[13](https://arxiv.org/html/2606.00366#bib.bib13),[14](https://arxiv.org/html/2606.00366#bib.bib14)\]aim to learn distributions over multiple locally optimal solutions and how they vary with problem parameters\. This perspective is particularly appealing in applications where a diverse set of high\-quality solutions is desired for downstream decision\-making\.

Despite their promise, generative model\-based approaches to non\-convex optimization face a significant challenge: data scarcity\. Constructing training datasets typically requires expensive global search using numerical solvers such as SNOPT\[[19](https://arxiv.org/html/2606.00366#bib.bib19)\]or iLQR\[[46](https://arxiv.org/html/2606.00366#bib.bib46)\], run repeatedly across many problem instances\[[13](https://arxiv.org/html/2606.00366#bib.bib13),[44](https://arxiv.org/html/2606.00366#bib.bib44)\]\. Moreover, while solver\-generated data is abundant during the optimization process, most existing methods retain only the final converged solutions and discard intermediate solver iterates\. Although recent work has begun to connect the optimization process with diffusion training dynamics\[[47](https://arxiv.org/html/2606.00366#bib.bib47)\], learning the local neighborhood structure around optima and explicitly exploiting solver behaviors remain largely unexplored\.

Diffusion models\.Diffusion models are a class of generative models inspired by non\-equilibrium thermodynamics, in which random noise is gradually added to training data through a forward diffusion process, and a reverse denoising process is learned to recover data from noise\[[48](https://arxiv.org/html/2606.00366#bib.bib48),[10](https://arxiv.org/html/2606.00366#bib.bib10),[11](https://arxiv.org/html/2606.00366#bib.bib11)\]\. With their strong capability in modeling complex, high\-dimensional distributions, diffusion models have been successfully applied to a wide range of domains, including image and video generation\[[49](https://arxiv.org/html/2606.00366#bib.bib49),[50](https://arxiv.org/html/2606.00366#bib.bib50),[51](https://arxiv.org/html/2606.00366#bib.bib51)\], protein structure generation\[[52](https://arxiv.org/html/2606.00366#bib.bib52),[53](https://arxiv.org/html/2606.00366#bib.bib53),[54](https://arxiv.org/html/2606.00366#bib.bib54)\], and behavior modeling for robotics and autonomous vehicles\[[55](https://arxiv.org/html/2606.00366#bib.bib55),[56](https://arxiv.org/html/2606.00366#bib.bib56),[57](https://arxiv.org/html/2606.00366#bib.bib57),[58](https://arxiv.org/html/2606.00366#bib.bib58),[59](https://arxiv.org/html/2606.00366#bib.bib59),[60](https://arxiv.org/html/2606.00366#bib.bib60)\]\. To enable conditional generation, classifier\-free guidance is commonly used in diffusion models by jointly training conditional and unconditional models and interpolating their predictions during sampling\[[61](https://arxiv.org/html/2606.00366#bib.bib61)\]\. In contrast, classifier guidance trains a separate classifier or energy model on noisy samples and injects its gradient into the diffusion sampling process to steer generation toward desired conditions\[[62](https://arxiv.org/html/2606.00366#bib.bib62)\]\. Regarding model architectures, U\-Nets\[[63](https://arxiv.org/html/2606.00366#bib.bib63)\]are commonly used for modeling sequential data\[[55](https://arxiv.org/html/2606.00366#bib.bib55)\], and transformers\[[64](https://arxiv.org/html/2606.00366#bib.bib64)\]are popular in image and video domains\[[65](https://arxiv.org/html/2606.00366#bib.bib65)\]\.

### 1\.2Contributions

Our major contributions are as follows:

- •Data augmentation from solver iterates\.During each solver run, instead of collecting only the final converged solutions, we retain the last few solver iterates before convergence\. This provides free data augmentation that enriches the dataset with local geometry information around optima, without additional computational cost\.
- •GLENS method\.We propose GLENS, a diffusion model\-based global search method that exploits the augmented dataset through two complementary components: aNeighborhood Structure Modelthat learns the distribution of local geometry around optima conditioned on problem parameters and relative distances, and aSolver Behavior Modelthat captures solver refinement directions to further guide diffusion samples toward nearby optima\.
- •Experiments and analysis\.Through numerical experiments of increasing complexity, including a soft\-constrained QP, several non\-convex benchmark functions, and a two\-robot navigation problem, we show that GLENS generates high\-quality initial guesses while preserving solution diversity, leading to improved solver convergence across different problem settings and solvers\. We also analyze key hyperparameter choices from both data and modeling perspectives, which provides practical guidance when applying to new problem families\.

## 2Background

### 2\.1Parametric optimization

We consider a family of parametric continuous optimization problemsΩ\\Omega, where each problem instance𝒫α∈Ω\\mathcal\{P\}\_\{\\alpha\}\\in\\Omegais specified by a parameterα∈ℝdα\\alpha\\in\\mathbb\{R\}^\{d\_\{\\alpha\}\}and takes the form

minimizex∈ℝdxf​\(x;α\)subject tog​\(x;α\)≤0,h​\(x;α\)=0\.\\begin\{array\}\[\]\{ll\}\\underset\{x\\in\\mathbb\{R\}^\{d\_\{x\}\}\}\{\\text\{minimize\}\}&f\(x;\\alpha\)\\\\ \\text\{subject to\}&g\(x;\\alpha\)\\leq 0,\\\\ &h\(x;\\alpha\)=0\.\\end\{array\}\(𝒫α\\mathcal\{P\}\_\{\\alpha\}\)Here,x∈ℝdxx\\in\\mathbb\{R\}^\{d\_\{x\}\}is the continuous decision variable andα∈ℝdα\\alpha\\in\\mathbb\{R\}^\{d\_\{\\alpha\}\}is the problem parameter\. The objectivef:ℝdx×ℝdα→ℝf\\colon\\mathbb\{R\}^\{d\_\{x\}\}\\times\\mathbb\{R\}^\{d\_\{\\alpha\}\}\\to\\mathbb\{R\}, the inequality constraint functiong:ℝdx×ℝdα→ℝmgg\\colon\\mathbb\{R\}^\{d\_\{x\}\}\\times\\mathbb\{R\}^\{d\_\{\\alpha\}\}\\to\\mathbb\{R\}^\{m\_\{g\}\}, and the equality constraint functionh:ℝdx×ℝdα→ℝmhh\\colon\\mathbb\{R\}^\{d\_\{x\}\}\\times\\mathbb\{R\}^\{d\_\{\\alpha\}\}\\to\\mathbb\{R\}^\{m\_\{h\}\}have fixed functional forms across all instances, with the inequalityg​\(x;α\)≤0g\(x;\\alpha\)\\leq 0understood componentwise; the parameterα\\alphaspecifies the problem data of each instance\. We assume that, for everyα∈ℝdα\\alpha\\in\\mathbb\{R\}^\{d\_\{\\alpha\}\}, the functionsff,gg, andhhare smooth inxx, but allow them to be non\-convex\.

The first\-order Karush\-Kuhn\-Tucker \(KKT\) optimality conditions\[[66](https://arxiv.org/html/2606.00366#bib.bib66)\]for problem \([𝒫α\\mathcal\{P\}\_\{\\alpha\}](https://arxiv.org/html/2606.00366#S2.Ex1)\) can be written directly in residual form, with Lagrange multipliersλ∈ℝmg\\lambda\\in\\mathbb\{R\}^\{m\_\{g\}\}andν∈ℝmh\\nu\\in\\mathbb\{R\}^\{m\_\{h\}\}for the inequality and equality constraints\. For a toleranceεKKT\>0\\varepsilon\_\{\\text\{KKT\}\}\>0, we say thatxxsatisfies the KKT conditions if there exist multipliersλ\\lambdaandν\\nusuch that

‖∇xf​\(x;α\)\+∇xg​\(x;α\)⊤​λ\+∇xh​\(x;α\)⊤​ν‖∞\\displaystyle\\\|\\nabla\_\{x\}f\(x;\\alpha\)\+\\nabla\_\{x\}g\(x;\\alpha\)^\{\\top\}\\lambda\+\\nabla\_\{x\}h\(x;\\alpha\)^\{\\top\}\\nu\\\|\_\{\\infty\}≤εKKT,\\displaystyle\\leq\\varepsilon\_\{\\text\{KKT\}\},\(1\)‖h​\(x;α\)‖∞\\displaystyle\\\|h\(x;\\alpha\)\\\|\_\{\\infty\}≤εKKT,\\displaystyle\\leq\\varepsilon\_\{\\text\{KKT\}\},‖max⁡\{g​\(x;α\),0\}‖∞\\displaystyle\\\|\\max\\\{g\(x;\\alpha\),0\\\}\\\|\_\{\\infty\}≤εKKT,\\displaystyle\\leq\\varepsilon\_\{\\text\{KKT\}\},‖max⁡\{−λ,0\}‖∞\\displaystyle\\\|\\max\\\{\-\\lambda,0\\\}\\\|\_\{\\infty\}≤εKKT,\\displaystyle\\leq\\varepsilon\_\{\\text\{KKT\}\},\|λ⊤​g​\(x;α\)\|\\displaystyle\|\\lambda^\{\\top\}g\(x;\\alpha\)\|≤εKKT,\\displaystyle\\leq\\varepsilon\_\{\\text\{KKT\}\},where∇xg​\(x;α\)\\nabla\_\{x\}g\(x;\\alpha\)and∇xh​\(x;α\)\\nabla\_\{x\}h\(x;\\alpha\)are the Jacobians ofggandhhwith respect toxx, andmax⁡\{⋅,0\}\\max\\\{\\cdot,\\,0\\\}is taken componentwise\. The five inequalities correspond, respectively, to stationarity, primal equality feasibility, primal inequality feasibility, dual feasibility, and complementarity\.

To solve a non\-convex problem instance𝒫α\\mathcal\{P\}\_\{\\alpha\}, classical global search methods typically rely on two components: \(i\) an initial guess generatorΓ\\Gammaand \(ii\) a gradient\-based numerical solverπ\\pi\[[3](https://arxiv.org/html/2606.00366#bib.bib3)\]\. The generatorΓ\\Gammadefines a global sampling distribution over the decision spaceℝdx\\mathbb\{R\}^\{d\_\{x\}\}, possibly conditioned on the problem parameterα\\alpha, from which an initial guessℝdx∋x0∼Γ\\mathbb\{R\}^\{d\_\{x\}\}\\ni x^\{0\}\\sim\\Gammais drawn\. We model the numerical solver as a single\-step update operatorπ:ℝdx×ℝdα→ℝdx\\pi:\\mathbb\{R\}^\{d\_\{x\}\}\\times\\mathbb\{R\}^\{d\_\{\\alpha\}\}\\to\\mathbb\{R\}^\{d\_\{x\}\}that, starting fromx0x^\{0\}, maps each iterate to the next:

xm\+1=π​\(xm;α\),m=0,1,…,\\displaystyle x^\{m\+1\}=\\pi\(x^\{m\};\\alpha\),\\quad m=0,1,\\ldots,\(2\)until a termination criterion is satisfied\. If the solver converges afternniterations and the final iteratexnx^\{n\}satisfies the KKT conditions \([1](https://arxiv.org/html/2606.00366#S2.E1)\) to toleranceεKKT\\varepsilon\_\{\\text\{KKT\}\}, we callxnx^\{n\}a*locally optimal solution*and denote itx∗​\(x0;π,α\)x^\{\*\}\(x^\{0\};\\pi,\\alpha\)\. This notation emphasizes the dependence on the initial guessx0x^\{0\}, the solverπ\\pi, and the problem parameterα\\alpha\.

Since different initial guessesx0x^\{0\}may converge to different local optima, obtaining a diverse collection of locally optimal solutions typically requires sampling many initial guesses fromΓ\\Gammaand running the local search withπ\\pifrom each initial guess\. This global search process can be computationally expensive for high\-dimensional and highly nonlinear problems\.

##### Notation\.

Throughout the paper, superscripts onxxdenote solver iteration indices, not exponents:x0x^\{0\}is the initial guess,xmx^\{m\}is the iterate aftermmsolver steps, andx∗x^\{\*\}is the locally optimal solution should the initial guess converge\. For brevity, we will often suppress the dependence ofx∗x^\{\*\}on the initial conditionx0x^\{0\}, solverπ\\pi, and parameterα\\alpha\.

### 2\.2Amortized global search

AmorGS\[[43](https://arxiv.org/html/2606.00366#bib.bib43),[15](https://arxiv.org/html/2606.00366#bib.bib15)\]is a data\-driven approach that aims to reduce the computational cost of sampling diverse and high\-quality initial guesses for parametric optimization problems\. Here, high\-quality initial guesses refer to samples that are close to optima and therefore require fewer solver iterations to converge\. Rather than treating each problem instance𝒫α∈Ω\\mathcal\{P\}\_\{\\alpha\}\\in\\Omegaindependently, AmorGS uses data from a family of solved instances inΩ\\Omegato learn how locally optimal solutions vary with respect to the problem parameterα\\alpha\.

For offline data collection, AmorGS aims to construct a local optimum dataset𝒟∗\\mathcal\{D\}^\{\*\}, which consists of pairs of parameters and locally optimal solutions as follows\.

###### Definition 1\(Local optimum dataset\)\.

Let\{𝒫αi\}i=1N\\\{\\mathcal\{P\}\_\{\\alpha\_\{i\}\}\\\}\_\{i=1\}^\{N\}beNNproblem instances sampled from the parametric problem familyΩ\\Omegain problem \([𝒫α\\mathcal\{P\}\_\{\\alpha\}](https://arxiv.org/html/2606.00366#S2.Ex1)\)\. For each parameterαi\\alpha\_\{i\}, drawMMinitial guesses\{xi,j0\}j=1M\\\{x^\{0\}\_\{i,j\}\\\}\_\{j=1\}^\{M\}from an initial guess generatorΓ\\Gammaand run the solver iteration in Eq\. \([2](https://arxiv.org/html/2606.00366#S2.E2)\) from each one until numerical termination \(convergence or otherwise\)\. Let𝒪i⊆\{1,…,M\}\\mathcal\{O\}\_\{i\}\\subseteq\\\{1,\\ldots,M\\\}be the set of indices for which the solver converges, and write

xi,j∗≔x∗​\(xi,j0;π,αi\)\\displaystyle x^\{\*\}\_\{i,j\}\\coloneqq x^\{\*\}\(x^\{0\}\_\{i,j\};\\pi,\\alpha\_\{i\}\)\(3\)for the resulting locally optimal solution\. The*local optimum dataset*is the collection of parameter–solution pairs

𝒟∗≔\{\(αi,xi,j∗\)∣i=1,…,N,∀j∈𝒪i\}\.\\displaystyle\\mathcal\{D\}^\{\*\}\\coloneqq\\bigl\\\{\(\\alpha\_\{i\},\\,x^\{\*\}\_\{i,j\}\)\\mid i=1,\\ldots,N,\\;\\forall j\\in\\mathcal\{O\}\_\{i\}\\bigr\\\}\.\(4\)

A generative model is trained on𝒟∗\\mathcal\{D\}^\{\*\}to sample from the conditional distribution of locally optimal solutionsxi,j∗x^\{\*\}\_\{i,j\}given the corresponding problem parameterαi\\alpha\_\{i\}\. During testing, the trained generative model serves as a learned initial guess generatorΓ^α\\hat\{\\Gamma\}\_\{\\alpha\}conditioned on a new parameter valueα\\alpha, producing high\-quality and diverse initial guesses\. Because these generated initial guesses are close to locally optimal solutions, the numerical solverπ\\pican converge quickly, significantly reducing the cost of global search for previously unseen problem instances\. However, when the solving process is computationally expensive, this standard AmorGS may face the training data scarcity issue and also neglect valuable neighborhood information around the optima\.

Prior work has shown that diffusion models are particularly effective in the AmorGS framework, outperforming alternative generative approaches such as VAEs\[[13](https://arxiv.org/html/2606.00366#bib.bib13)\]\. We detail the main features of diffusion models in the next section before introducing their use in the modeling contributions of this paper in Section[3](https://arxiv.org/html/2606.00366#S3)\.

### 2\.3Diffusion models

Diffusion models consist of a forward process to generate noisy data for training and a learnable reverse process for data sampling at test time\. Within the AmorGS framework\[[13](https://arxiv.org/html/2606.00366#bib.bib13)\], denoising diffusion probabilistic models \(DDPM\)\[[11](https://arxiv.org/html/2606.00366#bib.bib11)\]are used as the data\-driven initial guess generatorΓ^α\\hat\{\\Gamma\}\_\{\\alpha\}conditioned on parameterα\\alpha\.

Training\.Given a data samplez∼qdataz\\sim q\_\{\\text\{data\}\}drawn from the data distribution, e\.g\., the distribution of locally optimal solutions, the forward diffusion process gradually adds Gaussian noise to the data\. Letz0≔zz\_\{0\}\\coloneqq z; the forward process generates a sequence of noisy samplesz1,z2,…,zTz\_\{1\},z\_\{2\},\\ldots,z\_\{T\}withq​\(zt\|zt−1\)=𝒩​\(zt;1−βt​zt−1,βt​I\)q\(z\_\{t\}\|z\_\{t\-1\}\)=\\mathcal\{N\}\(z\_\{t\};\\sqrt\{1\-\\beta\_\{t\}\}z\_\{t\-1\},\\beta\_\{t\}I\), where0<β1<β2<⋯<βT=10<\\beta\_\{1\}<\\beta\_\{2\}<\\cdots<\\beta\_\{T\}=1is an increasing noise schedule\. With the reparameterization trick\[[11](https://arxiv.org/html/2606.00366#bib.bib11)\]and lettingσt≔1−βt\\sigma\_\{t\}\\coloneqq 1\-\\beta\_\{t\}andσ¯t≔∏i=1tσi\\bar\{\\sigma\}\_\{t\}\\coloneqq\\prod\_\{i=1\}^\{t\}\\sigma\_\{i\}, the noisy sampleztz\_\{t\}has the closed\-form expression

zt=σ¯t​z0\+1−σ¯t​ϵ,\\displaystyle z\_\{t\}=\\sqrt\{\\bar\{\\sigma\}\_\{t\}\}\\,z\_\{0\}\+\\sqrt\{1\-\\bar\{\\sigma\}\_\{t\}\}\\,\\epsilon,\(5\)whereϵ∼𝒩​\(0,I\)\\epsilon\\sim\\mathcal\{N\}\(0,I\)\.

In the reverse process, DDPM starts from a samplezT∼𝒩​\(0,I\)z\_\{T\}\\sim\\mathcal\{N\}\(0,I\)and learns a modelpθp\_\{\\theta\}, parameterized byθ\\theta, that iteratively denoises the data aspθ​\(zt−1∣zt\)=𝒩​\(zt−1;μθ​\(zt,t\),βt​I\)p\_\{\\theta\}\(z\_\{t\-1\}\\mid z\_\{t\}\)=\\mathcal\{N\}\(z\_\{t\-1\};\\mu\_\{\\theta\}\(z\_\{t\},t\),\\beta\_\{t\}I\)\. Instead of predictingμθ​\(zt,t\)\\mu\_\{\\theta\}\(z\_\{t\},t\)directly, DDPM predicts the noiseϵθ​\(zt,t\)\\epsilon\_\{\\theta\}\(z\_\{t\},t\)added toztz\_\{t\}, with the transformation

μθ​\(zt,t\)=1σt​\(zt−1−σt1−σ¯t​ϵθ​\(zt,t\)\)\.\\displaystyle\\mu\_\{\\theta\}\(z\_\{t\},t\)=\\frac\{1\}\{\\sqrt\{\\sigma\_\{t\}\}\}\\left\(z\_\{t\}\-\\frac\{1\-\\sigma\_\{t\}\}\{\\sqrt\{1\-\\bar\{\\sigma\}\_\{t\}\}\}\\,\\epsilon\_\{\\theta\}\(z\_\{t\},t\)\\right\)\.\(6\)Sinceztz\_\{t\}can be expressed in closed form using Eq\. \([5](https://arxiv.org/html/2606.00366#S2.E5)\), the DDPM training loss is

ℒDDPM=𝔼z0,ϵ,t​‖ϵ−ϵθ​\(σ¯t​z0\+1−σ¯t​ϵ,t\)‖2,\\displaystyle\\mathcal\{L\}\_\{\\text\{DDPM\}\}=\\mathbb\{E\}\_\{z\_\{0\},\\epsilon,t\}\\left\\\|\\epsilon\-\\epsilon\_\{\\theta\}\\\!\\left\(\\sqrt\{\\bar\{\\sigma\}\_\{t\}\}\\,z\_\{0\}\+\\sqrt\{1\-\\bar\{\\sigma\}\_\{t\}\}\\,\\epsilon,\\,t\\right\)\\right\\\|^\{2\},\(7\)wherez0∼qdataz\_\{0\}\\sim q\_\{\\text\{data\}\},ϵ∼𝒩​\(0,I\)\\epsilon\\sim\\mathcal\{N\}\(0,I\), andttis uniformly drawn from\[1,T\]\[1,T\]\.

Sampling\.Sampling begins by drawingzT∼𝒩​\(0,I\)z\_\{T\}\\sim\\mathcal\{N\}\(0,I\), followed by iterative denoising steps

zt−1∼𝒩​\(μθ​\(zt,t\),βt​I\),t=T−1,T−2,…,1,\\displaystyle z\_\{t\-1\}\\sim\\mathcal\{N\}\(\\mu\_\{\\theta\}\(z\_\{t\},t\),\\beta\_\{t\}I\),\\quad t=T\-1,T\-2,\\dots,1,\(8\)whereμθ​\(zt,t\)\\mu\_\{\\theta\}\(z\_\{t\},t\)is derived from the learned noise predictorϵθ​\(zt,t\)\\epsilon\_\{\\theta\}\(z\_\{t\},t\)using Eq\. \([6](https://arxiv.org/html/2606.00366#S2.E6)\)\. This procedure is repeated until the clean samplez0z\_\{0\}is obtained\.

Conditional generation\.Diffusion models support conditional data generation given auxiliary informationyyassociated with the datazz\. Classifier\-free guidance\[[61](https://arxiv.org/html/2606.00366#bib.bib61)\]jointly learns a conditional noise predictorϵθ​\(zt,t,y\)\\epsilon\_\{\\theta\}\(z\_\{t\},t,y\)and an unconditional predictorϵθ​\(zt,t,y=∅\)\\epsilon\_\{\\theta\}\(z\_\{t\},t,y=\\varnothing\)using the same loss function in Eq\. \([7](https://arxiv.org/html/2606.00366#S2.E7)\) by randomly masking the conditionyyduring training with a certain probability\. During sampling, to generate datazzwith auxiliaryyy, an interpolated noise predictorϵ¯\\bar\{\\epsilon\}is used

ϵ¯θ​\(zt,t,y\)=\(s\+1\)​ϵθ​\(zt,t,y\)−s​ϵθ​\(zt,t,y=∅\),\\displaystyle\\bar\{\\epsilon\}\_\{\\theta\}\(z\_\{t\},t,y\)=\(s\+1\)\\epsilon\_\{\\theta\}\(z\_\{t\},t,y\)\-s\\epsilon\_\{\\theta\}\(z\_\{t\},t,y=\\varnothing\),\(9\)wheresscontrols the weight of guidance and balances between sample diversity and fidelity\.

Classifier guidance\[[62](https://arxiv.org/html/2606.00366#bib.bib62)\]trains an auxiliary classifier on noisy samplesztz\_\{t\}and uses the gradient of its log\-likelihood to steer the reverse diffusion process\. More generally, this guidance can be written in the form of an energy functionEϕ​\(zt,α\)E\_\{\\phi\}\(z\_\{t\},\\alpha\)\. During sampling, the gradient of the energy∇ztEϕ​\(zt,α\)\\nabla\_\{z\_\{t\}\}E\_\{\\phi\}\(z\_\{t\},\\alpha\)is injected into each denoising step to steer the diffusion process toward lower energy:

zt−1∼𝒩​\(μθ−s~​βt​∇ztEϕ​\(zt,α\),βt​I\),\\displaystyle z\_\{t\-1\}\\sim\\mathcal\{N\}\\bigl\(\\mu\_\{\\theta\}\-\\tilde\{s\}\\,\\beta\_\{t\}\\nabla\_\{z\_\{t\}\}E\_\{\\phi\}\(z\_\{t\},\\alpha\),\\,\\beta\_\{t\}I\\bigr\),\(10\)wheres~\\tilde\{s\}controls the strength of the guidance\.

Neural network backbone\.Within the AmorGS framework, the conditional noise predictorϵθ​\(zt,t,y\)\\epsilon\_\{\\theta\}\(z\_\{t\},t,y\)is implemented using a U\-Net architecture\[[63](https://arxiv.org/html/2606.00366#bib.bib63)\], following the implementation in\[[67](https://arxiv.org/html/2606.00366#bib.bib67)\]\. We use the same U\-Net architecture as the backbone for the learned gradient field in classifier guidance in this manuscript\. This architecture is introduced in Section[3\.2\.2](https://arxiv.org/html/2606.00366#S3.SS2.SSS2)\.

## 3Method

We first introduce the concept of akk\-neighborhood path, inspired by Beeson et al\.\[[15](https://arxiv.org/html/2606.00366#bib.bib15)\], which consists of solver iterates in the neighborhood of a converged local optimum within the same solver run, and define the correspondingkk\-neighborhood dataset\. To exploit thekk\-neighborhood dataset, we then present GLENS, a novel data\-driven global search based on diffusion models\. Through aNeighborhood Structure Modeland aSolver Behavior Model, GLENS enables data\-efficient generation of high\-quality initial guesses for non\-convex continuous optimization problems\.

### 3\.1kk\-neighborhood dataset

Existing amortized optimization methods typically train on the local optimum dataset𝒟∗\\mathcal\{D\}^\{\*\}as defined in Definition[1](https://arxiv.org/html/2606.00366#Thmdefinition1), which contains only the final converged iterates produced by numerical solvers\. However, our key insight is that the last few solver iterates leading to convergence encode rich information about the neighborhood structure of a locally optimal solution, which is highly informative for data\-driven global search\.

Motivated by this, we introduce the concept of akk\-neighborhood path, which denotes the finalkksolver iterates before convergence\. An illustration of akk\-neighborhood path is shown in Figure[2](https://arxiv.org/html/2606.00366#S3.F2)\.

###### Definition 2\(kk\-neighborhood path\)\.

Let𝒫α\\mathcal\{P\}\_\{\\alpha\}be a problem instance,x0x^\{0\}an initial guess, and\(x0,x1,…,xn\)\(x^\{0\},x^\{1\},\\ldots,x^\{n\}\)the sequence of solver iterates converging to a locally optimal solutionxn=x∗x^\{n\}=x^\{\*\}\. Fork∈ℤ\>0k\\in\\mathbb\{Z\}\_\{\>0\}withk≤n\+1k\\leq n\+1, the*kk\-neighborhood path*is defined as the lastkkiterates of this solver run:

𝒩k≔\{xn−k\+1,xn−k\+2,…,xn\}\.\\displaystyle\\mathcal\{N\}^\{k\}\\coloneqq\\bigl\\\{x^\{n\-k\+1\},x^\{n\-k\+2\},\\ldots,x^\{n\}\\bigr\\\}\.\(11\)We use additional subscripts on𝒩k\\mathcal\{N\}^\{k\}to indicate a path associated with theii\-th problem instance𝒫αi\\mathcal\{P\}\_\{\\alpha\_\{i\}\}and a solver initialized with thejj\-th initial guessxi,j0x^\{0\}\_\{i,j\}; in particular, this would yield the definition:

𝒩i,jk≔\{xi,jn−k\+1,xi,jn−k\+2,…,xi,jn\}\.\\displaystyle\\mathcal\{N\}^\{k\}\_\{i,j\}\\coloneqq\\bigl\\\{x^\{n\-k\+1\}\_\{i,j\},x^\{n\-k\+2\}\_\{i,j\},\\ldots,x^\{n\}\_\{i,j\}\\bigr\\\}\.\(12\)

The11\-neighborhood path𝒩1=\{x∗\}\\mathcal\{N\}^\{1\}=\\\{x^\{\*\}\\\}reduces to the locally optimal solution itself\. In Figure[2](https://arxiv.org/html/2606.00366#S3.F2), we show an example of a local optimumx∗x^\{\*\}and its corresponding66\-neighborhood path𝒩6\\mathcal\{N\}^\{6\}\.

![Refer to caption](https://arxiv.org/html/2606.00366v1/x2.png)Figure 2:Illustration of a local optimumx∗x^\{\*\}and the corresponding66\-neighborhood path𝒩6=\{xn−5,xn−4,xn−3,xn−2,xn−1,xn\}\\mathcal\{N\}^\{6\}=\\\{x^\{n\-5\},x^\{n\-4\},x^\{n\-3\},x^\{n\-2\},x^\{n\-1\},x^\{n\}\\\}within the same solver run\.Building on thekk\-neighborhood path concept, we introduce the followingkk\-neighborhood dataset𝒟k\\mathcal\{D\}^\{k\}\.

###### Definition 3\(kk\-neighborhood dataset\)\.

Under the setup of Definition[1](https://arxiv.org/html/2606.00366#Thmdefinition1), let𝒩i,jk\\mathcal\{N\}^\{k\}\_\{i,j\}denote thekk\-neighborhood path of the solver run initialized atxi,j0x^\{0\}\_\{i,j\}, as defined in Definition[2](https://arxiv.org/html/2606.00366#Thmdefinition2)\. The*kk\-neighborhood dataset*is the collection of triples

𝒟k≔\{\(αi,x,xi,j∗\)∣i=1,…,N,∀j∈𝒪i,∀x∈𝒩i,jk\}\.\\displaystyle\\mathcal\{D\}^\{k\}\\coloneqq\\bigl\\\{\(\\alpha\_\{i\},\\,x,\\,x^\{\*\}\_\{i,j\}\)\\mid i=1,\\ldots,N,\\;\\forall j\\in\\mathcal\{O\}\_\{i\},\\;\\forall x\\in\\mathcal\{N\}^\{k\}\_\{i,j\}\\bigr\\\}\.\(13\)Each data point consists of a problem parameter, an iterate from the correspondingkk\-neighborhood path, and the locally optimal solution reached by that solver run\.

By construction, the11\-neighborhood dataset𝒟1\\mathcal\{D\}^\{1\}corresponds to the local optimum dataset𝒟∗\\mathcal\{D\}^\{\*\}in Definition[1](https://arxiv.org/html/2606.00366#Thmdefinition1)\. Furthermore, thekk\-neighborhood dataset𝒟k\\mathcal\{D\}^\{k\}contains up tokksolver iterates associated with each locally optimal solution, without requiring additional solver runs\. These additional iterates provide richer information about the local neighborhood structure surrounding each locally optimal solution\.

### 3\.2Two\-component learning architecture

Although thekk\-neighborhood dataset𝒟k\\mathcal\{D\}^\{k\}substantially enriches the available training data, the intermediate solver iterates are generally suboptimal, making them challenging for direct use in existing data\-driven optimization methods\. To better exploit thekk\-neighborhood dataset, we propose GLENS, a global search method using diffusion models that consists of two complementary learning components: aNeighborhood Structure Modelthat captures the neighborhood geometry and aSolver Behavior Modelthat learns the solver refinement behavior\. The two models are trained separately offline on𝒟k\\mathcal\{D\}^\{k\}and combined during online sampling to guide diffusion\-based initial guess generation\.

#### 3\.2\.1Neighborhood structure model

TheNeighborhood Structure Modelis a conditional diffusion model that serves as a data\-driven initial guess generatorΓ^α\\hat\{\\Gamma\}\_\{\\alpha\}\. Different from the generator in Section[2\.2](https://arxiv.org/html/2606.00366#S2.SS2), it is trained on thekk\-neighborhood dataset𝒟k\\mathcal\{D\}^\{k\}from Definition[3](https://arxiv.org/html/2606.00366#Thmdefinition3), whose entries are triples\(α,x,x∗\)\(\\alpha,x,x^\{\*\}\)withxxbeing a solver iterate in thekk\-neighborhood path of an optimumx∗x^\{\*\}\. This enlarges the support of the empirical training data distribution beyond locally optimal solutions\. However, when conditioned only on the problem parameterα\\alpha, all iterates within a neighborhood are treated equivalently, preventing the model from capturing their relative geometric structure\.

To encode this relative structure, we augment the conditioning information with an additional scalar variablerr, the distance between a solver iterate and the locally optimal solution reached by the same solver run\. For each data point\(α,x,x∗\)\(\\alpha,x,x^\{\*\}\)in thekk\-neighborhood dataset𝒟k\\mathcal\{D\}^\{k\}, we introduce

r​\(x,x∗\)≔‖x−x∗‖2\.\\displaystyle r\(x,x^\{\*\}\)\\coloneqq\\bigl\\\|x\-x^\{\*\}\\bigr\\\|\_\{2\}\.\(14\)The diffusion\-basedNeighborhood Structure Modelis then trained to sample solver iterates conditioned on the augmented variabley≔\(α,r\)y\\;\\coloneqq\\;\(\\alpha,r\), whereα\\alphacaptures problem\-level information andrrencodes the relative position of an iterate within the local neighborhood\.

At test time, when a new problem instance𝒫α\\mathcal\{P\}\_\{\\alpha\}is given, its locally optimal solutions are unknown\. Querying theNeighborhood Structure Modelwithr=0r=0biases the diffusion sampling process toward samples concentrated near the manifold of locally optimal solutions\. In this way, theNeighborhood Structure Modelnaturally serves as a learned initial guess generatorΓ^α\\hat\{\\Gamma\}\_\{\\alpha\}conditioned on parameterα\\alphafor data\-driven global search\.

Training and sampling follow the standard DDPM framework reviewed in Section[2\.3](https://arxiv.org/html/2606.00366#S2.SS3), with classifier\-free guidance for the conditional variableyy\. At training time, an unconditional noise predictorϵθ​\(zt,t,y=∅\)\\epsilon\_\{\\theta\}\(z\_\{t\},t,y=\\varnothing\)and a conditional noise predictorϵθ​\(zt,t,y\)\\epsilon\_\{\\theta\}\(z\_\{t\},t,y\), parameterized by a neural network with weightsθ\\theta, are jointly trained on thekk\-neighborhood dataset𝒟k\\mathcal\{D\}^\{k\}using the DDPM objective in Eq\. \([7](https://arxiv.org/html/2606.00366#S2.E7)\); the conditioning variableyyis randomly dropped with probabilitypuncondp\_\{\\text\{uncond\}\}\. At sampling time, the guided noise predictorϵ¯θ​\(zt,t,y\)\\bar\{\\epsilon\}\_\{\\theta\}\(z\_\{t\},t,y\)in Eq\. \([9](https://arxiv.org/html/2606.00366#S2.E9)\) with weightsNSs\_\{\\text\{NS\}\}is plugged into Eqs\. \([6](https://arxiv.org/html/2606.00366#S2.E6)\) and \([8](https://arxiv.org/html/2606.00366#S2.E8)\) to iteratively denoise from Gaussian noise\.

#### 3\.2\.2Solver behavior model

While theNeighborhood Structure Modelcaptures the geometry of solution neighborhoods from a global perspective, it does not explicitly model how the numerical solver locally refines iterates toward nearby optima\. The generated samples may therefore still deviate from exact optima due to generalization error, even when queried withr=0r=0\.

To address this limitation, we introduce theSolver Behavior Model, which learns the local refinement behavior of the numerical solver fromkk\-neighborhood paths and is used to further guide diffusion samples toward their associated optima\. For each data point\(α,x,x∗\)\(\\alpha,x,x^\{\*\}\)in thekk\-neighborhood dataset𝒟k\\mathcal\{D\}^\{k\}, we define the energy function

E​\(x,x∗\)≔12​r​\(x,x∗\)2=12​‖x−x∗‖22\.\\displaystyle E\(x,x^\{\*\}\)\\coloneqq\\frac\{1\}\{2\}r\(x,x^\{\*\}\)^\{2\}=\\frac\{1\}\{2\}\\bigl\\\|x\-x^\{\*\}\\bigr\\\|\_\{2\}^\{2\}\.\(15\)Its gradient with respect toxxis

∇xE​\(x,x∗\)=x−x∗\.\\displaystyle\\nabla\_\{x\}E\(x,x^\{\*\}\)=x\-x^\{\*\}\.\(16\)Thus, the negative gradient−∇xE​\(x,x∗\)\-\\nabla\_\{x\}E\(x,x^\{\*\}\)points from the iteratexxtoward the associated locally optimal solution\. The gradient of this energy function characterizes the solver refinement behavior of each iterate within thekk\-neighborhood path\.

TheSolver Behavior Modelis parameterized as a neural networkξϕ:ℝdx×ℝdα→ℝdx\\xi\_\{\\phi\}:\\mathbb\{R\}^\{d\_\{x\}\}\\times\\mathbb\{R\}^\{d\_\{\\alpha\}\}\\to\\mathbb\{R\}^\{d\_\{x\}\}and is trained to predict this gradient field\. During training, for each entry\(α,x,x∗\)∈𝒟k\(\\alpha,x,x^\{\*\}\)\\in\\mathcal\{D\}^\{k\}, the optimumx∗x^\{\*\}is used only to construct the supervised target\. Specifically, the networkξϕ\\xi\_\{\\phi\}is learned by minimizing

ℒSB=𝔼\(α,x,x∗\)∼𝒟k​\[‖ξϕ​\(x,α\)−∇xE​\(x,x∗\)‖22\]\.\\displaystyle\\mathcal\{L\}\_\{\\text\{SB\}\}\\;=\\;\\mathbb\{E\}\_\{\(\\alpha,x,x^\{\*\}\)\\sim\\mathcal\{D\}^\{k\}\}\\Bigl\[\\bigl\\\|\\xi\_\{\\phi\}\(x,\\alpha\)\-\\nabla\_\{x\}E\(x,x^\{\*\}\)\\bigr\\\|\_\{2\}^\{2\}\\Bigr\]\.\(17\)At test time,ξϕ\\xi\_\{\\phi\}predicts the gradient field from\(x,α\)\(x,\\alpha\)alone\.

Standard classifier guidance trains an energy function on Gaussian\-corrupted samples and obtains the guidance direction by differentiating its output\. TheSolver Behavior Modelinstead directly learns a gradient field from structuredkk\-neighborhood path iterates around local optima\. These iterates can be viewed as solver\-generated perturbations around their corresponding local optima\. During online sampling, we set the radiusr=0r=0in theNeighborhood Structure Modelto generate samples near local optima, and apply theSolver Behavior Modelguidance only in the final denoising stepst≤tguidet\\leq t\_\{\\text\{guide\}\}\. At these late steps, the diffusion latent stateztz\_\{t\}is only lightly corrupted and is assumed to lie near the learned neighborhood around a local optimum\. Thus, evaluatingξϕ​\(zt,α\)\\xi\_\{\\phi\}\(z\_\{t\},\\alpha\)can be approximated by the learned gradient fieldξϕ​\(x,α\)\\xi\_\{\\phi\}\(x,\\alpha\)trained on thekk\-neighborhood dataset\. Following the classifier\-guidance form in Eq\. \([10](https://arxiv.org/html/2606.00366#S2.E10)\), this refinement gradient field is incorporated with strengths~=sSB\\tilde\{s\}=s\_\{\\text\{SB\}\}to guide the sample toward the corresponding optimum\.

Algorithm 1GLENS online sampling1:Neighborhood Structure Modelnoise predictor

ϵθ\\epsilon\_\{\\theta\}with weight

sNSs\_\{\\text\{NS\}\}
2:Solver Behavior Modelgradient field

ξϕ\\xi\_\{\\phi\}with weight

sSBs\_\{\\text\{SB\}\}
3:Diffusion steps

TT, guidance activation step

tguidet\_\{\\text\{guide\}\}
4:Noise schedule

\{βt\}t=1T\\\{\\beta\_\{t\}\\\}\_\{t=1\}^\{T\},

σt≔1−βt\\sigma\_\{t\}\\coloneqq 1\-\\beta\_\{t\},

σ¯t≔∏i=1tσi\\bar\{\\sigma\}\_\{t\}\\coloneqq\\prod\_\{i=1\}^\{t\}\\sigma\_\{i\}
5:Test parameter

α\\alpha
6:Initial guess prediction

x^\\hat\{x\}
7:Set

r←0r\\leftarrow 0and

y←\(α,r\)y\\leftarrow\(\\alpha,r\)\.

8:Sample

zT∼𝒩​\(0,I\)z\_\{T\}\\sim\\mathcal\{N\}\(0,I\)\.

9:for

t=T,T−1,…,1t=T,T\-1,\\ldots,1do

10:Compute classifier\-free guided noise

ϵ¯θ​\(zt,t,y\)←\(1\+sNS\)​ϵθ​\(zt,t,y\)−sNS​ϵθ​\(zt,t,∅\)\.\\displaystyle\\bar\{\\epsilon\}\_\{\\theta\}\(z\_\{t\},t,y\)\\leftarrow\(1\+s\_\{\\text\{NS\}\}\)\\epsilon\_\{\\theta\}\(z\_\{t\},t,y\)\-s\_\{\\text\{NS\}\}\\epsilon\_\{\\theta\}\(z\_\{t\},t,\\varnothing\)\.
11:Compute mean

μθ​\(zt,t,y\)\\mu\_\{\\theta\}\(z\_\{t\},t,y\)μθ​\(zt,t,y\)←1σt​\(zt−1−σt1−σ¯t​ϵ¯θ​\(zt,t,y\)\)\.\\displaystyle\\mu\_\{\\theta\}\(z\_\{t\},t,y\)\\leftarrow\\frac\{1\}\{\\sqrt\{\\sigma\_\{t\}\}\}\\left\(z\_\{t\}\-\\frac\{1\-\\sigma\_\{t\}\}\{\\sqrt\{1\-\\bar\{\\sigma\}\_\{t\}\}\}\\,\\bar\{\\epsilon\}\_\{\\theta\}\(z\_\{t\},t,y\)\\right\)\.
12:if

t≤tguidet\\leq t\_\{\\text\{guide\}\}then

13:Approximate the solver refinement direction by

−ξϕ​\(zt,α\)\-\\xi\_\{\\phi\}\(z\_\{t\},\\alpha\)\.

14:Sample with solver refinement

zt−1∼𝒩​\(μθ​\(zt,t,y\)−sSB​βt​ξϕ​\(zt,α\),βt​I\)\.\\displaystyle z\_\{t\-1\}\\sim\\mathcal\{N\}\\\!\\left\(\\mu\_\{\\theta\}\(z\_\{t\},t,y\)\-s\_\{\\text\{SB\}\}\\,\\beta\_\{t\}\\,\\xi\_\{\\phi\}\(z\_\{t\},\\alpha\),\\;\\beta\_\{t\}I\\right\)\.
15:else

16:Sample without solver refinement

zt−1∼𝒩​\(μθ​\(zt,t,y\),βt​I\)\.\\displaystyle z\_\{t\-1\}\\sim\\mathcal\{N\}\\\!\\left\(\\mu\_\{\\theta\}\(z\_\{t\},t,y\),\\;\\beta\_\{t\}I\\right\)\.
17:endif

18:endfor

19:return

x^←z0\\hat\{x\}\\leftarrow z\_\{0\}

#### 3\.2\.3Combined online sampling

While theNeighborhood Structure ModelandSolver Behavior Modelare trained separately offline, they are coupled during online testing to perform a guided diffusion sampling process\. Combining neighborhood structure modeling with solver refinement enables high\-quality initial guess generation for data\-driven global search in GLENS\. For clarity, we summarize the online sampling procedure of GLENS in Algorithm[1](https://arxiv.org/html/2606.00366#alg1)\.

## 4Numerical experiments

We evaluate GLENS through numerical experiments designed to investigate how learning fromkk\-neighborhood dataset with theNeighborhood Structure ModelandSolver Behavior Modelimproves data\-driven global search\. In particular, the experiments assess \(i\) whether GLENS enables more accurate and robust modeling of how locally optimal solutions vary with problem parameters, and \(ii\) whether this capability generalizes to non\-convex problems with multiple local optima\. Throughout this section, when a problem instance has a multi\-component parameter, we writeα=\(α1,α2,…\)∈ℝdα\\alpha=\(\\alpha\_\{1\},\\alpha\_\{2\},\\ldots\)\\in\\mathbb\{R\}^\{d\_\{\\alpha\}\}, with subscripts denoting components of a single parameter vector rather than indices over problem instances\.

Test problems\.We first consider a baseline soft\-constrained QP solved with gradient descent\. This unimodal example mimics the neighborhood structure around a single optimum\. It establishes how all compared methods learn the solution variation with respect to problem parameters\. We then extend our evaluation to multimodal non\-convex benchmarks, including the modified Himmelblau function, Rosenbrock function, and Levy function, solved with the L\-BFGS\-B solver, demonstrating how GLENS enables effective global search in the presence of multiple distinct optima\.

Model setup\.We use “NS” to denote theNeighborhood Structure Modeland “SB” to denote theSolver Behavior Model\. TheNeighborhood Structure Modeluses a diffusion model with5050sampling steps and a U\-Net backbone for the noise predictor\. The U\-Net uses a base feature dimension of6464and three downsampling and upsampling layers with channel multipliers of\{1,2,4\}\\\{1,2,4\\\}\. TheSolver Behavior Modelalso uses a U\-Net backbone to learn the gradient field, with similar three layers for downsampling and upsampling, with base feature dimension of6464and channel multipliers of\{1,2,4\}\\\{1,2,4\\\}\.

![Refer to caption](https://arxiv.org/html/2606.00366v1/x3.png)Figure 3:Ground truth solutions for the test problems in the first two dimensions, shown for a fixed parameter value\. Each grey circle represents a converged local optimum obtained by the corresponding solver\. For the QP example, objective contours are shown with the constraint indicated by the blue dashed line\. The non\-convex problems, including Himmelblau, Rosenbrock, and Levy, exhibit rich multimodal structures with multiple local optima\.During testing, we set the guidance weightsNS=0\.5s\_\{\\text\{NS\}\}=0\.5forNeighborhood Structure Modelin classifier\-free guidance, and apply the solver refinement direction fromSolver Behavior Modelwith weightsSB=100s\_\{\\text\{SB\}\}=100when diffusion stept≤tguide=5t\\leq t\_\{\\text\{guide\}\}=5\.

Comparison methods\.We compare the following methods that generate initial guesses for global search:

- •Uniform: Initial guesses are sampled uniformly from the solution space\.
- •DiffuSolve\[[13](https://arxiv.org/html/2606.00366#bib.bib13)\]: A state\-of\-the\-art diffusion\-based global search method trained only on11\-neighborhood data, i\.e\., the locally optimal solutions\. To ensure a fair comparison, it uses the same U\-Net architecture as ourNeighborhood Structure Model\.
- •DiffuSolve\-k: The same model used in DiffuSolve\[[13](https://arxiv.org/html/2606.00366#bib.bib13)\]but trained on thekk\-neighborhood dataset\.
- •NS\-only\(GLENS without solver guidance\): The proposedNeighborhood Structure Modeltrained on thekk\-neighborhood dataset, without theSolver Behavior Model\.
- •GLENS: The full proposed method that combines theNeighborhood Structure ModelandSolver Behavior Model, both trained on thekk\-neighborhood dataset\.

Evaluation metrics\.We evaluate the quality of sampled initial guesses for global search mainly from the following two aspects\.

- •Convergence behavior\.For an unseen problem instance, we measure whichkk\-neighborhood the generated initial guesses lie in, equivalently, the number of iterations required at most by the solver to converge\. Initial guesses that fall into smallerkk\-neighborhoods lead to faster solver convergence and indicate higher quality predictions conditioned on the problem parameters\.
- •Diversity and coverage\.For non\-convex problems with multiple local optima, we examine whether the generated samples cover multiple distinct ground truth optima rather than collapsing to a single mode\.

The sample visualizations and thekk\-neighborhood statistics provide complementary views of performance\. The figures are intended to qualitatively assess whether the generated samples capture the structure and diversity of the ground truth local optima\. The tables provide the primary quantitative comparison of sample quality by measuring how close the generated samples are to solver convergence\. Therefore, in some examples, different methods may capture similar qualitative solution structures while their differences in sample accuracy are more clearly reflected by thekk\-neighborhood statistics\.

### 4\.1Baseline problem: soft\-constrained QP

#### 4\.1\.1Problem setup

We consider the following soft\-constrained QP:

minx∈ℝdx12​x⊤​x\+α1​∑i=1dx/2ln⁡\(1\+exp⁡\(−x2​i−1−x2​i\)\)\.\\displaystyle\\min\_\{x\\in\\mathbb\{R\}^\{d\_\{x\}\}\}\\quad\\frac\{1\}\{2\}x^\{\\top\}x\+\\alpha\_\{1\}\\sum\_\{i=1\}^\{d\_\{x\}/2\}\\ln\\\!\\left\(1\+\\exp\\\!\\left\(\-x\_\{2i\-1\}\-x\_\{2i\}\\right\)\\right\)\.\(18\)The objective consists of a quadratic term and a soft constraint via the softplus function\. This soft constraint shifts the solution away from the unconstrained minimum at the origin, as shown in Figure[3](https://arxiv.org/html/2606.00366#S4.F3)\. The parameterα1∈ℝ\\alpha\_\{1\}\\in\\mathbb\{R\}controls the strength of the constraint\. Due to the symmetry of both the objective and the constraint, the optimal solutions lie along a diagonal direction and move along it asα1\\alpha\_\{1\}varies\.

#### 4\.1\.2Data collection

We consider three problem dimensionsdx=100,150,200d\_\{x\}=100,150,200\. For each case, we uniformly sample9090parameter valuesα1∈\[0,30\]\\alpha\_\{1\}\\in\[0,30\], with8080used for training and1010for validation\. For eachα1\\alpha\_\{1\}, we draw100100initial guessesx0x^\{0\}uniformly from\[−2,2\]dx\[\-2,2\]^\{d\_\{x\}\}\. We use vanilla gradient descent as the solverπ\\pi, with a fixed step size of0\.10\.1, and terminate when the residual satisfies‖xm−xm−1‖≤10−2\\\|x^\{m\}\-x^\{m\-1\}\\\|\\leq 10^\{\-2\}\. From each solver run, we collect the last1010iterates to form the1010\-neighborhood dataset\. In this illustrative example, all converged solutions are treated as optimal\.

#### 4\.1\.3Test results

![Refer to caption](https://arxiv.org/html/2606.00366v1/x4.png)Figure 4:First two dimensions of the ground truth optima \(grey circles\) and sampled initial guesses \(colored crosses\) for the soft\-constrained QP under three unseen test parametersα\(1\)=3\.83\\alpha^\{\(1\)\}=3\.83,α\(2\)=8\.21\\alpha^\{\(2\)\}=8\.21, andα\(3\)=28\.79\\alpha^\{\(3\)\}=28\.79, with each row corresponding to one parameter\. Each column corresponds to a different method\. The objective contours are shown in the background, and the linear constraintx1\+x2=0x\_\{1\}\+x\_\{2\}=0is indicated by the blue dashed line\. Asα1\\alpha\_\{1\}increases, the optimal solutions shift upward, and the contours become more distorted\. All diffusion\-based methods capture how solutions vary with respect to constraint weightα1\\alpha\_\{1\}, with DiffuSolve\-k exhibiting slightly larger sample variance\.At test time, we sample100100unseen parametersα∈\[0,50\]\\alpha\\in\[0,50\]and generate100100initial guesses from each method per test parameter\.

Figure[4](https://arxiv.org/html/2606.00366#S4.F4)shows the first two dimensions of 100 ground truth optima and 100 sampled initial guesses for three unseen test parametersα\(1\)=3\.83\\alpha^\{\(1\)\}=3\.83,α\(2\)=8\.21\\alpha^\{\(2\)\}=8\.21, andα\(3\)=28\.79\\alpha^\{\(3\)\}=28\.79, for problem dimensiondx=100d\_\{x\}=100\. The ground truth samples, marked by grey circles in every subplot, are obtained by running the solver from uniformly sampled initial guesses until convergence\. Samples from each method are shown as colored crosses\. The objective contours are plotted in the background, and the linear constraint on the first two dimensions,x1\+x2=0x\_\{1\}\+x\_\{2\}=0, is illustrated by the dashed blue line\. Asα\\alphaincreases fromα\(1\)\\alpha^\{\(1\)\}toα\(3\)\\alpha^\{\(3\)\}, the constraint becomes stronger, causing the optimal solutions to shift upward while also distorting the quadratic contours\.

In this unimodal example, DiffuSolve serves as a strong baseline and successfully captures how the solution varies with the constraint weightα\\alpha\. This controlled setting shows that, when the solution distribution has a simple structure, diffusion\-based global search trained on converged optima can already provide accurate predictions\. TheNeighborhood Structure Modeland GLENS, trained on the1010\-neighborhood dataset, achieve similarly accurate predictions, while DiffuSolve\-k, trained on the same dataset without additional modeling, exhibits slightly higher sample variance\. This observation provides a useful reference point for the more challenging non\-convex examples later with multimodal structure\.

To quantitatively evaluate the quality of sampled initial guesses, we warm\-start the gradient descent solver from each sample and record the number of iterations required for convergence, which corresponds to thekk\-neighborhood\. In Table[1](https://arxiv.org/html/2606.00366#S4.T1), we report results for problem dimensionsdx=100,150,200d\_\{x\}=100,150,200, including the empirical cumulative distribution fork=1,3,6k=1,3,6\. For example, for GLENS withdx=100d\_\{x\}=100, the cumulative distribution atk=1k=1is94\.05%94\.05\\%, indicating that94\.05%94\.05\\%of samples already satisfy the solver’s termination criterion\. We also report the mean and standard deviation over all10,00010\{,\}000samples, where a lower mean indicates closer to convergence\. Uniformly sampled initial guesses, which do not exploit solution structure, suffer from the curse of dimensionality: samples are sparse and tend to lie near the boundary of the high\-dimensional hypercube, requiring over4040iterations on average to converge\. DiffuSolve significantly improves convergence speed, but DiffuSolve\-k exhibits degraded performance due to the lack of explicit modeling of neighborhood structure\. Both NS\-only and GLENS benefit from the augmented data and modeling of local geometry, achieving the strongest performance and nearly identical results across different problem dimensions\.

### 4\.2Modified Himmelblau function

![Refer to caption](https://arxiv.org/html/2606.00366v1/x5.png)Figure 5:First two dimensions of the ground truth optima \(grey circles\) and sampled initial guesses \(colored crosses\) for the modified Himmelblau function under three unseen test parametersα\(1\)=\(36\.44,46\.90\)\\alpha^\{\(1\)\}=\(36\.44,46\.90\),α\(2\)=\(14\.41,35\.60\)\\alpha^\{\(2\)\}=\(14\.41,35\.60\), andα\(3\)=\(12\.90,8\.15\)\\alpha^\{\(3\)\}=\(12\.90,8\.15\), with each row corresponding to one parameter\. Each column corresponds to a different method\. GLENS produces more concentrated samples around the ground truth optima while preserving the multimodal structure\.#### 4\.2\.1Problem setup

We consider the following parametric modification of the Himmelblau function\[[68](https://arxiv.org/html/2606.00366#bib.bib68)\], extended todxd\_\{x\}dimensions:

minx∈ℝdx\\displaystyle\\min\_\{x\\in\\mathbb\{R\}^\{d\_\{x\}\}\}\\quad2dx​∑i=1dx/2\[\(x2​i−12\+x2​i−α1\)2\+\(x2​i−1\+x2​i2−α2\)2\]\\displaystyle\\frac\{2\}\{d\_\{x\}\}\\sum\_\{i=1\}^\{d\_\{x\}/2\}\\left\[\\Big\(x\_\{2i\-1\}^\{2\}\+x\_\{2i\}\-\\alpha\_\{1\}\\Big\)^\{2\}\+\\Big\(x\_\{2i\-1\}\+x\_\{2i\}^\{2\}\-\\alpha\_\{2\}\\Big\)^\{2\}\\right\]\+λ​2dx−2​∑i=1dx/2−1\(x2​i−1−x2​i\+1\)2\.\\displaystyle\+\\lambda\\frac\{2\}\{d\_\{x\}\-2\}\\sum\_\{i=1\}^\{d\_\{x\}/2\-1\}\\Big\(x\_\{2i\-1\}\-x\_\{2i\+1\}\\Big\)^\{2\}\.\(19\)Whendx=2d\_\{x\}=2,λ=0\\lambda=0,α1=11\\alpha\_\{1\}=11, andα2=7\\alpha\_\{2\}=7, this reduces to the classical Himmelblau function, a standard non\-convex test function with four local optima\. We extend the classical function by summing the Himmelblau structure over pairs of dimensions and adding a soft chained coupling term\. The problem parameterα=\(α1,α2\)∈ℝ2\\alpha=\(\\alpha\_\{1\},\\alpha\_\{2\}\)\\in\\mathbb\{R\}^\{2\}controls the locations of the local optima in the solution space\. The coupling weightλ\\lambdais fixed for the chained coupling constraint\. This objective exhibits a highly non\-convex landscape with multiple local optima grouped in clusters, as shown in Figure[3](https://arxiv.org/html/2606.00366#S4.F3)\.

#### 4\.2\.2Data collection

We consider three different problem dimensionsdx=100,150,200d\_\{x\}=100,150,200, and setλ=10\\lambda=10\. For each case, we uniformly sample9090different parameter instancesα\\alphafrom\[1,50\]2\[1,50\]^\{2\}, with 80 for training and 10 for validation\. For eachα\\alpha, we draw100100initial guessesx0x^\{0\}uniformly from\[−10,10\]dx\[\-10,10\]^\{d\_\{x\}\}\. We solve the optimization problem using the L\-BFGS\-B algorithm\[[18](https://arxiv.org/html/2606.00366#bib.bib18)\]implemented in SciPy\[[69](https://arxiv.org/html/2606.00366#bib.bib69)\]with tolerance10−310^\{\-3\}, and collect the last 10 iterates prior to convergence as the1010\-neighborhood dataset\.

#### 4\.2\.3Test results

At test time, we sample100100unseen parametersα∈\[1,50\]2\\alpha\\in\[1,50\]^\{2\}and generate100100initial guesses from each method per test parameter\.

Figure[5](https://arxiv.org/html/2606.00366#S4.F5)shows the first two dimensions of ground truth local optima and sampled initial guesses from various methods when problem dimensiondx=100d\_\{x\}=100for three unseen test parametersα\(1\)=\(36\.44,46\.90\),α\(2\)=\(14\.41,35\.60\),α\(3\)=\(12\.90,8\.15\)\\alpha^\{\(1\)\}=\(36\.44,46\.90\),\\alpha^\{\(2\)\}=\(14\.41,35\.60\),\\alpha^\{\(3\)\}=\(12\.90,8\.15\), with oneα\\alphafor each row\. The ground truth exhibits distinct local optima grouped in four clusters, and all diffusion\-based global search methods capture this multimodal structure\. The samples are visually discriminative for this problem\. DiffuSolve, trained only on11\-neighborhood data, produces samples with noticeably larger variance for parameterα\(2\)\\alpha^\{\(2\)\}, while DiffuSolve\-k exhibits even larger variance across all three test parameters\. In contrast, GLENS generates samples that are more tightly concentrated around the ground truth local optima while maintaining solution diversity\. This suggests that GLENS better captures how the locally optimal solution distribution varies with problem parameters while preserving the multimodal structure\.

We also warm\-start the L\-BFGS\-B solver from each sample and report the same empirical cumulative distribution of thekk\-neighborhood and the statistics in Table[1](https://arxiv.org/html/2606.00366#S4.T1), for problem dimensionsdx=100,150,200d\_\{x\}=100,150,200\. The uniform samples still perform poorly and suffer from the curse of dimensionality\. DiffuSolve\-k generates samples with significantly lower quality than other diffusion\-based methods, as it is trained on thekk\-neighborhood dataset without additional modeling\. TheNeighborhood Structure Modelachieves better average results than DiffuSolve, and theSolver Behavior Modelfurther pushes the samples closer to convergence, such that GLENS generates the most concentrated samples and has the best average quality overall\.

Table 1:kk\-neighborhood results when warm\-starting from samples over unseen test parameters\.†Solved by gradient descent;‡solved by L\-BFGS\-B\. Higher cumulative distribution \(↑\\uparrow\) and lowerkk\-neighborhood means \(↓\\downarrow\) indicate faster convergence\.ProblemProblemDimensionMethodCumulativeDistribution %↑\\uparrowkk\-neighborhoodStats\.↓\\downarrowk=1k=1k=3k=3k=6k=6MeanStdQP†100Uniform0\.000\.000\.0043\.240\.89DiffuSolve42\.5888\.9996\.922\.222\.44DiffuSolve\-k41\.4768\.9292\.292\.942\.76NS\-only93\.9297\.0698\.591\.322\.12GLENS94\.0597\.0698\.591\.332\.13150Uniform0\.000\.000\.0045\.190\.74DiffuSolve73\.0594\.5998\.471\.692\.53DiffuSolve\-k50\.3380\.3595\.152\.482\.85NS\-only96\.3997\.5298\.941\.302\.25GLENS96\.3997\.5498\.941\.302\.25200Uniform0\.000\.000\.0046\.570\.66DiffuSolve93\.4596\.5598\.551\.382\.45DiffuSolve\-k53\.7088\.6195\.122\.242\.86NS\-only96\.9398\.1898\.831\.302\.43GLENS96\.9098\.1598\.831\.292\.43Himmelblau‡100Uniform0\.000\.000\.0025\.256\.40DiffuSolve47\.3079\.4587\.273\.043\.90DiffuSolve\-k5\.0536\.5060\.676\.635\.19NS\-only26\.5485\.0191\.182\.783\.08GLENS40\.1093\.5597\.032\.042\.16150Uniform0\.000\.000\.0025\.616\.47DiffuSolve59\.3983\.9490\.032\.583\.53DiffuSolve\-k7\.7646\.4767\.995\.734\.84NS\-only50\.0687\.2393\.072\.423\.09GLENS63\.7992\.8596\.821\.882\.63200Uniform0\.000\.000\.0028\.336\.88DiffuSolve47\.9874\.3783\.573\.504\.48DiffuSolve\-k1\.3124\.7450\.257\.875\.55NS\-only43\.8982\.0789\.282\.973\.64GLENS57\.4891\.0695\.992\.152\.80Rosenbrock‡100Uniform0\.000\.000\.0030\.1914\.31DiffuSolve0\.0475\.2898\.193\.396\.71DiffuSolve\-k8\.5949\.8575\.396\.0212\.78NS\-only37\.3587\.6996\.623\.3111\.86GLENS40\.0090\.2596\.733\.0110\.29150Uniform0\.000\.000\.0031\.2211\.19DiffuSolve63\.1396\.7397\.611\.773\.91DiffuSolve\-k12\.4051\.7975\.864\.504\.89NS\-only69\.5294\.8295\.861\.782\.60GLENS76\.3795\.0895\.771\.662\.32200Uniform0\.000\.000\.0032\.6011\.24DiffuSolve86\.0496\.9297\.271\.502\.53DiffuSolve\-k21\.0139\.8056\.605\.995\.79NS\-only83\.4799\.4799\.621\.221\.09GLENS86\.4798\.8698\.971\.271\.51Levy‡100Uniform0\.000\.010\.0319\.353\.05DiffuSolve0\.042\.7649\.046\.882\.51DiffuSolve\-k0\.040\.3811\.5110\.083\.12NS\-only0\.106\.6441\.147\.503\.06GLENS0\.1713\.5567\.265\.792\.46150Uniform0\.000\.000\.0119\.612\.57DiffuSolve0\.043\.2543\.147\.222\.49DiffuSolve\-k0\.040\.459\.749\.582\.57NS\-only0\.114\.2551\.686\.672\.46GLENS0\.0610\.6378\.215\.321\.94200Uniform0\.000\.010\.0119\.872\.40DiffuSolve0\.0013\.5956\.576\.302\.57DiffuSolve\-k0\.064\.1536\.107\.462\.54NS\-only0\.1215\.9762\.765\.852\.40GLENS0\.0532\.8582\.214\.672\.02

### 4\.3Modified Rosenbrock function

![Refer to caption](https://arxiv.org/html/2606.00366v1/x6.png)Figure 6:First two dimensions of the ground truth optima \(grey circles\) and sampled initial guesses \(colored crosses\) for the modified Rosenbrock function under three unseen test parametersα\(1\)=\(80\.23,−0\.13\)\\alpha^\{\(1\)\}=\(80\.23,\-0\.13\),α\(2\)=\(193\.97,−0\.38\)\\alpha^\{\(2\)\}=\(193\.97,\-0\.38\), andα\(3\)=\(91\.04,−0\.44\)\\alpha^\{\(3\)\}=\(91\.04,\-0\.44\), with each row corresponding to one parameter\. Each column corresponds to a different method\. All methods capture the overall solution structure, although DiffuSolve\-k exhibits slightly larger variance in its samples\.#### 4\.3\.1Problem setup

We consider the following parametric modification of the Rosenbrock function\[[70](https://arxiv.org/html/2606.00366#bib.bib70)\], extended todxd\_\{x\}dimensions:

minx∈ℝdx\\displaystyle\\min\_\{x\\in\\mathbb\{R\}^\{d\_\{x\}\}\}\\quad∑i=1dx/2\[α1​\(\(x2​i−1−α2\)2−\(x2​i−α2\)\)2\+\(\(x2​i−1−α2\)−1\)2\],\\displaystyle\\sum\_\{i=1\}^\{d\_\{x\}/2\}\\left\[\\alpha\_\{1\}\\Big\(\(x\_\{2i\-1\}\-\\alpha\_\{2\}\)^\{2\}\-\(x\_\{2i\}\-\\alpha\_\{2\}\)\\Big\)^\{2\}\+\\Big\(\(x\_\{2i\-1\}\-\\alpha\_\{2\}\)\-1\\Big\)^\{2\}\\right\],\(20\)whereα1\\alpha\_\{1\}controls the width of the valley andα2\\alpha\_\{2\}introduces a global shift of the solutions\. Whenα1=100\\alpha\_\{1\}=100andα2=0\\alpha\_\{2\}=0, this reduces to the extended Rosenbrock function\[[71](https://arxiv.org/html/2606.00366#bib.bib71)\], a widely used global optimization test function characterized by a long, narrow valley with a curved structure\. The resulting local optima structure in the first two dimensions is shown in Figure[3](https://arxiv.org/html/2606.00366#S4.F3)\.

#### 4\.3\.2Data collection

We consider three different problem dimensionsdx=100,150,200d\_\{x\}=100,150,200, and uniformly sample9090different parameter instances withα1∈\[50,200\],α2∈\[−1\.5,0\]\\alpha\_\{1\}\\in\[50,200\],\\alpha\_\{2\}\\in\[\-1\.5,0\]\. Among them,8080are used for training and1010for validation\. For eachα\\alpha, we draw100100initial guessesx0x^\{0\}uniformly from\[−4,4\]dx\[\-4,4\]^\{d\_\{x\}\}, and solve with L\-BFGS\-B algorithm\[[18](https://arxiv.org/html/2606.00366#bib.bib18)\]in SciPy\[[69](https://arxiv.org/html/2606.00366#bib.bib69)\]with tolerance10−310^\{\-3\}\. We then collect the1010\-neighborhood dataset\.

#### 4\.3\.3Test results

At test time, we sample100100unseen parametersα1∈\[50,200\]\\alpha\_\{1\}\\in\[50,200\]andα2∈\[−1\.5,0\]\\alpha\_\{2\}\\in\[\-1\.5,0\], and generate100100initial guesses from each method per test parameter\.

In Figure[6](https://arxiv.org/html/2606.00366#S4.F6), we visualize the first two dimensions of the ground truth local optima and the sampled initial guesses with problem dimensiondx=100d\_\{x\}=100, for three test parametersα\(1\)=\(80\.23,−0\.13\)\\alpha^\{\(1\)\}=\(80\.23,\-0\.13\),α\(2\)=\(193\.97,−0\.38\)\\alpha^\{\(2\)\}=\(193\.97,\-0\.38\), andα\(3\)=\(91\.04,−0\.44\)\\alpha^\{\(3\)\}=\(91\.04,\-0\.44\)\. The ground truth samples exhibit a parabolic structure along the valley and shift with the parameterα2\\alpha\_\{2\}\. All diffusion\-based methods capture the overall solution structure reasonably well in this example, while DiffuSolve\-k, which is trained on thekk\-neighborhood dataset without additional modeling, still shows higher variance compared to other methods\.

Thekk\-neighborhood statistics in Table[1](https://arxiv.org/html/2606.00366#S4.T1)provide a more precise comparison of sample quality\. Across problem dimensionsdx=100,150,200d\_\{x\}=100,150,200, samples from NS\-only and GLENS tend to lie in smaller neighborhoods of the optima than DiffuSolve\-k and are competitive with or better than DiffuSolve\. For problem dimensiondx=200d\_\{x\}=200, theNeighborhood Structure Modelalone achieves over99%99\\%of samples converging within33solver iterations\.

### 4\.4Modified Levy function

![Refer to caption](https://arxiv.org/html/2606.00366v1/x7.png)Figure 7:First two dimensions of the ground truth optima \(grey circles\) and sampled initial guesses \(colored crosses\) for the modified Levy function under three unseen test parametersα\(1\)=0\.89\\alpha^\{\(1\)\}=0\.89,α\(2\)=0\.67\\alpha^\{\(2\)\}=0\.67, andα\(3\)=−0\.45\\alpha^\{\(3\)\}=\-0\.45, with each row corresponding to one parameter\. Each column corresponds to a different method\. The ground truth samples exhibit many local optima with highly multimodal structures, making this problem more challenging than previous examples\. DiffuSolve and DiffuSolve\-k fail to predict certain modes accurately\. In comparison, GLENS better aligns the generated samples with the ground truth locally optimal solution structure across the observed modes\.#### 4\.4\.1Problem setup

We consider the following parametric modification of the Levy function\[[72](https://arxiv.org/html/2606.00366#bib.bib72)\], extended todxd\_\{x\}dimensions\. Let

wi≔1\+xi−α1−14,i=1,…,dx\.\\displaystyle w\_\{i\}\\coloneqq 1\+\\frac\{x\_\{i\}\-\\alpha\_\{1\}\-1\}\{4\},\\quad i=1,\\ldots,d\_\{x\}\.\(21\)The objective is

minx∈ℝdx⁡sin2⁡\(π​w1\)\\displaystyle\\min\_\{x\\in\\mathbb\{R\}^\{d\_\{x\}\}\}\\;\\sin^\{2\}\\\!\\left\(\\pi w\_\{1\}\\right\)\+∑i=1dx−1\[\(wi−1\)2​\(1\+10​sin2⁡\(π​wi\+1\)\)\]\\displaystyle\+\\sum\_\{i=1\}^\{d\_\{x\}\-1\}\\left\[\\Big\(w\_\{i\}\-1\\Big\)^\{2\}\\left\(1\+10\\sin^\{2\}\\\!\\left\(\\pi w\_\{i\}\+1\\right\)\\right\)\\right\]\+\(wdx−1\)2​\[1\+sin2⁡\(2​π​wdx\)\]\.\\displaystyle\+\\left\(w\_\{d\_\{x\}\}\-1\\right\)^\{2\}\\left\[1\+\\sin^\{2\}\\\!\\left\(2\\pi w\_\{d\_\{x\}\}\\right\)\\right\]\.\(22\)Whenα1=0\\alpha\_\{1\}=0, this reduces to the classical Levy function, a multimodal global optimization test function with many local optima\. The parameterα1\\alpha\_\{1\}introduces a global shift of the solution structure\. The resulting local optima exhibit complex multimodal patterns, as shown in Figure[3](https://arxiv.org/html/2606.00366#S4.F3)\.

#### 4\.4\.2Data collection

For problem dimensionsdx=100,150d\_\{x\}=100,150, and200200, we uniformly sample9090parameter instances withα1∈\[−2\.0,2\.0\]\\alpha\_\{1\}\\in\[\-2\.0,2\.0\], among which8080are used for training and1010for validation\. For each parameter, we draw100100initial guessesx0x^\{0\}uniformly from\[−10,10\]dx\[\-10,10\]^\{d\_\{x\}\}, and solve the problem using the L\-BFGS\-B algorithm\[[18](https://arxiv.org/html/2606.00366#bib.bib18)\]implemented in SciPy\[[69](https://arxiv.org/html/2606.00366#bib.bib69)\]with tolerance10−310^\{\-3\}\. We collect the last1010iterates before convergence as the1010\-neighborhood dataset\.

#### 4\.4\.3Test results

At test time, we sample100100unseen parametersα1∈\[−2\.0,2\.0\]\\alpha\_\{1\}\\in\[\-2\.0,2\.0\], and generate100100initial guesses from each method per test parameter\.

In Figure[7](https://arxiv.org/html/2606.00366#S4.F7), we visualize the first two dimensions of100100ground truth local optima together with100100sampled initial guesses for three test parametersα\(1\)=0\.89\\alpha^\{\(1\)\}=0\.89,α\(2\)=0\.67\\alpha^\{\(2\)\}=0\.67, andα\(3\)=−0\.45\\alpha^\{\(3\)\}=\-0\.45, with problem dimensiondx=100d\_\{x\}=100\. The ground truth samples exhibit many local optima with complex structures\. All methods capture the overall solution pattern to some extent, but generally exhibit higher noise compared to previous examples\. DiffuSolve finds it challenging to predict certain modes, while DiffuSolve\-k shows even larger local variance\. GLENS produces higher\-quality samples with more accurate predictions than other methods and preserves the diversity of the solutions\.

The difficulty of this problem is also reflected in the statistics in Table[1](https://arxiv.org/html/2606.00366#S4.T1), where the gap between the Uniform method and the learning\-based methods is smaller than in the previous examples\. Consistent with the qualitative visualization, thekk\-neighborhood statistics show that GLENS produces samples closer to convergence, especially fork=3k=3andk=6k=6\. In this setting, theSolver Behavior Modelplays a more important role because learning the solution distribution is challenging for diffusion models alone, enabling GLENS to produce sharper samples that are closer to convergence, while maintaining multimodal coverage\.

## 5Hyperparameter analysis

To systematically examine the performance and robustness of the proposed method GLENS under different hyperparameter choices, we conduct a series of analyses on the numerical problems introduced earlier\. In particular, we study the effect of \(i\) the neighborhood sizekkin training data, \(ii\) the model capacity of GLENS, and \(iii\) the solver refinement guidance weightsSBs\_\{\\text\{SB\}\}from theSolver Behavior Model\.

For each experiment, we follow the evaluation settings described in the previous section and assess the quality of10,00010\{,\}000samples across100100unseen test parameters for problem dimensiondx=200d\_\{x\}=200, by warm\-starting the corresponding numerical solver\. We report the cumulative distribution of the resultingkk\-neighborhoods, fork=1,3,6k=1,3,6, over all samples, where thekk\-neighborhood corresponds to the number of solver iterations required to reach convergence from a given initial guess\. We also summarize the mean and standard deviation of the observedkk\-neighborhood values\.

### 5\.1Neighborhood sizekk

Table 2:Effect of using different neighborhood sizekkin training data for GLENS\.k=1k=1indicates the DiffuSolve results\.†Solved by gradient descent;‡solved by L\-BFGS\-B\. Higher cumulative distribution \(↑\\uparrow\) and lowerkk\-neighborhood means \(↓\\downarrow\) indicate faster convergence\.ProblemTrainingkk\-neighborhooddatasetCumulativeDistribution %↑\\uparrowkk\-neighborhoodStats\.↓\\downarrowk=1k=1k=3k=3k=6k=6MeanStdQP†193\.4596\.5598\.551\.382\.45591\.6794\.2396\.681\.532\.631096\.9098\.1598\.831\.292\.431588\.8797\.0598\.761\.422\.47Himmelblau‡147\.9874\.3783\.573\.504\.48570\.0188\.1694\.192\.133\.031057\.4891\.0695\.992\.152\.801556\.6088\.1895\.672\.182\.67Rosenbrock‡186\.0496\.9297\.271\.502\.53526\.7592\.5396\.062\.564\.491086\.4798\.8698\.971\.271\.511582\.9994\.3995\.511\.693\.98Levy‡10\.0013\.5956\.576\.302\.5750\.0222\.9868\.114\.972\.01100\.0532\.8582\.214\.672\.02150\.1215\.1965\.735\.802\.38The neighborhood sizekkcontrols a trade\-off between data coverage and quality\. Largerkkprovides more free training data and reveals richer neighborhood structure around locally optimal solutions, but may include solver iterates far from the optima, introducing noise and potentially degrading model performance\.

Table[2](https://arxiv.org/html/2606.00366#S5.T2)reports the performance of GLENS trained on different neighborhood sizeskk, wherek=1k=1corresponds to the DiffuSolve baseline\. Incorporating neighborhood data consistently helps:k=5k=5andk=10k=10give significant gains overk=1k=1\. However, the effective neighborhood around each optimum is not unlimited, and increasingkkto1515introduces iterates far from the optima that add noise to the training data and slightly degrade performance\. Across the four problems,k=10k=10provides a good balance between data augmentation and quality, suggesting that the solver’s convergence behavior is a useful guideline for how far its iterates remain informative of the local solution landscape\.

### 5\.2Model capacities

Table 3:Effect of different model dimensions for GLENS\. DiffuSolve only has an “NS” dimension, with “SB” marked as N/A\.†Solved by gradient descent;‡solved by L\-BFGS\-B\. Higher cumulative distribution \(↑\\uparrow\) and lowerkk\-neighborhood means \(↓\\downarrow\) indicate faster convergence\.ProblemModeldimTrainingNeighbor\.CumulativeDistribution %↑\\uparrowkk\-neighborhoodStats↓\\downarrowNSSBk=1k=1k=3k=3k=6k=6MeanStdQP†32N/A148\.4180\.9095\.132\.482\.79321096\.1497\.7398\.851\.302\.34641096\.1897\.8698\.881\.302\.3364N/A193\.4596\.5598\.551\.382\.45641096\.9098\.1598\.831\.292\.43128N/A169\.3479\.2683\.453\.505\.19641068\.4989\.1994\.762\.143\.271281068\.4589\.1394\.712\.143\.27Himmelblau‡32N/A154\.5071\.2481\.183\.744\.83321052\.7786\.2993\.992\.423\.31641054\.8089\.5695\.112\.213\.0864N/A147\.9874\.3783\.573\.504\.48641057\.4891\.0695\.992\.152\.80128N/A140\.7173\.7383\.853\.654\.58641043\.1488\.0594\.772\.443\.06128107\.6783\.3496\.402\.952\.34Rosenbrock‡32N/A140\.3396\.2597\.281\.992\.35321054\.9395\.9498\.031\.883\.84641053\.7395\.6397\.871\.913\.8764N/A186\.0496\.9297\.271\.502\.53641086\.4798\.8698\.971\.271\.51128N/A127\.3891\.1796\.982\.444\.32641041\.7991\.7194\.862\.414\.541281040\.2391\.2394\.902\.433\.95Levy‡32N/A10\.080\.7210\.1510\.593\.0232100\.042\.0428\.228\.973\.3864100\.013\.0140\.877\.552\.7864N/A10\.0013\.5956\.576\.302\.5764100\.0532\.8582\.214\.672\.02128N/A10\.037\.5343\.917\.102\.6764100\.0218\.2475\.185\.362\.24128100\.016\.2336\.487\.802\.98Both theNeighborhood Structure Modeland theSolver Behavior Modeluse a U\-Net backbone, whose base feature dimension controls the model capacity\. Table[3](https://arxiv.org/html/2606.00366#S5.T3)evaluates GLENS under U\-Net dimensions3232,6464, and128128, alongside DiffuSolve at the same dimensions for comparison\.

Increasing the dimension from3232to6464consistently improves performance, indicating that a minimum capacity is needed to capture the multimodal solution structure\. Further increasing the dimension to128128tends to degrade performance\. TheSolver Behavior Modelappears especially prone to overfitting: results often improve when its dimension is reduced from128128to6464, suggesting that modeling a largerkk\-neighborhood distribution requires a more balanced capacity\. A base dimension of6464provides a good trade\-off between expressiveness and stability across all four problems\. Across model dimensions, GLENS consistently outperforms the baselines\.

### 5\.3Guidance weight forSolver Behavior Model

Table 4:Effect of different guidance weight for theSolver Behavior Modelin GLENS\.†Solved by gradient descent;‡solved by L\-BFGS\-B\. Higher cumulative distribution \(↑\\uparrow\) and lowerkk\-neighborhood means \(↓\\downarrow\) indicate faster convergence\.ProblemGuidanceweightsSBs\_\{\\text\{SB\}\}CumulativeDistribution %↑\\uparrowkk\-neighborhoodStats↓\\downarrowk=1k=1k=3k=3k=6k=6MeanStdQP†1096\.9298\.1998\.831\.292\.435096\.9298\.1898\.831\.292\.4310096\.9098\.1598\.831\.292\.4320096\.9198\.1298\.831\.292\.43Himmelblau‡1034\.1480\.3891\.882\.923\.215048\.1589\.4995\.882\.262\.7210053\.9890\.5395\.782\.172\.852000\.000\.0415\.6111\.325\.09Rosenbrock‡1083\.8599\.3599\.491\.231\.075085\.2199\.1699\.281\.241\.2610086\.4798\.8698\.971\.271\.5120087\.5898\.1498\.401\.353\.33Levy‡100\.1428\.4378\.584\.892\.14500\.1834\.3683\.734\.541\.971000\.0532\.8582\.214\.672\.022000\.0418\.6772\.225\.452\.12As described in Algorithm[1](https://arxiv.org/html/2606.00366#alg1), solver refinement from theSolver Behavior Modelis incorporated into diffusion sampling via a classifier guidance step with guidance weightsSBs\_\{\\text\{SB\}\}\. In Table[4](https://arxiv.org/html/2606.00366#S5.T4), we study the effect of varyingsSBs\_\{\\text\{SB\}\}while keeping theNeighborhood Structure Modelfixed\.

For the soft\-constrained QP and the modified Rosenbrock example, performance is quite consistent regardless of the choice of guidance weight, suggesting that theNeighborhood Structure Modelalready provides high\-quality initial guesses in this setting and leaves limited room for further refinement\. For the modified Himmelblau and Levy examples, where the solution structure is more challenging to model, increasingsSBs\_\{\\text\{SB\}\}significantly improves performance up to a moderate level, indicating that theSolver Behavior Modelplays a crucial role in refining samples in more complex multimodal landscapes\. However, overly strong guidance withsSB=200s\_\{\\text\{SB\}\}=200leads to performance degradation, likely due to overshooting during the refinement process\.

Overall, theSolver Behavior Modelmakes a complementary contribution alongside theNeighborhood Structure Model, particularly on harder problems, and a moderate guidance weightsSB=100s\_\{\\text\{SB\}\}=100strikes a good balance between refinement strength and stability\.

## 6Real\-world example: two\-robot navigation

### 6\.1Problem setup

We consider a real\-world two\-robot navigation problem in which two robots aim to reach their respective goal regions in minimum time while avoiding multiple obstacles and colliding with each other, inspired by\[[13](https://arxiv.org/html/2606.00366#bib.bib13)\]\. This poses the following open\-loop trajectory optimization problem:

mintf,u​\(⋅\)\\displaystyle\\min\_\{t\_\{f\},u\(\\cdot\)\}\\quadtf\\displaystyle t\_\{f\}s\.t\.ζ˙j​\(t\)=f​\(ζj​\(t\),uj​\(t\)\),\\displaystyle\\dot\{\\zeta\}\_\{j\}\(t\)=f\(\\zeta\_\{j\}\(t\),u\_\{j\}\(t\)\),∀t∈\[0,tf\],∀j∈\{1,2\},\\displaystyle\\forall t\\in\[0,t\_\{f\}\],\\ \\forall j\\in\\\{1,2\\\},ζj​\(0\)=ζinitj,\\displaystyle\\zeta\_\{j\}\(0\)=\\zeta\_\{\\text\{init\}\_\{j\}\},∀j∈\{1,2\},\\displaystyle\\forall j\\in\\\{1,2\\\},‖pj​\(tf\)−pgoalj‖2≤εgoal,\\displaystyle\\\|p\_\{j\}\(t\_\{f\}\)\-p\_\{\\text\{goal\}\_\{j\}\}\\\|\_\{2\}\\leq\\varepsilon\_\{\\text\{goal\}\},∀j∈\{1,2\},\\displaystyle\\forall j\\in\\\{1,2\\\},‖pj​\(t\)−pobsi‖2≥robsi\+rrobot,\\displaystyle\\\|p\_\{j\}\(t\)\-p\_\{\\text\{obs\}\_\{i\}\}\\\|\_\{2\}\\geq r\_\{\\text\{obs\}\_\{i\}\}\+r\_\{\\text\{robot\}\},∀t∈\[0,tf\],i=1,…,Nobs,∀j∈\{1,2\},\\displaystyle\\forall t\\in\[0,t\_\{f\}\],\\ i=1,\\ldots,N\_\{\\text\{obs\}\},\\ \\forall j\\in\\\{1,2\\\},‖p1​\(t\)−p2​\(t\)‖2≥2​rrobot,\\displaystyle\\\|p\_\{1\}\(t\)\-p\_\{2\}\(t\)\\\|\_\{2\}\\geq 2r\_\{\\text\{robot\}\},∀t∈\[0,tf\],\\displaystyle\\forall t\\in\[0,t\_\{f\}\],umin≤uj​\(t\)≤umax,\\displaystyle u\_\{\\min\}\\leq u\_\{j\}\(t\)\\leq u\_\{\\max\},∀t∈\[0,tf\],∀j∈\{1,2\},\\displaystyle\\forall t\\in\[0,t\_\{f\}\],\\ \\forall j\\in\\\{1,2\\\},tfmin≤tf≤tfmax\.\\displaystyle t\_\{f\_\{\\min\}\}\\leq t\_\{f\}\\leq t\_\{f\_\{\\max\}\}\.\(23\)
Heretft\_\{f\}is the terminal time, with boundstfmint\_\{f\_\{\\min\}\}andtfmaxt\_\{f\_\{\\max\}\}\. For each robotj∈\{1,2\}j\\in\\\{1,2\\\},ζj\\zeta\_\{j\}is the state,pjp\_\{j\}is its position component, anduju\_\{j\}is the control input bounded byuminu\_\{\\min\}andumaxu\_\{\\max\}\. Each robot starts from a given initial stateζinitj\\zeta\_\{\\text\{init\}\_\{j\}\}and must reach its goal positionpgoaljp\_\{\\text\{goal\}\_\{j\}\}within toleranceεgoal\\varepsilon\_\{\\text\{goal\}\}\. The environment containsNobsN\_\{\\text\{obs\}\}circular obstacles with centerspobsip\_\{\\text\{obs\}\_\{i\}\}and radiirobsir\_\{\\text\{obs\}\_\{i\}\}; the robots have radiusrrobotr\_\{\\text\{robot\}\}\. Obstacle\-avoidance and inter\-robot separation constraints enforce a minimum distance ofrobsi\+rrobotr\_\{\\text\{obs\}\_\{i\}\}\+r\_\{\\text\{robot\}\}to each obstacle and2​rrobot2r\_\{\\text\{robot\}\}between the two robots\.

We assume each robot follows the same nonlinear dynamicsff,

p˙x=v​cos⁡θ,p˙y=v​sin⁡θ,v˙=a,θ˙=ω,\\displaystyle\\dot\{p\}\_\{x\}=v\\cos\\theta,\\quad\\dot\{p\}\_\{y\}=v\\sin\\theta,\\quad\\dot\{v\}=a,\\quad\\dot\{\\theta\}=\\omega,\(24\)with stateζj=\(pxj,pyj,vj,θj\)\\zeta\_\{j\}=\(p\_\{x\_\{j\}\},p\_\{y\_\{j\}\},v\_\{j\},\\theta\_\{j\}\), wherepj=\(pxj,pyj\)p\_\{j\}=\(p\_\{x\_\{j\}\},p\_\{y\_\{j\}\}\)is the position,vjv\_\{j\}is the linear speed, andθj\\theta\_\{j\}is the orientation\. The control input isuj​\(t\)=\(aj,ωj\)u\_\{j\}\(t\)=\(a\_\{j\},\\omega\_\{j\}\), consisting of linear acceleration and angular velocity\.

To solve problem \([6\.1](https://arxiv.org/html/2606.00366#S6.Ex8)\) under the dynamics \([24](https://arxiv.org/html/2606.00366#S6.E24)\), we use a forward shooting method for control transcription\. The time horizon\[0,tf\]\[0,t\_\{f\}\]is discretized intoTTuniform intervals, and the dynamics are integrated using a fourth\-order Runge\-Kutta \(RK4\) method\. The resulting decision variables consist of the terminal timetft\_\{f\}and the control sequence\{uj,k\}k=0T−1\\\{u\_\{j,k\}\\\}\_\{k=0\}^\{T\-1\}forj=1,2j=1,2, with goal\-reaching, obstacle\-avoidance, and robot\-separation constraints enforced on the discretized trajectory\.

For this problem family, the problem parameterα\\alphacollects the obstacle positionspobsip\_\{\\text\{obs\}\_\{i\}\}and radiirobsir\_\{\\text\{obs\}\_\{i\}\}, so that varyingα\\alphayields different environments in which to search for diverse trajectories\.

### 6\.2Data collection

We fix the initial states asζinit1=\(0,10,0,0\),ζinit2=\(10,10,0,0\),\\zeta\_\{\\text\{init\}\_\{1\}\}=\(0,10,0,0\),\\ \\zeta\_\{\\text\{init\}\_\{2\}\}=\(10,10,0,0\),and set the goal positions topgoal1=\(10,0\),pgoal2=\(0,0\)\.p\_\{\\text\{goal\}\_\{1\}\}=\(10,0\),\\ p\_\{\\text\{goal\}\_\{2\}\}=\(0,0\)\.The goal tolerance isεgoal=0\.2\\varepsilon\_\{\\text\{goal\}\}=0\.2, the robot radius isrrobot=0\.2r\_\{\\text\{robot\}\}=0\.2, and the number of obstacles isNobs=2N\_\{\\text\{obs\}\}=2\. The control bounds are set asamin=−1,amax=1,ωmin=−1,ωmax=1,a\_\{\\min\}=\-1,\\ a\_\{\\max\}=1,\\ \\omega\_\{\\min\}=\-1,\\ \\omega\_\{\\max\}=1,and the terminal time bound istfmin=0t\_\{f\_\{\\min\}\}=0andtfmax=20t\_\{f\_\{\\max\}\}=20\.

To collect training data, we uniformly sample9090obstacle configurationsα\\alpha, where obstacle centerspobsip\_\{\\text\{obs\}\_\{i\}\}are drawn from\[2\.0,8\.0\]2\[2\.0,8\.0\]^\{2\}and radiirobsir\_\{\\text\{obs\}\_\{i\}\}from\[0\.5,1\.5\]\[0\.5,1\.5\], with 80 for training and 10 for validation\. For each parameter instance, we generate100100uniformly sampled initial guesses by sampling the control sequences withak∈\[−1,1\]a\_\{k\}\\in\[\-1,1\],ωk∈\[−1,1\]\\omega\_\{k\}\\in\[\-1,1\], and terminal timetf∈\[0,20\]t\_\{f\}\\in\[0,20\]\. The time horizon is discretized withT=20T=20\.

Each instance is solved using SNOPT\[[19](https://arxiv.org/html/2606.00366#bib.bib19)\]via pyOptSparse\[[73](https://arxiv.org/html/2606.00366#bib.bib73)\], with a maximum major iteration limit of10001000and feasibility and optimality tolerances set to10−210^\{\-2\}\. We only keep those solutions that are verified as local optima\. In addition, we observe that the final SNOPT iterates near convergence are often nearly identical, so we apply a simple filtering rule that an iterate is only kept if its distance from the last selected iterate exceeds a predefined threshold\. Figure[8](https://arxiv.org/html/2606.00366#S6.F8)visualizes the final1010iterates obtained after filtering under different thresholds\. We select a threshold of0\.20\.2and collect1010filtered iterates as the 10\-neighborhood data, which provides sufficient variation while remaining close to the optima\. The resulting dataset is split into training, validation, and test sets, containing8080,1010, and2020parameter instances, respectively\.

### 6\.3Test results

![Refer to caption](https://arxiv.org/html/2606.00366v1/x8.png)Figure 8:Effect of the filtering threshold on the resultingkk\-neighborhood iterates in the two\-robot navigation example\. A low threshold introduces many near\-duplicate iterates in the training data, while a large threshold may include iterates that are far from the optimal solutions\.![Refer to caption](https://arxiv.org/html/2606.00366v1/x9.png)Figure 9:Cumulative distribution of thekk\-neighborhoods reached by sampled initial guesses in the two\-robot navigation example\. Steeper curves correspond to samples that lie closer to the converged optima and converge in fewer solver iterations\. Samples generated by GLENS are concentrated in smallerkk\-neighborhoods than those from the baselines\.![Refer to caption](https://arxiv.org/html/2606.00366v1/x10.png)Figure 10:Diverse locally optimal trajectories generated by GLENS for two unseen test configurations in the two\-robot navigation example\.Table 5:kk\-neighborhood and solving time statistics over locally optimal runs, and the locally optimal ratios over all sampled initial guesses in the two\-robot navigation example\. All samples are used as initial guesses to warm\-start the SNOPT solver\. Higher cumulative distribution values \(↑\\uparrow\) and lowerkk\-neighborhood statistics \(↓\\downarrow\) correspond to closer to convergence\.Methodkk\-neighborhood Stats\. \(↓\\downarrow\)Solving Time \(↓\\downarrow\)Optimal Ratio % \(↑\\uparrow\)MeanStdMedianMeanStdMedianUniform342\.17235\.3928148\.0733\.7739\.1271\.5DiffuSolve160\.65177\.959422\.7226\.1512\.5484\.2DiffuSolve\-k170\.26181\.349524\.4226\.9913\.2884\.7NS\-only136\.89153\.897619\.5522\.6810\.6684\.0GLENS134\.76161\.397119\.0623\.429\.7685\.5At test time, we sample2020unseen parameters and generate5050samples from each method per parameter instance to warm\-start SNOPT\. Figure[9](https://arxiv.org/html/2606.00366#S6.F9)plots the cumulative distribution of thekk\-neighborhood reached by these initial guesses; steeper curves indicate that the corresponding method generates samples closer to the converged optima\. Samples generated by GLENS converge significantly faster than those from all baselines\. This is consistent with Table[5](https://arxiv.org/html/2606.00366#S6.T5), which summarizes thekk\-neighborhood and solving time statistics over all locally optimal runs\. Compared with the previous examples, thekk\-neighborhood statistics exhibit higher variance, since some instances require up to10001000solver iterations to converge; the median is therefore substantially lower than the mean across all methods\. Despite this variability, GLENS achieves the lowest mean and mediankk\-neighborhood values, and the highest proportion of samples that reach locally optimal solutions\.

We further analyze the computational cost of each method\. For the learning\-based methods, diffusion\-model inference runs on an NVIDIA L40 GPU and generates10001000samples in under one second, so the initial\-guess generation time is negligible compared with the overall solving time\. Each initial guess is then used to warm\-start SNOPT on a 2\.0 GHz Intel Sapphire Rapids CPU with four cores\. The resulting solving\-time statistics are reported in Table[5](https://arxiv.org/html/2606.00366#S6.T5): initial guesses produced by GLENS require the least time to converge to local optima, consistent with their smallerkk\-neighborhoods\.

In addition to producing high\-quality initial guesses, GLENS preserves the multimodality of the solution landscape\. Figure[10](https://arxiv.org/html/2606.00366#S6.F10)illustrates diverse locally optimal trajectories obtained for two unseen test parameter instances, after warm\-starting SNOPT with samples generated by GLENS\. Under different obstacle configurations, GLENS discovers multiple distinct routes for the robots to reach their respective goals\.

## 7Conclusions

We proposed GLENS, a data\-driven and data\-efficient global search method for generating high\-quality initial guesses in non\-convex continuous optimization\. Our approach addresses the data scarcity challenge in existing learning\-based methods by using intermediate solver iterates, i\.e\., thekk\-neighborhood dataset, as free data augmentation\. GLENS combines two complementary models to exploit thekk\-neighborhood dataset: aNeighborhood Structure Modelthat uses diffusion to learn the neighborhood structure around optima, and aSolver Behavior Modelthat learns the solver’s refinement behavior; the two are trained separately and are combined at sampling time to guide diffusion samples toward optimal solutions\.

Experimental results show that GLENS effectively exploits the structure contained in solver iterates to generate high\-quality initial guesses while preserving solution diversity and multimodality\. Across benchmark problems and a two\-robot navigation problem, GLENS generates diverse samples and improves solver convergence under different optimization solvers, including gradient descent, L\-BFGS\-B, and SNOPT\. The hyperparameter analysis further provides practical guidance for applying GLENS, showing how neighborhood size, model capacity, and guidance strength affect performance\.

Future work includes exploring richer conditioning strategies, such as incorporating objective values or constraint violation as additional conditional inputs\. Another exciting direction is to leverage online solver data for adaptation or fine\-tuning as new problem instances are encountered\.

The experiments in this work were conducted using computational resources supported by Princeton Research Computing, a consortium including the Princeton Institute for Computational Science and Engineering \(PICSciE\) and the Office of Information Technology’s High Performance Computing Center and Visualization Laboratory at Princeton University\.

## Declarations

- •Funding\.This work was partially supported by the Princeton Laboratory for Artificial Intelligence’s \[PLI/AI2/NAM\] initiative\.
- •Competing interests\.The authors have no competing interests to declare that are relevant to the content of this article\.
- •
- •
- •Author contributions\.Anjian Li contributed to conceptualization, data curation, investigation, methodology, software, validation, visualization, writing of the original draft, and writing, review, and editing\. Bartolomeo Stellato contributed to funding acquisition, supervision, and writing, review, and editing of the manuscript\. Ryne Beeson contributed to conceptualization, funding acquisition, methodology, project administration, resources, supervision, and writing, review, and editing of the manuscript\.

## References

- \\bibcommenthead
- Dauner et al\. \[2024\]Dauner, D\., Hallgarten, M\., Li, T\., Weng, X\., Huang, Z\., Yang, Z\., Li, H\., Gilitschenski, I\., Ivanovic, B\., Pavone, M\.,et al\.: NAVSIM: Data\-driven non\-reactive autonomous vehicle simulation and benchmarking\. Advances in Neural Information Processing Systems37, 28706–28719 \(2024\)[https://doi\.org/10\.52202/079017\-0902](https://doi.org/10.52202/079017-0902)
- Englander and Conway \[2017\]Englander, J\.A\., Conway, B\.A\.: Automated solution of the low\-thrust interplanetary trajectory problem\. Journal of Guidance, Control, and Dynamics40\(1\), 15–27 \(2017\)[https://doi\.org/10\.2514/1\.G002124](https://doi.org/10.2514/1.G002124)
- Locatelli and Schoen \[2016\]Locatelli, M\., Schoen, F\.: Global optimization based on local searches\. Annals of Operations Research240\(1\), 251–270 \(2016\)[https://doi\.org/10\.1007/s10479\-015\-2014\-2](https://doi.org/10.1007/s10479-015-2014-2)
- Leary \[2000\]Leary, R\.H\.: Global optimization on funneling landscapes\. Journal of Global Optimization18\(4\), 367–383 \(2000\)[https://doi\.org/10\.1023/A:1026500301312](https://doi.org/10.1023/A:1026500301312)
- Amos et al\. \[2023\]Amos, B\.,et al\.: Tutorial on amortized optimization\. Foundations and Trends® in Machine Learning16\(5\), 592–732 \(2023\)[https://doi\.org/10\.1561/2200000102](https://doi.org/10.1561/2200000102)
- Chen et al\. \[2022\]Chen, S\.W\., Wang, T\., Atanasov, N\., Kumar, V\., Morari, M\.: Large scale model predictive control with neural networks and primal active sets\. Automatica135, 109947 \(2022\)[https://doi\.org/10\.1016/j\.automatica\.2021\.109947](https://doi.org/10.1016/j.automatica.2021.109947)
- Cauligi et al\. \[2021\]Cauligi, A\., Culbertson, P\., Schmerling, E\., Schwager, M\., Stellato, B\., Pavone, M\.: CoCo: Online mixed\-integer control via supervised learning\. IEEE Robotics and Automation Letters7\(2\), 1447–1454 \(2021\)[https://doi\.org/10\.1109/LRA\.2021\.3135931](https://doi.org/10.1109/LRA.2021.3135931)
- Bertsimas and Stellato \[2021\]Bertsimas, D\., Stellato, B\.: The voice of optimization\. Machine Learning110\(2\), 249–277 \(2021\)[https://doi\.org/10\.1007/s10994\-020\-05893\-5](https://doi.org/10.1007/s10994-020-05893-5)
- Sambharya et al\. \[2024\]Sambharya, R\., Hall, G\., Amos, B\., Stellato, B\.: Learning to warm\-start fixed\-point optimization algorithms\. Journal of Machine Learning Research25\(166\), 1–46 \(2024\)
- Song and Ermon \[2019\]Song, Y\., Ermon, S\.: Generative modeling by estimating gradients of the data distribution\. Advances in neural information processing systems32\(2019\)
- Ho et al\. \[2020\]Ho, J\., Jain, A\., Abbeel, P\.: Denoising diffusion probabilistic models\. Advances in neural information processing systems33, 6840–6851 \(2020\)
- Sun and Yang \[2023\]Sun, Z\., Yang, Y\.: DIFUSCO: Graph\-based diffusion solvers for combinatorial optimization\. Advances in neural information processing systems36, 3706–3731 \(2023\)
- Li et al\. \[2025\]Li, A\., Ding, Z\., Dieng, A\.B\., Beeson, R\.: DiffuSolve: Diffusion\-based solver for non\-convex trajectory optimization\. In: Proceedings of the 7th Annual Learning for Dynamics & Control Conference\. Proceedings of Machine Learning Research, vol\. 283, pp\. 45–58 \(2025\)\. PMLR
- Graebner and Beeson \[2025\]Graebner, J\., Beeson, R\.: Global search for optimal low thrust spacecraft trajectories using diffusion models and the indirect method\. Journal of the Astronautical Sciences72\(6\), 62 \(2025\)[https://doi\.org/10\.1007/s40295\-025\-00535\-1](https://doi.org/10.1007/s40295-025-00535-1)
- Beeson et al\. \[2024\]Beeson, R\., Li, A\., Sinha, A\.: Global search of optimal spacecraft trajectories using amortization and deep generative models\. arXiv preprint arXiv:2412\.20023 \(2024\)[https://doi\.org/10\.48550/arXiv\.2412\.20023](https://doi.org/10.48550/arXiv.2412.20023)
- Li et al\. \[2026\]Li, A\., Ding, Z\., Dieng, A\.B\., Beeson, R\.: Constraint\-aware diffusion models for trajectory optimization\. In: Blasch, E\., Darema, F\., Metaxas, D\. \(eds\.\) Dynamic Data Driven Applications Systems, pp\. 308–316\. Springer, Cham \(2026\)\.[https://doi\.org/10\.1007/978\-3\-031\-94895\-4\_32](https://doi.org/10.1007/978-3-031-94895-4_32)
- Li and Beeson \[2025\]Li, A\., Beeson, R\.: Aligning diffusion model with problem constraints for trajectory optimization\. In: Blasch, E\., Darema, F\., Metaxas, D\. \(eds\.\) Handbook of Dynamic Data\-Driven Applications Systems vol\. 4\. Springer, Cham \(2025\)\.[https://doi\.org/10\.48550/arXiv\.2504\.00342](https://doi.org/10.48550/arXiv.2504.00342)
- Byrd et al\. \[1995\]Byrd, R\.H\., Lu, P\., Nocedal, J\., Zhu, C\.: A limited memory algorithm for bound constrained optimization\. SIAM Journal on scientific computing16\(5\), 1190–1208 \(1995\)[https://doi\.org/10\.1137/0916069](https://doi.org/10.1137/0916069)
- Gill et al\. \[2005\]Gill, P\.E\., Murray, W\., Saunders, M\.A\.: SNOPT: An SQP algorithm for large\-scale constrained optimization\. SIAM review47\(1\), 99–131 \(2005\)[https://doi\.org/10\.1137/S0036144504446096](https://doi.org/10.1137/S0036144504446096)
- Holland \[1992\]Holland, J\.H\.: Genetic algorithms\. Scientific american267\(1\), 66–72 \(1992\)[https://doi\.org/10\.1038/scientificamerican0792\-66](https://doi.org/10.1038/scientificamerican0792-66)
- Storn and Price \[1997\]Storn, R\., Price, K\.: Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces\. Journal of global optimization11\(4\), 341–359 \(1997\)[https://doi\.org/10\.1023/A:1008202821328](https://doi.org/10.1023/A:1008202821328)
- Kennedy and Eberhart \[1995\]Kennedy, J\., Eberhart, R\.: Particle swarm optimization\. In: Proceedings of ICNN’95\-international Conference on Neural Networks, vol\. 4, pp\. 1942–1948 \(1995\)\.[https://doi\.org/10\.1109/ICNN\.1995\.488968](https://doi.org/10.1109/ICNN.1995.488968)\. IEEE
- Cage et al\. \[1994\]Cage, P\., Kroo, I\., Braun, R\.: Interplanetary trajectory optimization using a genetic algorithm\. In: Astrodynamics Conference, p\. 3773 \(1994\)\.[https://doi\.org/10\.2514/6\.1994\-3773](https://doi.org/10.2514/6.1994-3773)
- Kim and Spencer \[2002\]Kim, Y\.H\., Spencer, D\.B\.: Optimal spacecraft rendezvous using genetic algorithms\. Journal of Spacecraft and Rockets39\(6\), 859–865 \(2002\)[https://doi\.org/10\.2514/2\.3908](https://doi.org/10.2514/2.3908)
- Olds et al\. \[2007\]Olds, A\.D\., Kluever, C\.A\., Cupples, M\.L\.: Interplanetary mission design using differential evolution\. Journal of Spacecraft and Rockets44\(5\), 1060–1070 \(2007\)[https://doi\.org/10\.2514/1\.27242](https://doi.org/10.2514/1.27242)
- Izzo et al\. \[2007\]Izzo, D\., Becerra, V\.M\., Myatt, D\.R\., Nasuto, S\.J\., Bishop, J\.M\.: Search space pruning and global optimisation of multiple gravity assist spacecraft trajectories\. Journal of Global Optimization38\(2\), 283–296 \(2007\)[https://doi\.org/10\.1007/s10898\-006\-9106\-0](https://doi.org/10.1007/s10898-006-9106-0)
- Miller et al\. \[1997\]Miller, J\.F\., Thomson, P\., Fogarty, T\., et al\.: Designing electronic circuits using evolutionary algorithms\. arithmetic circuits: A case study\. Genetic algorithms and evolution strategies in engineering and computer science10\(1997\)
- Fakhfakh et al\. \[2010\]Fakhfakh, M\., Cooren, Y\., Sallem, A\., Loulou, M\., Siarry, P\.: Analog circuit design optimization through the particle swarm optimization technique\. Analog integrated circuits and signal processing63\(1\), 71–82 \(2010\)[https://doi\.org/10\.1007/s10470\-009\-9361\-3](https://doi.org/10.1007/s10470-009-9361-3)
- Kobayashi et al\. \[1995\]Kobayashi, S\., Ono, I\., Yamamura, M\.: An efficient genetic algorithm for job shop scheduling problems\. In: Proceedings of the 6th International Conference on Genetic Algorithms, pp\. 506–511 \(1995\)
- Pezzella et al\. \[2008\]Pezzella, F\., Morganti, G\., Ciaschetti, G\.: A genetic algorithm for the flexible job\-shop scheduling problem\. Computers & operations research35\(10\), 3202–3212 \(2008\)[https://doi\.org/10\.1016/j\.cor\.2007\.02\.014](https://doi.org/10.1016/j.cor.2007.02.014)
- Ugray et al\. \[2007\]Ugray, Z\., Lasdon, L\., Plummer, J\., Glover, F\., Kelly, J\., Martí, R\.: Scatter search and local nlp solvers: A multistart framework for global optimization\. INFORMS Journal on computing19\(3\), 328–340 \(2007\)[https://doi\.org/10\.1287/ijoc\.1060\.0175](https://doi.org/10.1287/ijoc.1060.0175)
- Wales and Doye \[1997\]Wales, D\.J\., Doye, J\.P\.: Global optimization by basin\-hopping and the lowest energy structures of lennard\-jones clusters containing up to 110 atoms\. The Journal of Physical Chemistry A101\(28\), 5111–5116 \(1997\)[https://doi\.org/10\.1021/jp970984n](https://doi.org/10.1021/jp970984n)
- Rondina and Da Silva \[2013\]Rondina, G\.G\., Da Silva, J\.L\.: Revised basin\-hopping monte carlo algorithm for structure optimization of clusters and nanoparticles\. Journal of chemical information and modeling53\(9\), 2282–2298 \(2013\)[https://doi\.org/10\.1021/ci400224z](https://doi.org/10.1021/ci400224z)
- Banerjee et al\. \[2021\]Banerjee, A\., Jasrasaria, D\., Niblett, S\.P\., Wales, D\.J\.: Crystal structure prediction for benzene using basin\-hopping global optimization\. The Journal of Physical Chemistry A125\(17\), 3776–3784 \(2021\)[https://doi\.org/10\.1021/acs\.jpca\.1c00903](https://doi.org/10.1021/acs.jpca.1c00903)
- Kucharik et al\. \[2014\]Kucharik, M\., Hofacker, I\.L\., Stadler, P\.F\., Qin, J\.: Basin hopping graph: a computational framework to characterize rna folding landscapes\. Bioinformatics30\(14\), 2009–2017 \(2014\)[https://doi\.org/10\.1093/bioinformatics/btu156](https://doi.org/10.1093/bioinformatics/btu156)
- Prentiss et al\. \[2008\]Prentiss, M\.C\., Wales, D\.J\., Wolynes, P\.G\.: Protein structure prediction using basin\-hopping\. The Journal of chemical physics128\(22\) \(2008\)[https://doi\.org/10\.1063/1\.2929833](https://doi.org/10.1063/1.2929833)
- McCarty et al\. \[2018\]McCarty, S\.L\., Burke, L\.M\., McGuire, M\.: Parallel monotonic basin hopping for low thrust trajectory optimization\. In: 2018 Space Flight Mechanics Meeting, p\. 1452 \(2018\)\.[https://doi\.org/10\.2514/6\.2018\-1452](https://doi.org/10.2514/6.2018-1452)
- Vasile et al\. \[2010\]Vasile, M\., Minisci, E\., Locatelli, M\.: Analysis of some global optimization algorithms for space trajectory design\. Journal of Spacecraft and Rockets47\(2\), 334–344 \(2010\)[https://doi\.org/10\.2514/1\.45742](https://doi.org/10.2514/1.45742)
- Zhang et al\. \[2019\]Zhang, X\., Bujarbaruah, M\., Borrelli, F\.: Safe and near\-optimal policy learning for model predictive control using primal\-dual neural networks\. In: 2019 American Control Conference \(ACC\), pp\. 354–359 \(2019\)\.[https://doi\.org/10\.23919/ACC\.2019\.8814335](https://doi.org/10.23919/ACC.2019.8814335)\. IEEE
- Bertsimas and Margaritis \[2025\]Bertsimas, D\., Margaritis, G\.: Global optimization: a machine learning approach\. Journal of Global Optimization91\(1\), 1–37 \(2025\)[https://doi\.org/10\.1007/s10898\-024\-01434\-9](https://doi.org/10.1007/s10898-024-01434-9)
- Li et al\. \[2023\]Li, Y\., Guo, J\., Wang, R\., Yan, J\.: T2t: From distribution learning in training to gradient search in testing for combinatorial optimization\. Advances in Neural Information Processing Systems36, 50020–50040 \(2023\)
- Cassioli et al\. \[2012\]Cassioli, A\., Di Lorenzo, D\., Locatelli, M\., Schoen, F\., Sciandrone, M\.: Machine learning for global optimization\. Computational Optimization and Applications51\(1\), 279–303 \(2012\)[https://doi\.org/10\.1007/s10589\-010\-9330\-x](https://doi.org/10.1007/s10589-010-9330-x)
- Li et al\. \[2023\]Li, A\., Sinha, A\., Beeson, R\.: Amortized global search for efficient preliminary trajectory design with deep generative models\. In: AAS/AIAA Astrodynamics Specialist Conference \(2023\)\.[https://doi\.org/10\.48550/arXiv\.2308\.03960](https://doi.org/10.48550/arXiv.2308.03960)
- Sharony et al\. \[2024\]Sharony, E\., Yang, H\., Che, T\., Pavone, M\., Mannor, S\., Karkus, P\.: Learning multiple initial solutions to optimization problems\. arXiv preprint arXiv:2411\.02158 \(2024\)[https://doi\.org/10\.48550/arXiv\.2411\.02158](https://doi.org/10.48550/arXiv.2411.02158)
- Graebner et al\. \[2024\]Graebner, J\., Li, A\., Sinha, A\., Beeson, R\.: Learning optimal control and dynamical structure of global trajectory search problems with diffusion models\. In: AAS/AIAA Astrodynamics Specialist Conference \(2024\)\.[https://doi\.org/10\.48550/arXiv\.2410\.02976](https://doi.org/10.48550/arXiv.2410.02976)
- Li and Todorov \[2004\]Li, W\., Todorov, E\.: Iterative linear quadratic regulator design for nonlinear biological movement systems\. In: First International Conference on Informatics in Control, Automation and Robotics, vol\. 2, pp\. 222–229 \(2004\)\. SciTePress
- Giannone et al\. \[2023\]Giannone, G\., Srivastava, A\., Winther, O\., Ahmed, F\.: Aligning optimization trajectories with diffusion models for constrained design generation\. Advances in neural information processing systems36, 51830–51861 \(2023\)
- Sohl\-Dickstein et al\. \[2015\]Sohl\-Dickstein, J\., Weiss, E\., Maheswaranathan, N\., Ganguli, S\.: Deep unsupervised learning using nonequilibrium thermodynamics\. In: International Conference on Machine Learning, pp\. 2256–2265 \(2015\)\. pmlr
- Nichol et al\. \[2021\]Nichol, A\., Dhariwal, P\., Ramesh, A\., Shyam, P\., Mishkin, P\., McGrew, B\., Sutskever, I\., Chen, M\.: GLIDE: Towards photorealistic image generation and editing with text\-guided diffusion models\. arXiv preprint arXiv:2112\.10741 \(2021\)[https://doi\.org/10\.48550/arXiv\.2112\.10741](https://doi.org/10.48550/arXiv.2112.10741)
- Rombach et al\. \[2022\]Rombach, R\., Blattmann, A\., Lorenz, D\., Esser, P\., Ommer, B\.: High\-resolution image synthesis with latent diffusion models\. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp\. 10684–10695 \(2022\)\.[https://doi\.org/10\.1109/CVPR52688\.2022\.01042](https://doi.org/10.1109/CVPR52688.2022.01042)
- Ho et al\. \[2022\]Ho, J\., Salimans, T\., Gritsenko, A\., Chan, W\., Norouzi, M\., Fleet, D\.J\.: Video diffusion models\. Advances in neural information processing systems35, 8633–8646 \(2022\)
- Jing et al\. \[2023\]Jing, B\., Erives, E\., Pao\-Huang, P\., Corso, G\., Berger, B\., Jaakkola, T\.: EigenFold: Generative protein structure prediction with diffusion models\. arXiv preprint arXiv:2304\.02198 \(2023\)[https://doi\.org/10\.48550/arXiv\.2304\.02198](https://doi.org/10.48550/arXiv.2304.02198)
- Wu et al\. \[2024\]Wu, K\.E\., Yang, K\.K\., Berg, R\., Alamdari, S\., Zou, J\.Y\., Lu, A\.X\., Amini, A\.P\.: Protein structure generation via folding diffusion\. Nature communications15\(1\), 1059 \(2024\)[https://doi\.org/10\.1038/s41467\-024\-45051\-2](https://doi.org/10.1038/s41467-024-45051-2)
- Schneuing et al\. \[2024\]Schneuing, A\., Harris, C\., Du, Y\., Didi, K\., Jamasb, A\., Igashov, I\., Du, W\., Gomes, C\., Blundell, T\.L\., Lio, P\.,et al\.: Structure\-based drug design with equivariant diffusion models\. Nature Computational Science4\(12\), 899–909 \(2024\)[https://doi\.org/10\.1038/s43588\-024\-00737\-x](https://doi.org/10.1038/s43588-024-00737-x)
- Janner et al\. \[2022\]Janner, M\., Du, Y\., Tenenbaum, J\.B\., Levine, S\.: Planning with diffusion for flexible behavior synthesis\. arXiv preprint arXiv:2205\.09991 \(2022\)[https://doi\.org/10\.48550/arXiv\.2205\.09991](https://doi.org/10.48550/arXiv.2205.09991)
- Ajay et al\. \[2022\]Ajay, A\., Du, Y\., Gupta, A\., Tenenbaum, J\., Jaakkola, T\., Agrawal, P\.: Is conditional generative modeling all you need for decision\-making? arXiv preprint arXiv:2211\.15657 \(2022\)[https://doi\.org/10\.48550/arXiv\.2211\.15657](https://doi.org/10.48550/arXiv.2211.15657)
- Chi et al\. \[2025\]Chi, C\., Xu, Z\., Feng, S\., Cousineau, E\., Du, Y\., Burchfiel, B\., Tedrake, R\., Song, S\.: Diffusion Policy: Visuomotor policy learning via action diffusion\. The International Journal of Robotics Research44\(10\-11\), 1684–1704 \(2025\)[https://doi\.org/10\.1177/02783649241273668](https://doi.org/10.1177/02783649241273668)
- Jiang et al\. \[2023\]Jiang, C\., Cornman, A\., Park, C\., Sapp, B\., Zhou, Y\., Anguelov, D\.,et al\.: MotionDiffuser: Controllable multi\-agent motion prediction using diffusion\. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp\. 9644–9653 \(2023\)\.[https://doi\.org/10\.1109/CVPR52729\.2023\.00930](https://doi.org/10.1109/CVPR52729.2023.00930)
- Zhang et al\. \[2024\]Zhang, Z\., Li, A\., Lim, A\., Chen, M\.: Predicting long\-term human behaviors in discrete representations via physics\-guided diffusion\. In: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems \(IROS\), pp\. 11500–11507 \(2024\)\.[https://doi\.org/10\.1109/IROS58592\.2024\.10802068](https://doi.org/10.1109/IROS58592.2024.10802068)\. IEEE
- Li et al\. \[2025\]Li, A\., Bae, S\., Isele, D\., Beeson, R\., Tariq, F\.M\.: Predictive planner for autonomous driving with consistency models\. In: IEEE International Conference on Intelligent Transportation Systems \(ITSC\) \(2025\)\.[https://doi\.org/10\.48550/arXiv\.2502\.08033](https://doi.org/10.48550/arXiv.2502.08033)
- Ho and Salimans \[2022\]Ho, J\., Salimans, T\.: Classifier\-free diffusion guidance\. arXiv preprint arXiv:2207\.12598 \(2022\)[https://doi\.org/10\.48550/arXiv\.2207\.12598](https://doi.org/10.48550/arXiv.2207.12598)
- Dhariwal and Nichol \[2021\]Dhariwal, P\., Nichol, A\.: Diffusion models beat gans on image synthesis\. Advances in neural information processing systems34, 8780–8794 \(2021\)
- Ronneberger et al\. \[2015\]Ronneberger, O\., Fischer, P\., Brox, T\.: U\-Net: Convolutional networks for biomedical image segmentation\. In: International Conference on Medical Image Computing and Computer\-assisted Intervention, pp\. 234–241 \(2015\)\.[https://doi\.org/10\.1007/978\-3\-319\-24574\-4\_28](https://doi.org/10.1007/978-3-319-24574-4_28)\. Springer
- Vaswani et al\. \[2017\]Vaswani, A\., Shazeer, N\., Parmar, N\., Uszkoreit, J\., Jones, L\., Gomez, A\.N\., Kaiser, Ł\., Polosukhin, I\.: Attention is all you need\. Advances in neural information processing systems30\(2017\)
- Peebles and Xie \[2023\]Peebles, W\., Xie, S\.: Scalable diffusion models with transformers\. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp\. 4195–4205 \(2023\)\.[https://doi\.org/10\.1109/ICCV51070\.2023\.00387](https://doi.org/10.1109/ICCV51070.2023.00387)
- Kuhn and Tucker \[2014\]Kuhn, H\.W\., Tucker, A\.W\.: In: Giorgi, G\., Kjeldsen, T\.H\. \(eds\.\) Nonlinear Programming, pp\. 247–258\. Springer, Basel \(2014\)\.[https://doi\.org/10\.1007/978\-3\-0348\-0439\-4\_11](https://doi.org/10.1007/978-3-0348-0439-4_11)
- \(lucidrains\) \[2025\]\(lucidrains\), P\.W\.: denoising\-diffusion\-pytorch: Implementation of Denoising Diffusion Probabilistic Model in PyTorch\. GitHub\. Accessed: 2026\-01\-10 \(2025\)
- Himmelblau \[1972\]Himmelblau, D\.M\.: Applied Nonlinear Programming\. McGraw\-Hill, New York \(1972\)
- Virtanen et al\. \[2020\]Virtanen, P\., Gommers, R\., Oliphant, T\.E\., Haberland, M\., Reddy, T\., Cournapeau, D\., Burovski, E\., Peterson, P\., Weckesser, W\., Bright, J\., van der Walt, S\.J\., Brett, M\., Wilson, J\., Millman, K\.J\., Mayorov, N\., Nelson, A\.R\.J\., Jones, E\., Kern, R\., Larson, E\., Carey, C\.J\., Polat, İ\., Feng, Y\., Moore, E\.W\., VanderPlas, J\., Laxalde, D\., Perktold, J\., Cimrman, R\., Henriksen, I\., Quintero, E\.A\., Harris, C\.R\., Archibald, A\.M\., Ribeiro, A\.H\., Pedregosa, F\., van Mulbregt, P\., SciPy 1\.0 Contributors: SciPy 1\.0: Fundamental Algorithms for Scientific Computing in Python\. Nature Methods17, 261–272 \(2020\)[https://doi\.org/10\.1038/s41592\-019\-0686\-2](https://doi.org/10.1038/s41592-019-0686-2)
- Rosenbrock \[1960\]Rosenbrock, H\.H\.: An automatic method for finding the greatest or least value of a function\. The computer journal3\(3\), 175–184 \(1960\)[https://doi\.org/10\.1093/comjnl/3\.3\.175](https://doi.org/10.1093/comjnl/3.3.175)
- Dixon and Mills \[1994\]Dixon, L\., Mills, D\.: Effect of rounding errors on the variable metric method\. Journal of Optimization Theory and Applications80\(1\), 175–179 \(1994\)[https://doi\.org/10\.1007/BF02196600](https://doi.org/10.1007/BF02196600)
- Laguna and Marti \[2005\]Laguna, M\., Marti, R\.: Experimental testing of advanced scatter search designs for global optimization of multimodal functions\. Journal of Global Optimization33\(2\), 235–255 \(2005\)[https://doi\.org/10\.1007/s10898\-004\-1936\-z](https://doi.org/10.1007/s10898-004-1936-z)
- Wu et al\. \[2020\]Wu, E\., Kenway, G\., Mader, C\.A\., Jasa, J\., Martins, J\.R\.R\.A\.: pyOptSparse: A python framework for large\-scale constrained nonlinear optimization of sparse systems\. Journal of Open Source Software5\(54\), 2564 \(2020\)[https://doi\.org/10\.21105/joss\.02564](https://doi.org/10.21105/joss.02564)

Similar Articles

Latent Heuristic Search: Continuous Optimization for Automated Algorithm Design

arXiv cs.AI

This paper proposes Latent Heuristic Search (LHS), a framework that shifts heuristic discovery to a learned continuous latent manifold, using gradient-based optimization and normalizing flows to generate novel heuristics conditioned on large language models, achieving competitive results on TSP, CVRP, KSP, and Online Bin Packing.