SMCEvolve: Principled Scientific Discovery via Sequential Monte Carlo Evolution

arXiv cs.AI 05/18/26, 04:00 AM Papers
Summary
SMCEvolve introduces a principled framework for LLM-driven program evolution by reformulating it as sampling from a reward-tilted distribution using Sequential Monte Carlo. It provides convergence guarantees and outperforms existing methods across multiple scientific discovery benchmarks.
arXiv:2605.15308v1 Announce Type: new Abstract: LLM-driven program evolution has emerged as a powerful tool for automated scientific discovery, yet existing frameworks offer no principled guide for designing their individual components and provide no guarantee that the search converges. We introduce SMCEvolve, which recasts program search as sampling from a reward-tilted target distribution and approximates it with a Sequential Monte Carlo (SMC) sampler. From this view, three core mechanisms emerge as principled components: adaptive parent resampling, mixture of mutation with acceptance, and automatic convergence control. We further provide a finite-sample complexity analysis that bounds the LLM-call budget required to reach a target approximation error. Across math, algorithm efficiency, symbolic regression, and end-to-end ML research benchmarks, SMCEvolve surpasses state-of-the-art evolving systems while using fewer LLM calls under self-determined termination. The code is available at https://github.com/kongwanbianjinyu/SMCEvolve.
Original Article
View Cached Full Text
Cached at: 05/18/26, 06:31 AM
# SMCEvolve: Principled Scientific Discovery via Sequential Monte Carlo Evolution
Source: [https://arxiv.org/html/2605.15308](https://arxiv.org/html/2605.15308)
Jiachen Jiang , Huminhao Zhu∗, Zhihui Zhu† Department of Computer Science and Engineering The Ohio State University \{jiang\.2880, zhu\.4228, zhu\.3440\}@osu\.edu

###### Abstract

LLM\-driven program evolution has emerged as a powerful tool for automated scientific discovery, yet existing frameworks offer no principled guide for designing their individual components and provide no guarantee that the search converges\. We introduceSMCEvolve, which recasts program search as sampling from a reward\-tilted target distribution and approximates it with a Sequential Monte Carlo \(SMC\) sampler\. From this view, three core mechanisms emerge as principled components: adaptive parent resampling, mixture of mutation with acceptance, and automatic convergence control\. We further provide a finite\-sample complexity analysis that bounds the LLM\-call budget required to reach a target approximation error\. Across math, algorithm efficiency, symbolic regression, and end\-to\-end ML research benchmarks,SMCEvolvesurpasses state\-of\-the\-art evolving systems while using fewer LLM calls under self\-determined termination\. The code is available at[https://github\.com/kongwanbianjinyu/SMCEvolve](https://github.com/kongwanbianjinyu/SMCEvolve)\.

![Refer to caption](https://arxiv.org/html/2605.15308v1/figs/SMCEvolve_teaser.png)Figure 1:LLM\-driven program evolution targeting the reward\-tilted distribution via Sequential Monte Carlo \(SMC\)\.The contour depicts the distribution over the program space; each white dot is a particle \(program\) in the evolving population, and the dashed circle marks the high\-reward region\. Starting from particles drawn from the LLM priorp0p\_\{0\}\(*left*\), where the high\-reward region is rarely covered, the population is driven by the LLM toward new programs that form a sequence of bridging distributionsp0eβtR\(x\)p\_\{0\}\\,e^\{\\beta\_\{t\}R\(x\)\}with increasing reward intensityβt\\beta\_\{t\}, and ultimately concentrates on the targetp0eβTR\(x\)p\_\{0\}\\,e^\{\\beta\_\{T\}R\(x\)\}\(*right*\)\. This yields a principled evolutionary frameworkSMCEvolvewith convergence guarantees, in contrast to ad\-hoc pipelines\.## 1Introduction

LLM\-driven agents have enabled automated scientific discovery at scale: given the problem description and evaluator, the agent can autonomously search for programs that optimize a target objective, iteratively refining candidates based on evaluation feedback\[[33](https://arxiv.org/html/2605.15308#bib.bib6),[19](https://arxiv.org/html/2605.15308#bib.bib7),[26](https://arxiv.org/html/2605.15308#bib.bib10),[24](https://arxiv.org/html/2605.15308#bib.bib9),[17](https://arxiv.org/html/2605.15308#bib.bib8)\]\. This paradigm has rapidly expanded across diverse domains, including mathematical discovery\[[11](https://arxiv.org/html/2605.15308#bib.bib17)\], symbolic regression\[[29](https://arxiv.org/html/2605.15308#bib.bib11)\], materials design\[[1](https://arxiv.org/html/2605.15308#bib.bib12)\], machine learning\[[4](https://arxiv.org/html/2605.15308#bib.bib15),[12](https://arxiv.org/html/2605.15308#bib.bib14)\]and end\-to\-end research automation\[[31](https://arxiv.org/html/2605.15308#bib.bib13)\]\.

Despite these successes, current agent design remains largely*empirical*and the area lacks a unifying*theoretical*framework that guides the design\. A typical pipeline, exemplified by AlphaEvolve\[[24](https://arxiv.org/html/2605.15308#bib.bib9)\], ShinkaEvolve\[[17](https://arxiv.org/html/2605.15308#bib.bib8)\], and others\[[33](https://arxiv.org/html/2605.15308#bib.bib6),[19](https://arxiv.org/html/2605.15308#bib.bib7),[13](https://arxiv.org/html/2605.15308#bib.bib1),[3](https://arxiv.org/html/2605.15308#bib.bib33),[32](https://arxiv.org/html/2605.15308#bib.bib34),[20](https://arxiv.org/html/2605.15308#bib.bib35)\], maintains a population of candidate programs, prompts an LLM to mutate a selected parent based on context, and adds the evaluated offspring back into the population\. The absence of theoretical guidance leaves several limitations\. First, there is no principled guide for designing each component within the pipeline: components such as parent selection, context construction, and population management are hand\-crafted in isolation and validated through extensive ablations rather than derived from a coherent principle that connects them\. Second, the convergence of the pipeline is not guaranteed, leading to substantial randomness and high variance across runs; moreover, existing systems cannot distinguish whether the population has converged to a high\-reward region or merely stalled at a local mode , forcing the number of iterations to be fixed in advance—either too small to reach a good solution, or too large and wasting compute\. These observations motivate the central question of this work:

*Can we ground LLM\-driven program evolution in a rigorous probabilistic framework that unifies its components and provides convergence guarantees?*

We address this question from a simple observation: searching for high\-reward programs with an LLM can be cast as*sampling from a reward\-tilted target distribution*p∗\(x∣q\)∝p0\(x∣q\)eβR\(x\)p^\{\*\}\(x\\mid q\)\\propto p\_\{0\}\(x\\mid q\)\\,e^\{\\beta R\(x\)\}, which reweights the LLM priorp0p\_\{0\}over programsxxfor a given problemqqtoward those with high evaluation rewardRR\. Since directly sampling fromp∗p^\{\*\}is intractable, we turn to*Sequential Monte Carlo \(SMC\) samplers*\[[7](https://arxiv.org/html/2605.15308#bib.bib18),[6](https://arxiv.org/html/2605.15308#bib.bib19)\]which progressively transport particles from the priorp0p\_\{0\}toward the targetp∗p^\{\*\}through a sequence of bridging distributions with increasing reward weight\. Treating each candidate program as a particle, we deriveSMCEvolve, a complete evolutionary framework whose three core mechanisms emerge directly from SMC theory:

- •Adaptive parent resampling\.Parents are resampled by importance weights derived from the current bridging distribution, where the influence of reward on selection strengthens as the search progresses—from near\-uniform sampling that encourages exploration in early iterations, to sharply reward\-focused sampling that drives exploitation in later iterations\. This stands in contrast to existing systems, which rely on a fixed selection rule throughout the search\.
- •Mixture of mutation with acceptance\.To ensure convergence, the LLM generates multiple proposals at each step, each accepted or rejected with a probability related to the current bridging distribution\. To accelerate convergence, we further design a mixture of LLM\-driven kernels along two axes—edit scope \(local diff vs\. full rewrite\) and information source \(single\-parent vs\. cross\-program inspiration\)—and adaptively select among them via Thompson sampling\[[28](https://arxiv.org/html/2605.15308#bib.bib20)\], which favors kernels that have recently produced high\-reward proposals\. In contrast, existing systems accept the proposal unconditionally and rely on either a single mutation strategy or a fixed\-weight combination of strategies\.
- •Automatic convergence control\.The number of iterations is determined adaptively from the state of the population rather than fixed in advance: the search proceeds in small steps when the population is still spread across diverse candidates and in larger steps once it concentrates on high\-reward programs, and terminates automatically once it reaches the target distribution\. In contrast, existing systems run for a fixed number of iterations specified in advance, without a signal to detect convergence or stalling\.

Together, these three mechanisms instantiate a complete SMC sampler for LLM\-driven program search, and the resulting framework admits formal convergence analysis with finite\-sample complexity guarantee\. Our main contributions are as follows:

- •A principled framework for evolutionary program search\.We introduceSMCEvolve, the first evolving agent whose components—parent resampling, mutation, and convergence control—are derived from SMC theory rather than hand\-crafted in isolation\.
- •Finite\-sample convergence analysis\.We provide the first convergence guarantee for program evolution, showing that achieving a target approximation error requires a total computational cost bounded explicitly in terms of particle budget, mutation mixing, and reward intensity\.
- •Empirical validation across diverse domains\.We evaluateSMCEvolveon multiple domains spanning math\[[24](https://arxiv.org/html/2605.15308#bib.bib9)\], algorithm efficiency\[[25](https://arxiv.org/html/2605.15308#bib.bib26)\], symbolic regression\[[30](https://arxiv.org/html/2605.15308#bib.bib16)\], and LLM pretraining\[[14](https://arxiv.org/html/2605.15308#bib.bib4)\], where it surpasses state\-of\-the\-art evolving systems on the majority of tasks while consuming fewer LLM calls under automatic convergence control\.

![Refer to caption](https://arxiv.org/html/2605.15308v1/figs/main_SMCEvolve.png)Figure 2:Overview ofSMCEvolve\.*Left:*the main loop\. Starting fromNNinitial programs, each ofTTiterations*\(i\)*reweights and resamples particles,*\(ii\)*mutates each parent through aKK\-step MH chain of LLM proposals and accept/reject updates, and*\(iii\)*schedules the nextλt\\lambda\_\{t\}via ESS, terminating whenλt=1\\lambda\_\{t\}=1\.*Right, top to bottom:*the evolving process on circle packing \(best\-so\-far reward across iterations\); the mixture of four mutation kernels selected by Thompson sampling; and the automatic convergence control findingλt\\lambda\_\{t\}at the intersection ofESS\(λ\)\\mathrm\{ESS\}\(\\lambda\)with the thresholdκN\\kappa N\.
## 2Method: Program Evolution as Sequential Monte Carlo

We formulate program search as sampling from a reward\-tilted target distribution that balances reward maximization against the LLM prior \([Section˜2\.1](https://arxiv.org/html/2605.15308#S2.SS1)\), and approximate this intractable target with a Sequential Monte Carlo sampler whose components – intermediate distributions, importance weights, mutation kernel, and adaptive scheduling – are derived from SMC theory \([Section˜2\.2](https://arxiv.org/html/2605.15308#S2.SS2)\)\.

#### Notation\.

Throughout, let𝒳\\mathcal\{X\}denote the countable program space for a taskqq\. All distributions over𝒳\\mathcal\{X\}are probability mass functions, and integrals∫𝒳f\(x\)𝑑x\\int\_\{\\mathcal\{X\}\}f\(x\)\\,dxdenote sums over programs\. Given an LLM, letpLLM\(⋅∣𝒞\)p\_\{\\mathrm\{LLM\}\}\(\\cdot\\mid\\mathcal\{C\}\)denote the distribution induced by sampling from the LLM under context𝒞\\mathcal\{C\}\. We definep0\(⋅∣q\)=pLLM\(⋅∣q\)p\_\{0\}\(\\cdot\\mid q\)=p\_\{\\mathrm\{LLM\}\}\(\\cdot\\mid q\)as the LLM prior for taskqq, andQt\(⋅∣x,𝒞t\)=pLLM\(⋅∣x,𝒞t\)Q\_\{t\}\(\\cdot\\mid x,\\mathcal\{C\}\_\{t\}\)=p\_\{\\mathrm\{LLM\}\}\(\\cdot\\mid x,\\mathcal\{C\}\_\{t\}\)as the stage\-conditional LLM proposal distribution used for mutation, conditioned on the current programxxand context𝒞t\\mathcal\{C\}\_\{t\}\. LetR:𝒳→ℝR:\\mathcal\{X\}\\to\\mathbb\{R\}be the reward function, assumed bounded onsupp\(p0\)\\mathrm\{supp\}\(p\_\{0\}\)\. WriteR−=infsupp\(p0\)RR\_\{\-\}=\\inf\_\{\\mathrm\{supp\}\(p\_\{0\}\)\}R,R\+=supsupp\(p0\)RR\_\{\+\}=\\sup\_\{\\mathrm\{supp\}\(p\_\{0\}\)\}R, andΔR=R\+−R−<∞\\Delta\_\{R\}=R\_\{\+\}\-R\_\{\-\}<\\infty\.

### 2\.1Reward\-Tilted Target Distribution

Across various domains including mathematical discovery\[[11](https://arxiv.org/html/2605.15308#bib.bib17)\], symbolic regression\[[29](https://arxiv.org/html/2605.15308#bib.bib11)\], materials design\[[1](https://arxiv.org/html/2605.15308#bib.bib12)\], scientific discovery shares a common structure: a search problem in program space𝒳\\mathcal\{X\}guided by an evaluator, where the goal is to find the program that maximize the rewardR\(x\)R\(x\)\. However, the program space is discrete and large, making direct maximization intractable\. We therefore leverage an LLM priorp0p\_\{0\}to guide the search toward plausible programs\. To balance reward maximization with adherence to the prior, we formulate LLM\-driven program search as the following KL\-regularized variational optimization problem:

maxp⁡𝔼x∼p⁡\[R\(x\)\]−1βDKL\(p∥p0\),\\max\_\{p\}\\;\\operatorname\{\\mathbb\{E\}\}\_\{x\\sim p\}\\\!\\big\[R\(x\)\\big\]\-\\frac\{1\}\{\\beta\}\\,D\_\{\\mathrm\{KL\}\}\\\!\\big\(p\\,\\\|\\,p\_\{0\}\\big\),\(1\)where the maximization is over all probability mass functions on𝒳\\mathcal\{X\}that are absolutely continuous with respect top0p\_\{0\}\. The following result characterize the optimal solution\.

###### Lemma 2\.1\(Reward\-tilted target distribution\)\.

The unique maximizer of \([1](https://arxiv.org/html/2605.15308#S2.E1)\) is

p∗\(x∣q\)≜1Z\(q\)p0\(x∣q\)eβR\(x\),p^\{\*\}\(x\\mid q\)\\;\\triangleq\\;\\frac\{1\}\{Z\(q\)\}\\,p\_\{0\}\(x\\mid q\)\\,e^\{\\beta R\(x\)\},\(2\)whereZ\(q\)=∫p0\(x′∣q\)eβR\(x′\)𝑑x′Z\(q\)=\\int p\_\{0\}\(x^\{\\prime\}\\mid q\)\\,e^\{\\beta R\(x^\{\\prime\}\)\}\\,dx^\{\\prime\}is the partition function\.

The proof follows from standard calculus of variations: setting the functional derivative of the Lagrangian \(with a normalization constraint\) to zero yields \([2](https://arxiv.org/html/2605.15308#S2.E2)\) directly, see Appendix[B](https://arxiv.org/html/2605.15308#A2)for details\. The bounded\-oscillation assumption ensures thateβR−≤Z\(q\)≤eβR\+<∞e^\{\\beta R\_\{\-\}\}\\leq Z\(q\)\\leq e^\{\\beta R\_\{\+\}\}<\\infty, so \([2](https://arxiv.org/html/2605.15308#S2.E2)\) is a well\-defined probability mass function on𝒳\\mathcal\{X\}that is absolutely continuous with respect top0p\_\{0\}\.

The parameterβ\\betainterpolates between the LLM prior \(β→0\\beta\\to 0\) and pure reward maximization \(β→∞\\beta\\to\\infty\)\. Sampling fromp∗p^\{\*\}is intractable, as the normalization termZ\(q\)Z\(q\)requires summing over all programs in an extreme large program space\.

### 2\.2Sequential Monte Carlo Samplers

To sample from the intractable targetp∗\(⋅∣q\)p^\{\*\}\(\\cdot\\mid q\), we turn to Sequential Monte Carlo \(SMC\) method\[[7](https://arxiv.org/html/2605.15308#bib.bib18)\]\. It bridgesp0p\_\{0\}top∗p^\{\*\}through a sequence of*bridging distributions*p0,p1,…,pT=p∗p\_\{0\},p\_\{1\},\\ldots,p\_\{T\}=p^\{\*\}, withptp\_\{t\}written aspt\(x\)=γt\(x\)/Ztp\_\{t\}\(x\)=\\gamma\_\{t\}\(x\)/Z\_\{t\}for an unnormalized densityγt\\gamma\_\{t\}and normalizing constantZtZ\_\{t\}\. The sampler maintainsNNparticles\{xt−1\(n\)\}n=1N\\\{x\_\{t\-1\}^\{\(n\)\}\\\}\_\{n=1\}^\{N\}approximatingpt−1p\_\{t\-1\}, and advances them as samples fromptp\_\{t\}by repeating three operations:*reweighting*each particle by an importance weight,*resampling*them to discard low\-weight particles, and*mutating*each particle to inject diversity\. The reweighting step assigns the importance weight as\[[6](https://arxiv.org/html/2605.15308#bib.bib19),[7](https://arxiv.org/html/2605.15308#bib.bib18)\]

wt\(xt−1,xt\)=γt\(xt\)Lt−1\(xt,xt−1\)γt−1\(xt−1\)MtK\(xt−1,xt\),w\_\{t\}\(x\_\{t\-1\},x\_\{t\}\)\\;=\\;\\frac\{\\gamma\_\{t\}\(x\_\{t\}\)\\,L\_\{t\-1\}\(x\_\{t\},x\_\{t\-1\}\)\}\{\\gamma\_\{t\-1\}\(x\_\{t\-1\}\)\\,M\_\{t\}^\{K\}\(x\_\{t\-1\},x\_\{t\}\)\},\(3\)where\(xt−1,xt\)=\(xt−1\(n\),xt\(n\)\)\(x\_\{t\-1\},x\_\{t\}\)=\(x\_\{t\-1\}^\{\(n\)\},x\_\{t\}^\{\(n\)\}\)for each particlenn,MtK\(xt−1,⋅\)M\_\{t\}^\{K\}\(x\_\{t\-1\},\\cdot\)is the*stage forward kernel*—theKK\-fold composition of a one\-step kernelMtM\_\{t\}used to mutate each particle from staget−1t\-1to stagett—andLt−1\(xt,⋅\)L\_\{t\-1\}\(x\_\{t\},\\cdot\)is an auxiliary*backward kernel*introduced by the SMC framework to make \([3](https://arxiv.org/html/2605.15308#S2.E3)\) tractable\. Crucially, the effectiveness of SMC hinges on how these components are*jointly designed for the target problem*: the bridging distributions\{pt\}\\\{p\_\{t\}\\\}determine the annealing path,MtM\_\{t\}controls exploration and proposal quality, and the backward kernelLt−1L\_\{t\-1\}stabilizes importance weighting\.

#### Kernel properties required by SMC\.

A forward kernelMtM\_\{t\}acts on distributions in two cases: starting fromptp\_\{t\}, applyingMtM\_\{t\}should keep the distribution atptp\_\{t\}\(*ptp\_\{t\}\-invariance*\); starting away fromptp\_\{t\}, iteratingMtM\_\{t\}should drive the distribution towardptp\_\{t\}\(*ergodicity*\)\. Invariance ensures that the move step, applied to a particle already distributed asptp\_\{t\}, does not drift the distribution away fromptp\_\{t\}\. But invariance alone is not enough for finite\-budget search\. The identity kernel, for instance, is perfectlyptp\_\{t\}\-invariant but simply copies particles and provides no exploration after resampling\. Uniform ergodicity rules out such ineffective kernels by giving a quantitative total\-variation bound afterKKmutation steps; smallerρ\\rhocorresponds to faster mixing\.

###### Definition 2\.2\(ptp\_\{t\}\-invariance\)\.

A Markov kernelMtM\_\{t\}on𝒳\\mathcal\{X\}is*ptp\_\{t\}\-invariant*if applyingMtM\_\{t\}to a state distributed asptp\_\{t\}leaves the distribution unchanged:

∫𝒳pt\(x\)Mt\(x,y\)𝑑x=pt\(y\),∀y∈𝒳\.\\int\_\{\\mathcal\{X\}\}p\_\{t\}\(x\)\\,M\_\{t\}\(x,y\)\\,dx\\;=\\;p\_\{t\}\(y\),\\qquad\\forall\\,y\\in\\mathcal\{X\}\.\(4\)

###### Definition 2\.3\(Uniform ergodicity;[22](https://arxiv.org/html/2605.15308#bib.bib3), Ch\. 16\)\.

A Markov kernelMtM\_\{t\}with invariant distributionptp\_\{t\}is*uniformly ergodic*if there exist constantsC≥1C\\geq 1andρ∈\(0,1\)\\rho\\in\(0,1\)such that, for everyx∈𝒳x\\in\\mathcal\{X\}and everyK≥1K\\geq 1,

‖MtK\(x,⋅\)−pt‖TV≤C⋅ρK,\\big\\\|M\_\{t\}^\{K\}\(x,\\cdot\)\-p\_\{t\}\\big\\\|\_\{\\mathrm\{TV\}\}\\;\\leq\\;C\\cdot\\rho^\{K\},\(5\)where‖μ−ν‖TV=supA⊆𝒳\|μ\(A\)−ν\(A\)\|\\\|\\mu\-\\nu\\\|\_\{\\mathrm\{TV\}\}=\\sup\_\{A\\subseteq\\mathcal\{X\}\}\|\\mu\(A\)\-\\nu\(A\)\|is the total variation distance\.

### 2\.3SMCEvolve: LLM\-driven program evolution via SMC

Building on this perspective, we instantiate SMC for program evolution by designing problem\-specific components tailored to LLM\-based search\. Our method,SMCEvolve, integrates \(i\)*adaptive parent resampling*aligned with reward shaping, \(ii\) a*mixture of mutation kernels with acceptance correction*to balance exploration and validity, and \(iii\)*automatic convergence control*via adaptive annealing\. These designs are summarized in[Figure˜2](https://arxiv.org/html/2605.15308#S1.F2), with a case study on circle packing shown in[Figure˜3](https://arxiv.org/html/2605.15308#S2.F3)\.

![Refer to caption](https://arxiv.org/html/2605.15308v1/figs/adaptive_parent_sampling.png)\(a\)Parent resampling
![Refer to caption](https://arxiv.org/html/2605.15308v1/figs/kernel_combined.png)\(b\)Kernel selection & acceptance
![Refer to caption](https://arxiv.org/html/2605.15308v1/figs/lambda_schedule_v2_combined.png)\(c\)ESS\\mathrm\{ESS\}curves &λt\\lambda\_\{t\}schedule

Figure 3:Case study ofSMCEvolveon circle packing, illustrating the three mechanisms\.\(a\)Resampling probability of each particle \(ranked by reward\) across iterations\.\(b\)Per\-iteration selection counts and cumulative MH acceptance rates of the four mutation kernelsM\(1\),…,M\(4\)M^\{\(1\)\},\\dots,M^\{\(4\)\}\.\(c\)The ESS curvesESS\(λ\)\\mathrm\{ESS\}\(\\lambda\)at each iteration \(left\) and the resulting scheduleλt\\lambda\_\{t\}\(right\); eachλt\\lambda\_\{t\}is set whereESS\(λ\)\\mathrm\{ESS\}\(\\lambda\)crosses the thresholdκN\\kappa N\(green dashed\)\.#### Adaptive parent resampling\.

We construct the bridging distributions\{pt\}t=0T\\\{p\_\{t\}\\\}\_\{t=0\}^\{T\}as a geometric annealing path between the LLM priorp0\(⋅∣q\)p\_\{0\}\(\\cdot\\mid q\)and the targetp∗\(⋅∣q\)p^\{\*\}\(\\cdot\\mid q\)\. At iterationtt, we set the unnormalized density to,

γt\(x\)=p0\(x∣q\)exp⁡\(βtR\(x\)\),βt=λtβ,\\gamma\_\{t\}\(x\)\\;=\\;p\_\{0\}\(x\\mid q\)\\,\\exp\\\!\\big\(\\beta\_\{t\}R\(x\)\\big\),\\qquad\\beta\_\{t\}\\;=\\;\\lambda\_\{t\}\\,\\beta,\(6\)where the temperature schedule0=λ0<λ1<⋯<λT=10=\\lambda\_\{0\}<\\lambda\_\{1\}<\\cdots<\\lambda\_\{T\}=1interpolates the reward intensity from zero to its target valueβ\\beta\. The endpoints recoverp0\(⋅∣q\)p\_\{0\}\(\\cdot\\mid q\)att=0t=0andp∗\(⋅∣q\)p^\{\*\}\(\\cdot\\mid q\)att=Tt=Tfrom \([2](https://arxiv.org/html/2605.15308#S2.E2)\)\.

Next, we specialize the backward kernelLt−1L\_\{t\-1\}as the time reversal\[[6](https://arxiv.org/html/2605.15308#bib.bib19)\]ofMtKM\_\{t\}^\{K\}underptp\_\{t\},

Lt−1\(xt,xt−1\)=pt\(xt−1\)MtK\(xt−1,xt\)pt\(xt\)\.L\_\{t\-1\}\(x\_\{t\},x\_\{t\-1\}\)\\;=\\;\\frac\{p\_\{t\}\(x\_\{t\-1\}\)\\,M\_\{t\}^\{K\}\(x\_\{t\-1\},x\_\{t\}\)\}\{p\_\{t\}\(x\_\{t\}\)\}\.\(7\)Substituting \([7](https://arxiv.org/html/2605.15308#S2.E7)\) into \([3](https://arxiv.org/html/2605.15308#S2.E3)\), the forward and backward kernels cancel, leaving a weight that is independent ofxtx\_\{t\}and depends only on the parent’s reward \(full derivation in Appendix[D](https://arxiv.org/html/2605.15308#A4)\):

wt\(xt−1,xt\)=γt\(xt−1\)γt−1\(xt−1\)=exp⁡\(\(βt−βt−1\)R\(xt−1\)\)\.w\_\{t\}\(x\_\{t\-1\},x\_\{t\}\)\\;=\\;\\frac\{\\gamma\_\{t\}\(x\_\{t\-1\}\)\}\{\\gamma\_\{t\-1\}\(x\_\{t\-1\}\)\}\\;=\\;\\exp\\\!\\big\(\(\\beta\_\{t\}\-\\beta\_\{t\-1\}\)\\,R\(x\_\{t\-1\}\)\\big\)\.\(8\)Normalizing across theNNparticles, the resampling probability of particlennis the temperature\-based softmax over particle rewards,

Wt\(n\)=exp⁡\(\(βt−βt−1\)R\(xt−1\(n\)\)\)∑m=1Nexp⁡\(\(βt−βt−1\)R\(xt−1\(m\)\)\),W\_\{t\}^\{\(n\)\}\\;=\\;\\frac\{\\exp\\\!\\big\(\(\\beta\_\{t\}\-\\beta\_\{t\-1\}\)\\,R\(x^\{\(n\)\}\_\{t\-1\}\)\\big\)\}\{\\sum\_\{m=1\}^\{N\}\\exp\\\!\\big\(\(\\beta\_\{t\}\-\\beta\_\{t\-1\}\)\\,R\(x^\{\(m\)\}\_\{t\-1\}\)\\big\)\},\(9\)whereβt−βt−1\>0\\beta\_\{t\}\-\\beta\_\{t\-1\}\>0acts as inverse temperature: small increments flatten the softmax toward uniform sampling, while large increments sharpen it onto top\-reward particles\.

Case study\.[Figure˜3\(a\)](https://arxiv.org/html/2605.15308#S2.F3.sf1)illustrates this transition on circle packing\. In early iterations the incrementβt−βt−1\\beta\_\{t\}\-\\beta\_\{t\-1\}is small, so reward exerts little influence on the resampling weight and sampling probabilities are nearly uniform across particles; in later iterations the increment grows, and the weight concentrates resampling on high\-reward parents\.

#### Design of the forward kernel\.

Leveraging the in\-context learning capability of LLMs, we design the one\-step forward/mutation kernelMtM\_\{t\}around an LLM\-based proposalx′∼Qt\(⋅∣x,𝒞t\)x^\{\\prime\}\\sim Q\_\{t\}\(\\cdot\\mid x,\\mathcal\{C\}\_\{t\}\)\. However, the LLM proposal kernelQtQ\_\{t\}alone may not satisfy the two properties required for SMC convergence:ptp\_\{t\}\-invariance \([Section˜2\.2](https://arxiv.org/html/2605.15308#S2.SS2.SSS0.Px1)\) and uniform ergodicity \([Section˜2\.2](https://arxiv.org/html/2605.15308#S2.SS2.SSS0.Px1)\)\. To address this challenge, we augment the proposal mechanism with a mixture of mutation kernels and a Metropolis–Hastings \(MH\) accept/reject correction applied to multiple candidate samples\. Specifically, each parent particle is propagated by repeatedly applying a one\-step mutation kernel forKKiterations, yielding the resultingKK\-step mutation blockMtKM\_\{t\}^\{K\}\.

MH accept/reject forptp\_\{t\}\-invariance\.Invariance ensures that the mutation step does not drift the particle population away from the current bridgeptp\_\{t\}\. We achieve this via the MH construction \(proof in Appendix[C](https://arxiv.org/html/2605.15308#A3)\): each candidatex′x^\{\\prime\}proposed by the LLM is accepted with probabilityαt\(x,x′\)\\alpha\_\{t\}\(x,x^\{\\prime\}\), and otherwise the parentxxis kept\. Iterating this proposal\-and\-acceptance stepKKtimes per particle yields aKK\-step MH chain, whose final state is taken as the mutated particle\. The full MH ratio requires evaluating the LLM density in both forward and reverse directions, which is inaccessible through black\-box APIs\. We therefore adopt the reward\-only criterion as an approximation:

αt\(x,x′\)=min⁡\{1,eβt\(R\(x′\)−R\(x\)\)\}\.\\alpha\_\{t\}\(x,x^\{\\prime\}\)=\\min\\\!\\big\\\{1,\\;e^\{\\beta\_\{t\}\(R\(x^\{\\prime\}\)\-R\(x\)\)\}\\big\\\}\.\(10\)The deviation is discussed in Appendix[E](https://arxiv.org/html/2605.15308#A5)\. Intuitively, this rule always accepts reward\-improving proposals and accepts reward\-decreasing ones with probability that decays exponentially in the reward gap\. Under the ideal MH kernel, largerKKgives the chain more time to mix towardptp\_\{t\}at the cost of more LLM calls\.

LLM proposal with context for ergodicity\.Ergodicity governs how fast newly generated programs spread across the program space, and we accelerate it through the design of the LLM context𝒞t\\mathcal\{C\}\_\{t\}\. Following the mixture\-of\-kernels construction\[[7](https://arxiv.org/html/2605.15308#bib.bib18), Sec\. 3\.2\.2\], we instantiateQtQ\_\{t\}as a mixture of four LLM proposal modes; composed with MH accept/reject, these modes induce the four one\-step mutation kernels shown below\.

Diff kernels produce smallSEARCH/REPLACEedits, while rewrite kernels generate the entire program from scratch\. Inspiration programs include the top\-kkprograms by reward andmmmaximally diverse ones, selected by picking programs whose text embeddings are farthest from the parent across the full evaluation history\. Crucially, this history contains*every*program ever proposed by the LLM, including those rejected by the MH step, so that information from rejected proposals is not discarded but as inspiration for future mutations\.

As the most useful kernel changes over the process of the search, we adaptively select among the four kernels via Thompson sampling\[[28](https://arxiv.org/html/2605.15308#bib.bib20)\]: each kernel maintains a Beta posterior over its success rate, and at each step we sample from every posterior and pick the kernels that have recently produced high\-reward proposals\. Together, these four kernels cover both local and global moves and both single\-parent and cross\-program information, broadening the reachable set from any given particle and accelerating mixing\.

Case study\.[Figure˜3\(b\)](https://arxiv.org/html/2605.15308#S2.F3.sf2)confirms this behavior on circle packing\. The inspiration\-based kernelsM\(1\)M^\{\(1\)\}andM\(3\)M^\{\(3\)\}attain substantially higher MH acceptance under \([10](https://arxiv.org/html/2605.15308#S2.E10)\) than their no\-inspiration counterpartsM\(2\)M^\{\(2\)\}andM\(4\)M^\{\(4\)\}, and Thompson sampling progressively selects them more often than the no\-inspiration ones\.

Automatic convergence control\.We now describe an adaptive strategy for controlling the evolution iterations\. Towards that goal, rather than pre\-specifying the inverse temperature sequence\(λt\)t=0T\(\\lambda\_\{t\}\)\_\{t=0\}^\{T\}, we iteratively determineλt\\lambda\_\{t\}adaptively using the effective sample size \(ESS\)\[[7](https://arxiv.org/html/2605.15308#bib.bib18)\]defined as

ESS\(λ\)=\(∑n=1Nu\(n\)\(λ\)\)2∑n=1N\(u\(n\)\(λ\)\)2,u\(n\)\(λ\)=e\(λ−λt−1\)βR\(xt−1\(n\)\)\.\\text\{ESS\}\(\\lambda\)=\\frac\{\\bigl\(\\sum\_\{n=1\}^\{N\}u^\{\(n\)\}\(\\lambda\)\\bigr\)^\{2\}\}\{\\sum\_\{n=1\}^\{N\}\\bigl\(u^\{\(n\)\}\(\\lambda\)\\bigr\)^\{2\}\},\\qquad u^\{\(n\)\}\(\\lambda\)=e^\{\(\\lambda\-\\lambda\_\{t\-1\}\)\\beta R\(x\_\{t\-1\}^\{\(n\)\}\)\}\.\(11\)Intuitively,ESS\(λ\)\\text\{ESS\}\(\\lambda\)measures how many particles effectively contribute after reweighting—close toNNwhen weights are balanced, and dropping toward11as they concentrate on a few high\-reward particles\. Since increasing the temperatureλ\\lambdasharpens the reward tilt and decreasesESS\(λ\)\\text\{ESS\}\(\\lambda\), we setλt\\lambda\_\{t\}to be the largestλ\\lambdawhose reweighting still keeps a target fractionκ∈\(0,1\)\\kappa\\in\(0,1\)of particles effective\. This can be computed via bisection onESS\(λt\)=κN\\text\{ESS\}\(\\lambda\_\{t\}\)=\\kappa Nstarting fromλt−1\\lambda\_\{t\-1\}, and iterate untilλt=1\\lambda\_\{t\}=1\(set directly whenESS\(1\)≥κN\\text\{ESS\}\(1\)\\geq\\kappa N\)\. The total number of iterationsTTis therefore determined automatically and adapts to the difficulty of the task\.

To illustrate,[Figure˜3\(c\)](https://arxiv.org/html/2605.15308#S2.F3.sf3)shows how the schedule is built on circle packing\. At each iteration we compute the curveESS\(λ\)\\mathrm\{ESS\}\(\\lambda\)as a function ofλ\\lambda\(left panel\) and get the nextλt\\lambda\_\{t\}at its intersection with the thresholdκN\\kappa N\(green dashed line\)\. The resulting schedule \(right panel\) grows slowly at first—small steps because the early ESS curves decay sharply—and accelerates as the curves flatten in later iterations, reachingλt=1\\lambda\_\{t\}=1at automatic termination\.

#### The complete algorithm\.

[Algorithm˜1](https://arxiv.org/html/2605.15308#alg1)assembles the three designs into a single loop: at each iterationtt, we \(i\) reweight and resample particles via the temperature\-based softmax \([9](https://arxiv.org/html/2605.15308#S2.E9)\), \(ii\) mutate each parent through aKK\-step MH chain with the Thompson\-sampled mixture of LLM proposal modes, and \(iii\) advanceλt\\lambda\_\{t\}by ESS\-based bisection \([11](https://arxiv.org/html/2605.15308#S2.E11)\), terminating onceλt=1\\lambda\_\{t\}=1\. The total cost isN⋅K⋅TN\\cdot K\\cdot TLLM calls, whereTTis determined adaptively rather than fixed in advance\. More implementation details, including island\-based parallelism, are in Appendix[G](https://arxiv.org/html/2605.15308#A7)\.

Algorithm 1SMCEvolve0:

NNparticles,

KKsequential LLM proposals per particle, ESS threshold

κ∈\(0,1\)\\kappa\\in\(0,1\), prior

p0\(⋅∣q\)p\_\{0\}\(\\cdot\\mid q\), LLM proposal

Qt\(⋅∣x,𝒞t\)Q\_\{t\}\(\\cdot\\mid x,\\mathcal\{C\}\_\{t\}\), reward

R\(⋅\)R\(\\cdot\), target inverse temperature

β\\beta
0:Particles approximating

p∗\(⋅∣q\)p^\{\*\}\(\\cdot\\mid q\)
1:for

n=1,…,Nn=1,\\ldots,Ndo

2:

x0\(n\)∼p0\(⋅∣q\)x\_\{0\}^\{\(n\)\}\\sim p\_\{0\}\(\\cdot\\mid q\);

R\(n\)←R\(x0\(n\)\)R^\{\(n\)\}\\leftarrow R\(x\_\{0\}^\{\(n\)\}\)\# Initialize from LLM prior

3:endfor

4:

λ0←0\\lambda\_\{0\}\\leftarrow 0,

t←0t\\leftarrow 0
5:repeat

6:

t←t\+1t\\leftarrow t\+1
7:Determine

λt\\lambda\_\{t\}via ESS bisection \([11](https://arxiv.org/html/2605.15308#S2.E11)\);

Δβt←\(λt−λt−1\)β\\Delta\\beta\_\{t\}\\leftarrow\(\\lambda\_\{t\}\-\\lambda\_\{t\-1\}\)\\beta\# Adaptive temperature

8:Compute weights

Wt\(n\)W\_\{t\}^\{\(n\)\}via \([9](https://arxiv.org/html/2605.15308#S2.E9)\); resample ancestor indices

\{At\(n\)\}n=1N\\\{A\_\{t\}^\{\(n\)\}\\\}\_\{n=1\}^\{N\}\# Parent resampling

9:for

n=1,…,Nn=1,\\ldots,Ndo

10:

z←xt−1\(At\(n\)\)z\\leftarrow x\_\{t\-1\}^\{\(A\_\{t\}^\{\(n\)\}\)\}\# Resampled parent \(current state of MH chain\)

11:for

k=1,…,Kk=1,\\ldots,Kdo

12:

x′∼Qt\(⋅∣z,𝒞t\)x^\{\\prime\}\\sim Q\_\{t\}\(\\cdot\\mid z,\\mathcal\{C\}\_\{t\}\)\# LLM proposes from current state

13:With probability

αt\(z,x′\)\\alpha\_\{t\}\(z,x^\{\\prime\}\)via \([10](https://arxiv.org/html/2605.15308#S2.E10)\):

z←x′z\\leftarrow x^\{\\prime\}\# MH accept/reject

14:endfor

15:

xt\(n\)←zx\_\{t\}^\{\(n\)\}\\leftarrow z;

R\(n\)←R\(xt\(n\)\)R^\{\(n\)\}\\leftarrow R\(x\_\{t\}^\{\(n\)\}\)
16:endfor

17:until

λt=1\\lambda\_\{t\}=1\# Automatic termination

#### A unified exploration–exploitation transition\.

The three core designs ofSMCEvolveare not independent: they are coupled through the single temperature parameterβt\\beta\_\{t\}, which governs the exploration–exploitation trade\-off across all of them\. At smallβt\\beta\_\{t\}, adaptive parent resampling \([9](https://arxiv.org/html/2605.15308#S2.E9)\) draws nearly uniformly across particles, MH acceptance \([10](https://arxiv.org/html/2605.15308#S2.E10)\) admits almost any LLM proposal, and ESS\-driven scheduling \([11](https://arxiv.org/html/2605.15308#S2.E11)\) takes small steps inβt\\beta\_\{t\}—together driving exploration\. At largeβt\\beta\_\{t\}, parent resampling concentrates on high\-reward particles, MH acceptance admits only reward\-improving proposals, and ESS\-driven scheduling takes larger steps—together driving exploitation\. Through this single parameter, the three designs are no longer independent choices but coordinated mechanisms of one exploration–exploitation schedule that emerges from the SMC framework itself\.

#### Existing frameworks as special cases\.

Existing agents such as AlphaEvolve\[[24](https://arxiv.org/html/2605.15308#bib.bib9)\]and ShinkaEvolve\[[17](https://arxiv.org/html/2605.15308#bib.bib8)\]arise as special cases of our framework when each component reduces to a specific choice: adaptive parent resampling reduces to a fixed inverse temperatureβt≡β\\beta\_\{t\}\\equiv\\beta, mixture mutation with acceptance reduces to a single LLM proposal \(K=1K=1,αt≡1\\alpha\_\{t\}\\equiv 1\) with unconditional acceptance, and automatic convergence control reduces to a pre\-fixed iteration countTT\. This explains why existing frameworks work, whileSMCEvolveprovides a more general and unified principle for designing evolving agents\.

## 3Convergence Analysis

Direct sampling from the reward\-tilted targetp∗p^\{\*\}is intractable\. The SMC formulation ofSMCEvolveinstead constructs an interacting particle approximation through annealed reweighting, resampling, and mutation\. We measure this approximation in the standard SMC sense: for bounded test functionsff, the terminal empirical measureηTN=N−1∑nδxT\(n\)\\eta\_\{T\}^\{N\}=N^\{\-1\}\\sum\_\{n\}\\delta\_\{x\_\{T\}^\{\(n\)\}\}should satisfyηTN\(f\)≈p∗\(f\)\\eta\_\{T\}^\{N\}\(f\)\\approx p^\{\*\}\(f\)\. The result below gives a finite\-sample LLM\-call budget for achieving this approximation under explicit bridge regularity and mutation\-kernel assumptions\.

Letℬ:=NTK\\mathcal\{B\}:=NTKdenote the total number of LLM calls, accounting forNNparticles,TTannealing stages, andKKapplications of the one\-step mutation kernel per particle\. The following theorem states the resulting finite\-sample complexity bound\.

###### Theorem 3\.1\(Finite\-sample complexity ofSMCEvolve\)\.

Suppose the reward has finite positive oscillationΔR\\Delta\_\{R\}on the support ofp0\(⋅∣q\)p\_\{0\}\(\\cdot\\mid q\)\. Let the schedule\{λt\}t=0T\\\{\\lambda\_\{t\}\\\}\_\{t=0\}^\{T\}be produced by the ESS bisection rule \([11](https://arxiv.org/html/2605.15308#S2.E11)\) with thresholdκ∈\(0,1\)\\kappa\\in\(0,1\), and assume the resulting adjacent bridges satisfysup1≤t≤T‖pt/pt−1‖L2\(pt−1\)2≤κ−1\\sup\_\{1\\leq t\\leq T\}\\\|p\_\{t\}/p\_\{t\-1\}\\\|\_\{L^\{2\}\(p\_\{t\-1\}\)\}^\{2\}\\leq\\kappa^\{\-1\}\. Assume also that for eachtt, the one\-step mutation kernelMtM\_\{t\}isptp\_\{t\}\-invariant and uniformly ergodic with rateρt\\rho\_\{t\}, and letρ=max1≤t≤T⁡ρt∈\(0,1\)\\rho=\\max\_\{1\\leq t\\leq T\}\\rho\_\{t\}\\in\(0,1\)\.

Then for any fixed bounded test functionf:𝒳→ℝf:\\mathcal\{X\}\\to\\mathbb\{R\}with‖f‖∞≤1\\\|f\\\|\_\{\\infty\}\\leq 1and anyϵ\>0\\epsilon\>0, there exists a choice of\(N,T,K\)\(N,T,K\)such that the empirical measureηTN=1N∑nδxT\(n\)\\eta\_\{T\}^\{N\}=\\frac\{1\}\{N\}\\sum\_\{n\}\\delta\_\{x\_\{T\}^\{\(n\)\}\}induced by the particles\{xT\(n\)\}\\\{x\_\{T\}^\{\(n\)\}\\\}satisfies111The constant3/43/4is the high\-probability guarantee in Theorem 1 of Marion et al\.\[[21](https://arxiv.org/html/2605.15308#bib.bib5)\]; it can be amplified to1−δ1\-\\deltaat alog⁡\(1/δ\)\\log\(1/\\delta\)multiplicative cost by running the samplerO\(log⁡\(1/δ\)\)O\(\\log\(1/\\delta\)\)times and taking the median of the estimates\.

ℙ\(\|ηTN\(f\)−p∗\(f\)\|≤ϵ\)≥34\.\\mathbb\{P\}\\\!\\left\(\\big\|\\eta\_\{T\}^\{N\}\(f\)\-p^\{\*\}\(f\)\\big\|\\leq\\epsilon\\right\)\\;\\geq\\;\\tfrac\{3\}\{4\}\.\(12\)
The corresponding LLM\-call budgetℬ=NTK\\mathcal\{B\}=NTKsatisfies

ℬ=𝒪~\(ϵ−2∨κ−11−ρ⋅βΔR\),\\mathcal\{B\}\\;=\\;\\widetilde\{\\mathcal\{O\}\}\\\!\\left\(\\frac\{\\epsilon^\{\-2\}\\vee\\kappa^\{\-1\}\}\{1\-\\rho\}\\cdot\\beta\\Delta\_\{R\}\\right\),\(13\)where𝒪~\\widetilde\{\\mathcal\{O\}\}suppresses polylogarithmic factors\.

The theorem provides the standard SMC guarantee for the terminal empirical measure\. Intuitively, for any fixed bounded statisticff, the final particle population yields a high\-accuracy estimate of its expectation under the ideal reward\-tilted target distributionp∗p^\{\*\}\. For example, choosing the normalized rewardf=\(R−R−\)/ΔRf=\(R\-R\_\{\-\}\)/\\Delta\_\{R\}withΔR\>0\\Delta\_\{R\}\>0, \([12](https://arxiv.org/html/2605.15308#S3.E12)\) ensures that the mean reward of the final population is close to the mean reward obtained by exact sampling fromp∗p^\{\*\}\. The best\-so\-far optimization performance is evaluated empirically in[Section˜4](https://arxiv.org/html/2605.15308#S4)\. More generally, since \([12](https://arxiv.org/html/2605.15308#S3.E12)\) holds for every bounded test functionff, the final particles\{xT\(n\)\}\\\{x\_\{T\}^\{\(n\)\}\\\}collectively provides a consistent approximation to samples from the target distributionp∗p^\{\*\}\.

The assumptions in[Theorem˜3\.1](https://arxiv.org/html/2605.15308#S3.Thmtheorem1)correspond to the two key ingredients underlying finite\-sample guarantee: neighboring bridge distributions remain sufficiently close, and the mutation kernel adequately explores the target distribution at each stage\. InSMCEvolve, the ESS\-based bisection rule is designed to adaptively construct short bridges, while a formal theoretical characterization is left for future work\. Similarly, the LLM\-based mutation kernel is designed to approximately preserveptp\_\{t\}\-invariance through an MH\-style accept/reject mechanism, while improving exploration through the use of four complementary proposal modes\. Under the full MH ratio, detailed balance and henceptp\_\{t\}\-invariance holds exactly\. In practice, because black\-box LLM APIs do not expose the quantities required for the full MH correction, we employ the reward\-only approximation described in Appendix[E](https://arxiv.org/html/2605.15308#A5)\. The acceptance\-rate and kernel\-selection diagnostics in[Figure˜3\(b\)](https://arxiv.org/html/2605.15308#S2.F3.sf2)suggest that this approximation remains effective in practice and provides a reasonable surrogate for the idealized invariant kernel\.

#### Proof sketch\.

[Theorem˜3\.1](https://arxiv.org/html/2605.15308#S3.Thmtheorem1)is a specialization of Theorem 1 and Sec\. 5\.1 of Marion et al\.\[[21](https://arxiv.org/html/2605.15308#bib.bib5)\]in its mixing\-time form\. The imported bound has the formℬ=𝒪~\(\(ϵ−2∨γ2\)⋅τ⋅log⁡Γ\)\\mathcal\{B\}=\\widetilde\{\\mathcal\{O\}\}\\\!\\big\(\(\\epsilon^\{\-2\}\\vee\\gamma^\{2\}\)\\cdot\\tau\\cdot\\log\\Gamma\\big\), whereγ2\\gamma^\{2\}measures bridge regularity,τ\\taumeasures mutation mixing time, andΓ=supx:p0\(x∣q\)\>0p∗\(x\)/p0\(x\)\\Gamma=\\sup\_\{x:p\_\{0\}\(x\\mid q\)\>0\}p^\{\*\}\(x\)/p\_\{0\}\(x\)measures the length of the annealing path\. We map these three quantities toSMCEvolveas follows:

- •*\(S1: bridge term\)γ2≤κ−1\\gamma^\{2\}\\leq\\kappa^\{\-1\}\.*The theorem assumes that adjacent bridges have controlledL2L^\{2\}ratio; the ESS bisection rule \([11](https://arxiv.org/html/2605.15308#S2.E11)\) is the finite\-sample mechanism that targets this condition\. For the reward\-tilted path, short temperature steps also satisfy the deterministic certificate‖pt/pt−1‖∞≤eΔβtΔR\\\|p\_\{t\}/p\_\{t\-1\}\\\|\_\{\\infty\}\\leq e^\{\\Delta\\beta\_\{t\}\\Delta\_\{R\}\}\.
- •*\(S2: mutation term\)τ≤1/\(1−ρ\)\\tau\\leq 1/\(1\-\\rho\)via uniform ergodicity\.*Uniform ergodicity says thatKKsteps of the one\-step kernel move any starting state toward the current bridgeptp\_\{t\}at rateρK\\rho^\{K\}\. Inverting this decay gives a mixing cost proportional to\(1−ρ\)−1\(1\-\\rho\)^\{\-1\}up to logarithmic factors\.
- •*\(S3: path\-length term\)log⁡Γ≤βΔR\\log\\Gamma\\leq\\beta\\Delta\_\{R\}\.*Sincep∗\(x\)/p0\(x∣q\)=eβR\(x\)/Z\(q\)p^\{\*\}\(x\)/p\_\{0\}\(x\\mid q\)=e^\{\\beta R\(x\)\}/Z\(q\)on the support ofp0p\_\{0\}, bounded reward oscillation givesΓ≤eβR\+/eβR−=eβΔR\\Gamma\\leq e^\{\\beta R\_\{\+\}\}/e^\{\\beta R\_\{\-\}\}=e^\{\\beta\\Delta\_\{R\}\}and hencelog⁡Γ≤βΔR\\log\\Gamma\\leq\\beta\\Delta\_\{R\}\.

Full derivations are in Appendix[F](https://arxiv.org/html/2605.15308#A6)\.

#### Regime of validity\.

[Theorem˜3\.1](https://arxiv.org/html/2605.15308#S3.Thmtheorem1)bounds the product budgetℬ=NTK\\mathcal\{B\}=NTK\. This product bound should be read together with the usual SMC requirement that each factor is large enough: enough particles for stable resampling, enough mutation steps for mixing, and enough bridge stages for controlled reweighting\. Appendix[F\.1](https://arxiv.org/html/2605.15308#A6.SS1)records these per\-factor requirements and the degenerate cases withN=1N=1andK=1K=1\.

## 4Experiments

Table 1:Math domain\[[24](https://arxiv.org/html/2605.15308#bib.bib9)\]: best reward per task, LLM calls in theCallscolumn\.Table 2:Algorithm efficiency domain\[[25](https://arxiv.org/html/2605.15308#bib.bib26)\]: per\-task speedup, LLM calls in theCallscolumn\.Table 3:Symbolic regression domain\[[30](https://arxiv.org/html/2605.15308#bib.bib16)\]: best reward per task, LLM calls in theCallscolumn\.### 4\.1Main Results

We evaluateSMCEvolveon four domains:*Math*— combinatorial geometry problems from the AlphaEvolve benchmark\[[24](https://arxiv.org/html/2605.15308#bib.bib9)\]\(Table[1](https://arxiv.org/html/2605.15308#S4.T1)\);*AlgoTune*— runtime\-optimization tasks\[[25](https://arxiv.org/html/2605.15308#bib.bib26)\]spanning numerical computation, signal processing, and linear algebra \(Table[2](https://arxiv.org/html/2605.15308#S4.T2)\);*Symbolic Regression*— problems from LLM\-SRBench\[[30](https://arxiv.org/html/2605.15308#bib.bib16)\]across physical oscillators, bacterial population growth, chemical reactions, and materials science \(Table[3](https://arxiv.org/html/2605.15308#S4.T3)\); and*AutoResearch*— end\-to\-end GPT pretraining onTinyStories\[[14](https://arxiv.org/html/2605.15308#bib.bib4)\]\(Figure[4](https://arxiv.org/html/2605.15308#S4.F4)\), where reward ismax⁡\(0,2\.0−val\_bpb\)\\max\(0,\\,2\.0\-\\text\{val\\\_bpb\}\)on a held\-out set; each candidate trains for 60 s on a single A5000\. We compare againstReEvo\[[33](https://arxiv.org/html/2605.15308#bib.bib6)\],OpenEvolve, andShinkaEvolve\[[17](https://arxiv.org/html/2605.15308#bib.bib8)\]\. All methods share an identicalgpt\-5\-mini\+gemini\-3\-flashensemble and each cell reports the best score over 3 seeds\. Baselines run to a fixed LLM\-call budget \(200 for Math and AutoResearch, 400 for Symbolic Regression, 500 for AlgoTune\), whileSMCEvolveterminates automatically via ESS\-driven stopping\.SMCEvolveattains the best score on the majority of tasks across all four domains and the highest final reward on AutoResearch\. Notably, these gains come with strictly fewer LLM calls—the mean call count under ESS\-driven termination is below the fixed baseline budget on every domain \(reported in theCallscolumn\)\.

![Refer to caption](https://arxiv.org/html/2605.15308v1/figs/method_comparison.png)Figure 4:AutoResearch domain\[[14](https://arxiv.org/html/2605.15308#bib.bib4)\]\. The SMCEvolve terminated automatically under ESS\-control\.
### 4\.2Ablation Study

We ablate the key designs—adaptive parent resampling and the mixture of mutation kernels in[Section˜2](https://arxiv.org/html/2605.15308#S2)—as well as the hyperparametersNN\(particle count\) andKK\(MH chain length\)\. Each variant changes a single choice while holding the LLM\-call budget fixed \(Table[4](https://arxiv.org/html/2605.15308#S4.T4)\)\.

Table 4:Ablations on Circle PackingN=21N\{=\}21\. Best reward over 3 seeds; baseline inbold\.#### Parent resampling\.

The adaptive resampling interpolates between uniform and greedy selection, transitioning from exploration to exploitation\. To verify that this adaptivity is necessary, we replace it with two fixed alternatives at the extremes:uniformignores the reward signal andgreedy\(argmax\) always select the best\. Both degrades, confirming that our adaptive choice is essential\.

#### Mutation kernel choice\.

The mixture targets ergodicity, and Thompson sampling adaptively selects among kernels\. We ablate each: replacing the mixture with single kernels, and Thompson with a uniform mixture\. Every single kernel or uniform mixture degrades the baseline \(≤0\.9929\\leq 0\.9929\), showing that both kernel diversity and adaptive selection are necessary\.

#### Breadth vs\. depth\.

Under a fixed budget ofN×K=16N\\times K=16LLM calls per iteration, the budget can favor more particles or longer MH chains\. Both extremes collapse to0\.93790\.9379:N=4,K=4N\{=\}4,K\{=\}4starves resampling of diversity;N=16,K=1N\{=\}16,K\{=\}1leaves chains too short to mix from the parent\. The balancedN=8,K=2N\{=\}8,K\{=\}2split lets resampling explore broadly while each particle refines locally\.

## 5Conclusion

In this paper, we proposeSMCEvolve, recasting LLM\-driven program evolution as Sequential Monte Carlo sampling from a reward\-tilted target distribution\. Three coupled mechanisms emerge from this single principle, yielding the first finite\-sample bound for evolutionary program search and state\-of\-the\-art performance across domains\. Future directions include tightening the bound and extending to multi\-objective or reward\-free settings\.

## Acknowledgements

We acknowledge support from NSF grants IIS\-2312840 and ECCS\-2409701\.

## References

- \[1\]N\. Abhyankar, S\. Kabra, S\. Desai, and C\. K\. ReddyLLEMA: evolutionary search with llms for multi\-objective materials discovery\.Cited by:[Appendix A](https://arxiv.org/html/2605.15308#A1.SS0.SSS0.Px1.p1.1),[§1](https://arxiv.org/html/2605.15308#S1.p1.1),[§2\.1](https://arxiv.org/html/2605.15308#S2.SS1.p1.3)\.
- \[2\]J\. Carpenter, P\. Clifford, and P\. Fearnhead\(1999\)Improved particle filter for nonlinear problems\.InIEE Proceedings – Radar, Sonar and Navigation,Vol\.146,pp\. 2–7\.External Links:[Document](https://dx.doi.org/10.1049/ip-rsn%3A19990255)Cited by:[§G\.2](https://arxiv.org/html/2605.15308#A7.SS2.p1.1)\.
- \[3\]M\. Cemri, S\. Agrawal, A\. Gupta, S\. Liu, A\. Cheng, Q\. Mang, A\. Naren, L\. E\. Erdogan, K\. Sen, M\. Zaharia,et al\.\(2026\)Adaevolve: adaptive llm driven zeroth\-order optimization\.arXiv preprint arXiv:2602\.20133\.Cited by:[§1](https://arxiv.org/html/2605.15308#S1.p2.1)\.
- \[4\]J\. S\. Chan, N\. Chowdhury, O\. Jaffe, J\. Aung, D\. Sherburn, E\. Mays, G\. Starace, K\. Liu, L\. Maksin, T\. Patwardhan,et al\.\(2024\)Mle\-bench: evaluating machine learning agents on machine learning engineering\.arXiv preprint arXiv:2410\.07095\.Cited by:[Appendix A](https://arxiv.org/html/2605.15308#A1.SS0.SSS0.Px1.p1.1),[§1](https://arxiv.org/html/2605.15308#S1.p1.1)\.
- \[5\]N\. Chopin and O\. Papaspiliopoulos\(2020\)An introduction to sequential Monte Carlo\.Springer Series in Statistics,Springer,Cham\.External Links:ISBN 978\-3\-030\-47844\-5,[Document](https://dx.doi.org/10.1007/978-3-030-47845-2)Cited by:[Appendix A](https://arxiv.org/html/2605.15308#A1.SS0.SSS0.Px3.p1.1)\.
- \[6\]C\. Dai, J\. Heng, P\. E\. Jacob, and N\. Whiteley\(2022\)An invitation to sequential monte carlo samplers\.Journal of the American Statistical Association117\(539\),pp\. 1587–1600\.Cited by:[Appendix A](https://arxiv.org/html/2605.15308#A1.SS0.SSS0.Px3.p1.1),[§1](https://arxiv.org/html/2605.15308#S1.p4.8),[§2\.2](https://arxiv.org/html/2605.15308#S2.SS2.p1.12),[§2\.3](https://arxiv.org/html/2605.15308#S2.SS3.SSS0.Px1.p2.3)\.
- \[7\]P\. Del Moral, A\. Doucet, and A\. Jasra\(2006\)Sequential monte carlo samplers\.Journal of the Royal Statistical Society Series B: Statistical Methodology68\(3\),pp\. 411–436\.Cited by:[Appendix A](https://arxiv.org/html/2605.15308#A1.SS0.SSS0.Px3.p1.1),[§1](https://arxiv.org/html/2605.15308#S1.p4.8),[§2\.2](https://arxiv.org/html/2605.15308#S2.SS2.p1.12),[§2\.3](https://arxiv.org/html/2605.15308#S2.SS3.SSS0.Px2.p3.2),[§2\.3](https://arxiv.org/html/2605.15308#S2.SS3.SSS0.Px2.p7.2)\.
- \[8\]P\. Del Moral\(2004\)Feynman\-Kac formulae: genealogical and interacting particle systems with applications\.Probability and Its Applications,Springer,New York\.External Links:ISBN 978\-0\-387\-20268\-6,[Document](https://dx.doi.org/10.1007/978-1-4684-9393-1)Cited by:[Appendix A](https://arxiv.org/html/2605.15308#A1.SS0.SSS0.Px3.p1.1)\.
- \[9\]R\. Douc, O\. Cappé, and E\. Moulines\(2005\)Comparison of resampling schemes for particle filtering\.Proceedings of the 4th International Symposium on Image and Signal Processing and Analysis \(ISPA 2005\),pp\. 64–69\.External Links:[Document](https://dx.doi.org/10.1109/ISPA.2005.195385)Cited by:[§G\.2](https://arxiv.org/html/2605.15308#A7.SS2.p1.1)\.
- \[10\]S\. Feng, X\. Kong, S\. Ma, A\. Zhang, D\. Yin, C\. Wang, R\. Pang, and Y\. Yang\(2025\)Step\-by\-step reasoning for math problems via twisted sequential Monte Carlo\.InInternational Conference on Learning Representations,External Links:[Link](https://openreview.net/forum?id=Ze4aPP0tIn)Cited by:[Appendix A](https://arxiv.org/html/2605.15308#A1.SS0.SSS0.Px2.p1.1)\.
- \[11\]B\. Georgiev, J\. Gómez\-Serrano, T\. Tao, and A\. Z\. Wagner\(2025\)Mathematical exploration and discovery at scale\.arXiv preprint arXiv:2511\.02864\.Cited by:[§1](https://arxiv.org/html/2605.15308#S1.p1.1),[§2\.1](https://arxiv.org/html/2605.15308#S2.SS1.p1.3)\.
- \[12\]Y\. Imajuku, K\. Horie, Y\. Iwata, K\. Aoki, N\. Takahashi, and T\. Akiba\(2025\)Ale\-bench: a benchmark for long\-horizon objective\-driven algorithm engineering\.arXiv preprint arXiv:2506\.09050\.Cited by:[Appendix A](https://arxiv.org/html/2605.15308#A1.SS0.SSS0.Px1.p1.1),[§1](https://arxiv.org/html/2605.15308#S1.p1.1)\.
- \[13\]J\. Jiang, T\. Ding, and Z\. Zhu\(2026\)DeltaEvolve: accelerating scientific discovery through momentum\-driven evolution\.arXiv preprint arXiv:2602\.02919\.Cited by:[§1](https://arxiv.org/html/2605.15308#S1.p2.1)\.
- \[14\]A\. Karpathy\(2026\)Autoresearch: ai agents running research experiments\.Note:[https://github\.com/karpathy/autoresearch](https://github.com/karpathy/autoresearch)GitHub repositoryCited by:[§H\.4](https://arxiv.org/html/2605.15308#A8.SS4.p1.5),[3rd item](https://arxiv.org/html/2605.15308#S1.I2.i3.p1.1),[Figure 4](https://arxiv.org/html/2605.15308#S4.F4),[Figure 4](https://arxiv.org/html/2605.15308#S4.F4.3.2),[§4\.1](https://arxiv.org/html/2605.15308#S4.SS1.p1.1)\.
- \[15\]S\. Kirkpatrick, C\. D\. Gelatt, and M\. P\. Vecchi\(1983\)Optimization by simulated annealing\.Science220\(4598\),pp\. 671–680\.External Links:[Document](https://dx.doi.org/10.1126/science.220.4598.671)Cited by:[Appendix A](https://arxiv.org/html/2605.15308#A1.SS0.SSS0.Px3.p1.1)\.
- \[16\]G\. Kitagawa\(1996\)Monte carlo filter and smoother for non\-gaussian nonlinear state space models\.Journal of Computational and Graphical Statistics5\(1\),pp\. 1–25\.Cited by:[§G\.2](https://arxiv.org/html/2605.15308#A7.SS2.p1.1)\.
- \[17\]R\. T\. Lange, Y\. Imajuku, and E\. Cetin\(2025\)Shinkaevolve: towards open\-ended and sample\-efficient program evolution\.arXiv preprint arXiv:2509\.19349\.Cited by:[Appendix A](https://arxiv.org/html/2605.15308#A1.SS0.SSS0.Px1.p1.1),[§1](https://arxiv.org/html/2605.15308#S1.p1.1),[§1](https://arxiv.org/html/2605.15308#S1.p2.1),[§2\.3](https://arxiv.org/html/2605.15308#S2.SS3.SSS0.Px5.p1.4),[§4\.1](https://arxiv.org/html/2605.15308#S4.SS1.p1.1)\.
- \[18\]A\. K\. Lew, Z\. Tan, G\. Grand, and V\. K\. Mansinghka\(2023\)Sequential monte carlo steering of large language models using probabilistic programs\.arXiv preprint arXiv:2306\.03081\.External Links:[Document](https://dx.doi.org/10.48550/arXiv.2306.03081),[Link](https://arxiv.org/abs/2306.03081)Cited by:[Appendix A](https://arxiv.org/html/2605.15308#A1.SS0.SSS0.Px2.p1.1)\.
- \[19\]F\. Liu, X\. Tong, M\. Yuan, X\. Lin, F\. Luo, Z\. Wang, Z\. Lu, and Q\. Zhang\(2024\)Evolution of heuristics: towards efficient automatic algorithm design using large language model\.arXiv preprint arXiv:2401\.02051\.Cited by:[Appendix A](https://arxiv.org/html/2605.15308#A1.SS0.SSS0.Px1.p1.1),[§1](https://arxiv.org/html/2605.15308#S1.p1.1),[§1](https://arxiv.org/html/2605.15308#S1.p2.1)\.
- \[20\]S\. Liu, S\. Agarwal, M\. Maheswaran, M\. Cemri, Z\. Li, Q\. Mang, A\. Naren, E\. Boneh, A\. Cheng, M\. Z\. Pan,et al\.\(2026\)Evox: meta\-evolution for automated discovery\.arXiv preprint arXiv:2602\.23413\.Cited by:[§1](https://arxiv.org/html/2605.15308#S1.p2.1)\.
- \[21\]J\. Marion, J\. Mathews, and S\. C\. Schmidler\(2023\)Finite\-sample complexity of sequential monte carlo estimators\.The Annals of Statistics51\(3\),pp\. 1357–1375\.Cited by:[Appendix A](https://arxiv.org/html/2605.15308#A1.SS0.SSS0.Px3.p1.1),[item 1](https://arxiv.org/html/2605.15308#A6.I1.i1.p1.4),[item 2](https://arxiv.org/html/2605.15308#A6.I1.i2.p1.5),[Appendix F](https://arxiv.org/html/2605.15308#A6.SS0.SSS0.Px1.p1.7),[Appendix F](https://arxiv.org/html/2605.15308#A6.SS0.SSS0.Px2),[Appendix F](https://arxiv.org/html/2605.15308#A6.SS0.SSS0.Px2.p1.16),[Appendix F](https://arxiv.org/html/2605.15308#A6.SSx2.SSS0.Px4.p1.1),[Appendix F](https://arxiv.org/html/2605.15308#A6.p1.1),[§3](https://arxiv.org/html/2605.15308#S3.SS0.SSS0.Px1.p1.4),[footnote 1](https://arxiv.org/html/2605.15308#footnote1)\.
- \[22\]S\. P\. Meyn and R\. L\. Tweedie\(2012\)Markov chains and stochastic stability\.Springer Science & Business Media\.Cited by:[Definition 2\.3](https://arxiv.org/html/2605.15308#S2.SS2.SSS0.Px1.3)\.
- \[23\]R\. M\. Neal\(2001\)Annealed importance sampling\.Statistics and Computing11\(2\),pp\. 125–139\.External Links:[Document](https://dx.doi.org/10.1023/A%3A1008923215028)Cited by:[Appendix A](https://arxiv.org/html/2605.15308#A1.SS0.SSS0.Px3.p1.1)\.
- \[24\]A\. Novikov, N\. Vũ, M\. Eisenberger, E\. Dupont, P\. Huang, A\. Z\. Wagner, S\. Shirobokov, B\. Kozlovskii, F\. J\. Ruiz, A\. Mehrabian,et al\.\(2025\)Alphaevolve: a coding agent for scientific and algorithmic discovery\.arXiv preprint arXiv:2506\.13131\.Cited by:[Appendix A](https://arxiv.org/html/2605.15308#A1.SS0.SSS0.Px1.p1.1),[3rd item](https://arxiv.org/html/2605.15308#S1.I2.i3.p1.1),[§1](https://arxiv.org/html/2605.15308#S1.p1.1),[§1](https://arxiv.org/html/2605.15308#S1.p2.1),[§2\.3](https://arxiv.org/html/2605.15308#S2.SS3.SSS0.Px5.p1.4),[§4\.1](https://arxiv.org/html/2605.15308#S4.SS1.p1.1),[Table 1](https://arxiv.org/html/2605.15308#S4.T1),[Table 1](https://arxiv.org/html/2605.15308#S4.T1.17.2)\.
- \[25\]O\. Press, B\. Amos, H\. Zhao, Y\. Wu, S\. K\. Ainsworth, D\. Krupke, P\. Kidger, T\. Sajed, B\. Stellato, J\. Park, N\. Bosch, E\. Meril, A\. Steppi, A\. Zharmagambetov, F\. Zhang, D\. Perez\-Pineiro, A\. Mercurio, N\. Zhan, T\. Abramovich, K\. Lieret, H\. Zhang, S\. Huang, M\. Bethge, and O\. Press\(2025\)AlgoTune: can language models speed up general\-purpose numerical programs?\.arXiv preprint arXiv:2507\.15887\.External Links:[Document](https://dx.doi.org/10.48550/arXiv.2507.15887),[Link](https://arxiv.org/abs/2507.15887)Cited by:[Appendix A](https://arxiv.org/html/2605.15308#A1.SS0.SSS0.Px1.p1.1),[§H\.3](https://arxiv.org/html/2605.15308#A8.SS3.p1.5),[3rd item](https://arxiv.org/html/2605.15308#S1.I2.i3.p1.1),[§4\.1](https://arxiv.org/html/2605.15308#S4.SS1.p1.1),[Table 2](https://arxiv.org/html/2605.15308#S4.T2),[Table 2](https://arxiv.org/html/2605.15308#S4.T2.13.2)\.
- \[26\]B\. Romera\-Paredes, M\. Barekatain, A\. Novikov, M\. Balog, M\. P\. Kumar, E\. Dupont, F\. J\. Ruiz, J\. S\. Ellenberg, P\. Wang, O\. Fawzi,et al\.\(2024\)Mathematical discoveries from program search with large language models\.Nature625\(7995\),pp\. 468–475\.Cited by:[Appendix A](https://arxiv.org/html/2605.15308#A1.SS0.SSS0.Px1.p1.1),[§1](https://arxiv.org/html/2605.15308#S1.p1.1)\.
- \[27\]R\. Y\. Rubinstein\(1999\)The cross\-entropy method for combinatorial and continuous optimization\.Methodology and Computing in Applied Probability1\(2\),pp\. 127–190\.External Links:[Document](https://dx.doi.org/10.1023/A%3A1010091220143)Cited by:[Appendix A](https://arxiv.org/html/2605.15308#A1.SS0.SSS0.Px3.p1.1)\.
- \[28\]D\. J\. Russo, B\. Van Roy, A\. Kazerouni, I\. Osband, and Z\. Wen\(2018\)A tutorial on thompson sampling\.Foundations and Trends in Machine Learning11\(1\),pp\. 1–96\.External Links:[Document](https://dx.doi.org/10.1561/2200000070)Cited by:[2nd item](https://arxiv.org/html/2605.15308#S1.I1.i2.p1.1),[§2\.3](https://arxiv.org/html/2605.15308#S2.SS3.SSS0.Px2.p5.1)\.
- \[29\]P\. Shojaee, K\. Meidani, S\. Gupta, A\. B\. Farimani, and C\. K\. Reddy\(2024\)Llm\-sr: scientific equation discovery via programming with large language models\.arXiv preprint arXiv:2404\.18400\.Cited by:[Appendix A](https://arxiv.org/html/2605.15308#A1.SS0.SSS0.Px1.p1.1),[§1](https://arxiv.org/html/2605.15308#S1.p1.1),[§2\.1](https://arxiv.org/html/2605.15308#S2.SS1.p1.3)\.
- \[30\]P\. Shojaee, N\. Nguyen, K\. Meidani, A\. B\. Farimani, K\. D\. Doan, and C\. K\. Reddy\(2025\)Llm\-srbench: a new benchmark for scientific equation discovery with large language models\.arXiv preprint arXiv:2504\.10415\.Cited by:[Appendix A](https://arxiv.org/html/2605.15308#A1.SS0.SSS0.Px1.p1.1),[§H\.2](https://arxiv.org/html/2605.15308#A8.SS2.p1.3),[3rd item](https://arxiv.org/html/2605.15308#S1.I2.i3.p1.1),[§4\.1](https://arxiv.org/html/2605.15308#S4.SS1.p1.1),[Table 3](https://arxiv.org/html/2605.15308#S4.T3),[Table 3](https://arxiv.org/html/2605.15308#S4.T3.5.2)\.
- \[31\]Y\. Yamada, R\. T\. Lange, C\. Lu, S\. Hu, C\. Lu, J\. Foerster, J\. Clune, and D\. Ha\(2025\)The ai scientist\-v2: workshop\-level automated scientific discovery via agentic tree search\.arXiv preprint arXiv:2504\.08066\.Cited by:[Appendix A](https://arxiv.org/html/2605.15308#A1.SS0.SSS0.Px1.p1.1),[§1](https://arxiv.org/html/2605.15308#S1.p1.1)\.
- \[32\]M\. Yan, B\. Peng, B\. Coleman, Z\. Chen, Z\. Xie, S\. Chen, Z\. He, N\. Sachdeva, I\. Ye, W\. Wang,et al\.\(2026\)Pacevolve: enabling long\-horizon progress\-aware consistent evolution\.arXiv preprint arXiv:2601\.10657\.Cited by:[§1](https://arxiv.org/html/2605.15308#S1.p2.1)\.
- \[33\]H\. Ye, J\. Wang, Z\. Cao, F\. Berto, C\. Hua, H\. Kim, J\. Park, and G\. Song\(2024\)Reevo: large language models as hyper\-heuristics with reflective evolution\.Advances in neural information processing systems37,pp\. 43571–43608\.Cited by:[Appendix A](https://arxiv.org/html/2605.15308#A1.SS0.SSS0.Px1.p1.1),[§1](https://arxiv.org/html/2605.15308#S1.p1.1),[§1](https://arxiv.org/html/2605.15308#S1.p2.1),[§4\.1](https://arxiv.org/html/2605.15308#S4.SS1.p1.1)\.
- \[34\]S\. Zhao, R\. Brekelmans, A\. Makhzani, and R\. B\. Grosse\(2024\)Probabilistic inference in language models via twisted sequential Monte Carlo\.InProceedings of the 41st International Conference on Machine Learning,Proceedings of Machine Learning Research, Vol\.235,pp\. 60704–60748\.External Links:[Link](https://proceedings.mlr.press/v235/zhao24c.html)Cited by:[Appendix A](https://arxiv.org/html/2605.15308#A1.SS0.SSS0.Px2.p1.1)\.

This appendix provides supplementary material forSMCEvolve\. We first position the method relative to prior work in Appendix[A](https://arxiv.org/html/2605.15308#A1), then give the derivations and proofs supporting the SMC formulation \(Appendices[B](https://arxiv.org/html/2605.15308#A2),[C](https://arxiv.org/html/2605.15308#A3),[D](https://arxiv.org/html/2605.15308#A4),[E](https://arxiv.org/html/2605.15308#A5), and[F](https://arxiv.org/html/2605.15308#A6)\), followed by implementation details \(Appendix[G](https://arxiv.org/html/2605.15308#A7)\), benchmark/task specifications \(Appendix[H](https://arxiv.org/html/2605.15308#A8)\), hyperparameters \(Appendix[J](https://arxiv.org/html/2605.15308#A10)\), prompts \(Appendix[L](https://arxiv.org/html/2605.15308#A12)\), and a visualization of a representative run \(Appendix[M](https://arxiv.org/html/2605.15308#A13)\)\.

## Appendix ARelated Work

#### LLM\-driven program evolution and scientific discovery\.

LLMs have become effective generators inside evaluator\-guided discovery loops\. FunSearch repeatedly proposes and evaluates programs to obtain mathematical constructions\[[26](https://arxiv.org/html/2605.15308#bib.bib10)\]; AlphaEvolve scales this idea to a general coding agent for scientific and algorithmic discovery\[[24](https://arxiv.org/html/2605.15308#bib.bib9)\]; and ShinkaEvolve, ReEvo, and Evolution of Heuristics improve sample efficiency through open\-ended evolution, reflection, and heuristic design\[[17](https://arxiv.org/html/2605.15308#bib.bib8),[33](https://arxiv.org/html/2605.15308#bib.bib6),[19](https://arxiv.org/html/2605.15308#bib.bib7)\]\. Related agent benchmarks and systems study automated scientific workflows, machine\-learning engineering, symbolic equation discovery, materials discovery, and numerical\-program optimization\[[31](https://arxiv.org/html/2605.15308#bib.bib13),[4](https://arxiv.org/html/2605.15308#bib.bib15),[12](https://arxiv.org/html/2605.15308#bib.bib14),[29](https://arxiv.org/html/2605.15308#bib.bib11),[30](https://arxiv.org/html/2605.15308#bib.bib16),[1](https://arxiv.org/html/2605.15308#bib.bib12),[25](https://arxiv.org/html/2605.15308#bib.bib26)\]\. These works motivate evaluator\-guided program search, but their parent selection, population management, mutation schedules, and stopping rules are typically engineered as separate heuristics\.SMCEvolvefocuses on this population\-search core and interprets it as SMC sampling from a reward\-tilted program distribution\.

#### SMC for language\-model inference and reasoning\.

Recent work also applies SMC directly to language\-model generation\. Lew et al\. use SMC steering for constrained generation specified as posterior inference in probabilistic sequence models\[[18](https://arxiv.org/html/2605.15308#bib.bib27)\]\. Zhao et al\. develop twisted SMC for sampling from sequence\-level unnormalized targets in LLM inference and safety tasks\[[34](https://arxiv.org/html/2605.15308#bib.bib28)\], and Feng et al\. use twisted SMC to guide multi\-step mathematical reasoning\[[10](https://arxiv.org/html/2605.15308#bib.bib29)\]\. These methods treat prefixes, complete sequences, or reasoning traces as particles within a single generation episode\. In contrast,SMCEvolvetreats executable programs as particles, obtains rewards from an external evaluator, and uses SMC to structure population\-based scientific program evolution rather than controlled decoding\.

#### SMC, annealing, and adaptive sampling foundations\.

Our method builds on SMC samplers, which move particles through intermediate distributions via weighting, resampling, and mutation\[[7](https://arxiv.org/html/2605.15308#bib.bib18),[6](https://arxiv.org/html/2605.15308#bib.bib19),[8](https://arxiv.org/html/2605.15308#bib.bib22),[5](https://arxiv.org/html/2605.15308#bib.bib23)\]\. The reward\-tempered path is related to annealed importance sampling, simulated annealing, and cross\-entropy methods for stochastic optimization\[[23](https://arxiv.org/html/2605.15308#bib.bib30),[15](https://arxiv.org/html/2605.15308#bib.bib31),[27](https://arxiv.org/html/2605.15308#bib.bib32)\]\. We import finite\-sample SMC complexity theory\[[21](https://arxiv.org/html/2605.15308#bib.bib5)\]rather than proving a new general theorem\. The contribution is the specialization to LLM\-driven program evolution: the reward\-tilted target, bridge path, resampling weights, and reference\-kernel requirements jointly explain the design levers and limits of the evolutionary coding agent\.

## Appendix BDerivation of the Reward\-Tilted Target Distribution

We provide the full derivation of the reward\-tilted target distribution in[Section˜2\.1](https://arxiv.org/html/2605.15308#S2.SS1)\. Starting from the regularized variational objective in \([1](https://arxiv.org/html/2605.15308#S2.E1)\), the Lagrangian with the normalization constraint∫p\(x\)𝑑x=1\\int p\(x\)\\,dx=1is

ℒ\[p\]=∫p\(x\)R\(x\)𝑑x−1β∫p\(x\)log⁡p\(x\)p0\(x∣q\)dx−μ\(∫p\(x\)𝑑x−1\)\.\\mathcal\{L\}\[p\]=\\int p\(x\)\\,R\(x\)\\,dx\-\\frac\{1\}\{\\beta\}\\int p\(x\)\\log\\frac\{p\(x\)\}\{p\_\{0\}\(x\\mid q\)\}\\,dx\-\\mu\\\!\\left\(\\int p\(x\)\\,dx\-1\\right\)\.\(14\)Setting the functional derivativeδℒ/δp\(x\)=0\\delta\\mathcal\{L\}/\\delta p\(x\)=0gives

R\(x\)−1β\(log⁡p\(x\)p0\(x∣q\)\+1\)−μ=0\.R\(x\)\-\\frac\{1\}\{\\beta\}\\left\(\\log\\frac\{p\(x\)\}\{p\_\{0\}\(x\\mid q\)\}\+1\\right\)\-\\mu=0\.\(15\)Solving forlog⁡p\(x\)\\log p\(x\)yields

log⁡p\(x\)=βR\(x\)\+log⁡p0\(x∣q\)−\(1\+μβ\)\.\\log p\(x\)=\\beta R\(x\)\+\\log p\_\{0\}\(x\\mid q\)\-\(1\+\\mu\\beta\)\.\(16\)Exponentiating and absorbing thexx\-independent terme−\(1\+μβ\)e^\{\-\(1\+\\mu\\beta\)\}into the partition functionZ\(q\)Z\(q\)gives the target distributionp∗\(x∣q\)p^\{\*\}\(x\\mid q\)in \([2](https://arxiv.org/html/2605.15308#S2.E2)\)\.

## Appendix CMH construction yields aptp\_\{t\}\-invariant kernel

Condition on the stagettand context𝒞t\\mathcal\{C\}\_\{t\}, and writeQt\(⋅∣x,𝒞t\)Q\_\{t\}\(\\cdot\\mid x,\\mathcal\{C\}\_\{t\}\)for the proposal kernel\.

###### Theorem C\.1\(MH construction yields aptp\_\{t\}\-invariant kernel\)\.

The MH kernelMtM\_\{t\}defined by \([23](https://arxiv.org/html/2605.15308#A5.E23)\) isptp\_\{t\}\-invariant, i\.e\., satisfies \([4](https://arxiv.org/html/2605.15308#S2.E4)\)\.

###### Proof\.

We show thatMtM\_\{t\}satisfies the stronger property of*detailed balance*,

pt\(x\)Mt\(x,y\)=pt\(y\)Mt\(y,x\),∀x,y∈𝒳,p\_\{t\}\(x\)\\,M\_\{t\}\(x,y\)\\;=\\;p\_\{t\}\(y\)\\,M\_\{t\}\(y,x\),\\qquad\\forall\\,x,y\\in\\mathcal\{X\},\(17\)which immediately impliesptp\_\{t\}\-invariance: integrating both sides of \([17](https://arxiv.org/html/2605.15308#A3.E17)\) overxxgives

∫pt\(x\)Mt\(x,y\)dx=pt\(y\)∫Mt\(y,x\)dx=pt\(y\),\\int p\_\{t\}\(x\)\\,M\_\{t\}\(x,y\)\\,\\mathrm\{d\}x\\;=\\;p\_\{t\}\(y\)\\int M\_\{t\}\(y,x\)\\,\\mathrm\{d\}x\\;=\\;p\_\{t\}\(y\),which is \([4](https://arxiv.org/html/2605.15308#S2.E4)\)\. Intuitively, \([17](https://arxiv.org/html/2605.15308#A3.E17)\) says that at equilibriumptp\_\{t\}, the probability flux fromxxtoyyequals the flux fromyytoxx\.

It remains to verify \([17](https://arxiv.org/html/2605.15308#A3.E17)\) for the MH kernel\. Forx=yx=y, both sides are equal, so the identity holds trivially\. Forx≠yx\\neq y, we need to show

pt\(x\)Qt\(y∣x,𝒞t\)αt\(x,y\)=pt\(y\)Qt\(x∣y,𝒞t\)αt\(y,x\)\.p\_\{t\}\(x\)\\,Q\_\{t\}\(y\\mid x,\\mathcal\{C\}\_\{t\}\)\\,\\alpha\_\{t\}\(x,y\)\\;=\\;p\_\{t\}\(y\)\\,Q\_\{t\}\(x\\mid y,\\mathcal\{C\}\_\{t\}\)\\,\\alpha\_\{t\}\(y,x\)\.Without loss of generality, assumept\(x\)Qt\(y∣x,𝒞t\)≤pt\(y\)Qt\(x∣y,𝒞t\)p\_\{t\}\(x\)\\,Q\_\{t\}\(y\\mid x,\\mathcal\{C\}\_\{t\}\)\\leq p\_\{t\}\(y\)\\,Q\_\{t\}\(x\\mid y,\\mathcal\{C\}\_\{t\}\)\(if the opposite inequality holds, swap the roles ofxxandyyin the argument below\)\. Under this assumption:

- •The ratio in \([23](https://arxiv.org/html/2605.15308#A5.E23)\) satisfiespt\(y\)Qt\(x∣y,𝒞t\)pt\(x\)Qt\(y∣x,𝒞t\)≥1\\dfrac\{p\_\{t\}\(y\)\\,Q\_\{t\}\(x\\mid y,\\mathcal\{C\}\_\{t\}\)\}\{p\_\{t\}\(x\)\\,Q\_\{t\}\(y\\mid x,\\mathcal\{C\}\_\{t\}\)\}\\geq 1, soαt\(x,y\)=1\\alpha\_\{t\}\(x,y\)=1\.
- •The reverse ratio ispt\(x\)Qt\(y∣x,𝒞t\)pt\(y\)Qt\(x∣y,𝒞t\)≤1\\dfrac\{p\_\{t\}\(x\)\\,Q\_\{t\}\(y\\mid x,\\mathcal\{C\}\_\{t\}\)\}\{p\_\{t\}\(y\)\\,Q\_\{t\}\(x\\mid y,\\mathcal\{C\}\_\{t\}\)\}\\leq 1, soαt\(y,x\)=pt\(x\)Qt\(y∣x,𝒞t\)pt\(y\)Qt\(x∣y,𝒞t\)\\alpha\_\{t\}\(y,x\)=\\dfrac\{p\_\{t\}\(x\)\\,Q\_\{t\}\(y\\mid x,\\mathcal\{C\}\_\{t\}\)\}\{p\_\{t\}\(y\)\\,Q\_\{t\}\(x\\mid y,\\mathcal\{C\}\_\{t\}\)\}\.

Now compute both sides:

Left side:pt\(x\)Qt\(y∣x,𝒞t\)αt\(x,y\)=pt\(x\)Qt\(y∣x,𝒞t\)⋅1=pt\(x\)Qt\(y∣x,𝒞t\)\.\\displaystyle p\_\{t\}\(x\)\\,Q\_\{t\}\(y\\mid x,\\mathcal\{C\}\_\{t\}\)\\,\\alpha\_\{t\}\(x,y\)\\;=\\;p\_\{t\}\(x\)\\,Q\_\{t\}\(y\\mid x,\\mathcal\{C\}\_\{t\}\)\\cdot 1\\;=\\;p\_\{t\}\(x\)\\,Q\_\{t\}\(y\\mid x,\\mathcal\{C\}\_\{t\}\)\.Right side:pt\(y\)Qt\(x∣y,𝒞t\)αt\(y,x\)=pt\(y\)Qt\(x∣y,𝒞t\)⋅pt\(x\)Qt\(y∣x,𝒞t\)pt\(y\)Qt\(x∣y,𝒞t\)=pt\(x\)Qt\(y∣x,𝒞t\)\.\\displaystyle p\_\{t\}\(y\)\\,Q\_\{t\}\(x\\mid y,\\mathcal\{C\}\_\{t\}\)\\,\\alpha\_\{t\}\(y,x\)\\;=\\;p\_\{t\}\(y\)\\,Q\_\{t\}\(x\\mid y,\\mathcal\{C\}\_\{t\}\)\\cdot\\frac\{p\_\{t\}\(x\)\\,Q\_\{t\}\(y\\mid x,\\mathcal\{C\}\_\{t\}\)\}\{p\_\{t\}\(y\)\\,Q\_\{t\}\(x\\mid y,\\mathcal\{C\}\_\{t\}\)\}\\;=\\;p\_\{t\}\(x\)\\,Q\_\{t\}\(y\\mid x,\\mathcal\{C\}\_\{t\}\)\.Both sides agree, so detailed balance holds, and henceptp\_\{t\}\-invariance\. ∎

## Appendix DWeight Simplification under Time\-Reversal Backward Kernel

Starting from the general SMC weight:

wt\(xt−1,xt\)=γt\(xt\)Lt−1\(xt,xt−1\)γt−1\(xt−1\)MtK\(xt−1,xt\)\.w\_\{t\}\(x\_\{t\-1\},x\_\{t\}\)=\\frac\{\\gamma\_\{t\}\(x\_\{t\}\)\\,L\_\{t\-1\}\(x\_\{t\},x\_\{t\-1\}\)\}\{\\gamma\_\{t\-1\}\(x\_\{t\-1\}\)\\,M\_\{t\}^\{K\}\(x\_\{t\-1\},x\_\{t\}\)\}\.\(18\)Substituting the time\-reversal backward kernelLt−1\(xt,xt−1\)=pt\(xt−1\)MtK\(xt−1,xt\)/pt\(xt\)L\_\{t\-1\}\(x\_\{t\},x\_\{t\-1\}\)=p\_\{t\}\(x\_\{t\-1\}\)\\,M\_\{t\}^\{K\}\(x\_\{t\-1\},x\_\{t\}\)/p\_\{t\}\(x\_\{t\}\):

wt\(xt−1,xt\)\\displaystyle w\_\{t\}\(x\_\{t\-1\},x\_\{t\}\)=γt\(xt\)γt−1\(xt−1\)⋅pt\(xt−1\)MtK\(xt−1,xt\)pt\(xt\)MtK\(xt−1,xt\)\\displaystyle=\\frac\{\\gamma\_\{t\}\(x\_\{t\}\)\}\{\\gamma\_\{t\-1\}\(x\_\{t\-1\}\)\}\\cdot\\frac\{p\_\{t\}\(x\_\{t\-1\}\)\\,M\_\{t\}^\{K\}\(x\_\{t\-1\},x\_\{t\}\)\}\{p\_\{t\}\(x\_\{t\}\)\\,M\_\{t\}^\{K\}\(x\_\{t\-1\},x\_\{t\}\)\}\(19\)=γt\(xt\)γt−1\(xt−1\)⋅γt\(xt−1\)/Ztγt\(xt\)/Zt\\displaystyle=\\frac\{\\gamma\_\{t\}\(x\_\{t\}\)\}\{\\gamma\_\{t\-1\}\(x\_\{t\-1\}\)\}\\cdot\\frac\{\\gamma\_\{t\}\(x\_\{t\-1\}\)/Z\_\{t\}\}\{\\gamma\_\{t\}\(x\_\{t\}\)/Z\_\{t\}\}\(20\)=γt\(xt−1\)γt−1\(xt−1\)=eΔβt⋅R\(xt−1\)\.\\displaystyle=\\frac\{\\gamma\_\{t\}\(x\_\{t\-1\}\)\}\{\\gamma\_\{t\-1\}\(x\_\{t\-1\}\)\}=e^\{\\Delta\\beta\_\{t\}\\cdot R\(x\_\{t\-1\}\)\}\.\(21\)BothMtKM\_\{t\}^\{K\}andLt−1L\_\{t\-1\}cancel completely, confirming \([8](https://arxiv.org/html/2605.15308#S2.E8)\)\.

## Appendix ERealizing the Forward Kernel with LLM Proposals

Algorithm[1](https://arxiv.org/html/2605.15308#alg1)requires aptp\_\{t\}\-invariant MCMC kernelMtM\_\{t\}at each step\. HereQtQ\_\{t\}is the LLM proposal distribution, while one application ofMtM\_\{t\}is the proposal plus MH accept/reject transition; Algorithm[1](https://arxiv.org/html/2605.15308#alg1)applies this transitionKKtimes per particle\. We now detail how this kernel is realized when the only available sampler is a black\-box LLM\.

#### LLM as proposal distribution\.

At steptt, given a parent programxxselected by resampling, the LLM proposal kernel generates a candidate programx′x^\{\\prime\}by sampling from

x′∼Qt\(⋅∣x,𝒞t\),x^\{\\prime\}\\sim Q\_\{t\}\(\\cdot\\mid x,\\mathcal\{C\}\_\{t\}\),\(22\)where𝒞t\\mathcal\{C\}\_\{t\}denotes the*context*at steptt: a structured collection of information provided to the LLM alongside the parent programxxand taskqq\. This context may include the history of previously evaluated programs with their rewards, high\-performing programs from other particles \(inspirations\), and instructions specifying the edit granularity\. The notationQt\(⋅∣x,𝒞t\)Q\_\{t\}\(\\cdot\\mid x,\\mathcal\{C\}\_\{t\}\)emphasizes that the proposal is implemented by the*same LLM*that defines the priorp0\(⋅∣q\)p\_\{0\}\(\\cdot\\mid q\), but with additional conditioning on the parent and context\. When𝒞t=∅\\mathcal\{C\}\_\{t\}=\\varnothingand no parent is provided, the proposal reduces to the priorp0\(⋅∣q\)p\_\{0\}\(\\cdot\\mid q\)\.

#### The invariance requirement\.

To ensure thatMtM\_\{t\}leavesptp\_\{t\}invariant, the standard approach is Metropolis–Hastings \(MH\): acceptx′x^\{\\prime\}with probability

αt\(x,x′\)=min⁡\{1,pt\(x′\)Qt\(x∣x′,𝒞t\)pt\(x\)Qt\(x′∣x,𝒞t\)\}\.\\alpha\_\{t\}\(x,x^\{\\prime\}\)=\\min\\\!\\bigg\\\{1,\\;\\frac\{p\_\{t\}\(x^\{\\prime\}\)\\,Q\_\{t\}\(x\\mid x^\{\\prime\},\\mathcal\{C\}\_\{t\}\)\}\{p\_\{t\}\(x\)\\,Q\_\{t\}\(x^\{\\prime\}\\mid x,\\mathcal\{C\}\_\{t\}\)\}\\bigg\\\}\.\(23\)Expandingpt\(x\)∝p0\(x∣q\)eβtR\(x\)p\_\{t\}\(x\)\\propto p\_\{0\}\(x\\mid q\)\\,e^\{\\beta\_\{t\}R\(x\)\}, the acceptance ratio becomes

αt\(x,x′\)=min⁡\{1,eβt\(R\(x′\)−R\(x\)\)⋅p0\(x′∣q\)p0\(x∣q\)⋅Qt\(x∣x′,𝒞t\)Qt\(x′∣x,𝒞t\)⏟σt\(x,x′\)\}\.\\alpha\_\{t\}\(x,x^\{\\prime\}\)=\\min\\\!\\bigg\\\{1,\\;e^\{\\beta\_\{t\}\(R\(x^\{\\prime\}\)\-R\(x\)\)\}\\cdot\\underbrace\{\\frac\{p\_\{0\}\(x^\{\\prime\}\\mid q\)\}\{p\_\{0\}\(x\\mid q\)\}\\cdot\\frac\{Q\_\{t\}\(x\\mid x^\{\\prime\},\\mathcal\{C\}\_\{t\}\)\}\{Q\_\{t\}\(x^\{\\prime\}\\mid x,\\mathcal\{C\}\_\{t\}\)\}\}\_\{\\displaystyle\\sigma\_\{t\}\(x,x^\{\\prime\}\)\}\\bigg\\\}\.\(24\)The composite ratioσt\(x,x′\)\\sigma\_\{t\}\(x,x^\{\\prime\}\)captures the total structural asymmetry between forward and reverse transitions\. Computing it requires evaluating both the prior density ratio and the proposal density ratio, both of which are intractable for black\-box LLMs\.

#### From the full MH ratio to a tractable approximation\.

Computingσt\(x,x′\)\\sigma\_\{t\}\(x,x^\{\\prime\}\)requires evaluating the LLM prior ratio and the proposal density ratio, both of which are unavailable through black\-box LLM APIs\. To obtain a tractable acceptance rule, we adopt the reward\-only criterion in[Equation˜10](https://arxiv.org/html/2605.15308#S2.E10)of the main text, which corresponds to substitutingσt\(x,x′\)=1\\sigma\_\{t\}\(x,x^\{\\prime\}\)=1in \([24](https://arxiv.org/html/2605.15308#A5.E24)\)\. While this approximation may cause the mutation kernel to deviate from exactptp\_\{t\}\-invariance, it is motivated by the heuristic that the LLM tends to perform local modifications—changing a few lines, tuning a hyperparameter, refactoring a subroutine\. For such edits, both the prior ratiop0\(x′∣q\)/p0\(x∣q\)p\_\{0\}\(x^\{\\prime\}\\mid q\)/p\_\{0\}\(x\\mid q\)and the proposal ratioQt\(x∣x′,𝒞t\)/Qt\(x′∣x,𝒞t\)Q\_\{t\}\(x\\mid x^\{\\prime\},\\mathcal\{C\}\_\{t\}\)/Q\_\{t\}\(x^\{\\prime\}\\mid x,\\mathcal\{C\}\_\{t\}\)are close to one, and even when each factor deviates from unity their product can stay close to one if the deviations partially cancel\. The high MH acceptance rates observed in[Figure˜3\(b\)](https://arxiv.org/html/2605.15308#S2.F3.sf2)provide indirect empirical evidence that the substitution is a reasonable approximation in practice\.

## Appendix FProof of[Theorem˜3\.1](https://arxiv.org/html/2605.15308#S3.Thmtheorem1)

The proof follows the proof sketch in[Section˜3](https://arxiv.org/html/2605.15308#S3): we state the imported finite\-sample SMC bound of\[[21](https://arxiv.org/html/2605.15308#bib.bib5), Theorem 1 & Sec\. 5\.1\]in its mixing\-time form \(which does not require kernel reversibility\) and then derive the three substitutions \(S1\), \(S2\), \(S3\) that map its abstract quantities to theSMCEvolvevariables\.

#### Warm\-start mixing time\.

For a Markov kernelMMwith invariant distributionμ\\mu, anω\\omega\-warm starting setPω\(μ\)≜\{ν:supBν\(B\)/μ\(B\)≤ω\}P\_\{\\omega\}\(\\mu\)\\triangleq\\\{\\nu:\\sup\_\{B\}\\,\\nu\(B\)/\\mu\(B\)\\leq\\omega\\\}, and an accuracyη∈\(0,1\)\\eta\\in\(0,1\), the warm\-start mixing time ofMMis

τM\(η,ω\)≜min⁡\{K≥1:supν∈Pω\(μ\)‖νMK−μ‖TV≤η\}\.\\tau\_\{M\}\(\\eta,\\omega\)\\;\\triangleq\\;\\min\\\!\\left\\\{K\\geq 1:\\;\\sup\_\{\\nu\\in P\_\{\\omega\}\(\\mu\)\}\\\|\\nu M^\{K\}\-\\mu\\\|\_\{\\mathrm\{TV\}\}\\;\\leq\\;\\eta\\right\\\}\.\(25\)This is Definition \(Sec\. 2\.1\) of\[[21](https://arxiv.org/html/2605.15308#bib.bib5)\]\.

#### Imported bound\[[21](https://arxiv.org/html/2605.15308#bib.bib5), Theorem 1 & Sec\. 5\.1\]\.

Consider an SMC sampler over a geometric\-mixture pathμ0,…,μT=π\\mu\_\{0\},\\ldots,\\mu\_\{T\}=\\piwithT=⌈log⁡Γ⌉T=\\lceil\\log\\Gamma\\rceilandβt=t/T\\beta\_\{t\}=t/T, initialized with iid samples fromμ0\\mu\_\{0\}and applyingKKtransitions of aμt\\mu\_\{t\}\-invariant Markov kernelMtM\_\{t\}at each stagett\. Suppose

1. 1\.the bridge weights satisfysupxwt\(x\)≤W\\sup\_\{x\}w\_\{t\}\(x\)\\leq Wandzt−1/zt≤Zz\_\{t\-1\}/z\_\{t\}\\leq Zfor everytt, withγ≜W⋅Z\\gamma\\triangleq W\\cdot Z\(this is AS1 of[21](https://arxiv.org/html/2605.15308#bib.bib5)\);
2. 2\.there existsτ<∞\\tau<\\inftysuch thatτMt\(η⋆,ω⋆\)≤τ\\tau\_\{M\_\{t\}\}\(\\eta\_\{\\star\},\\omega\_\{\\star\}\)\\leq\\taufor everytt, at the internal precision/warmness level chosen by the analysis \(Theorem 1 of[21](https://arxiv.org/html/2605.15308#bib.bib5)usesω⋆=2\\omega\_\{\\star\}=2andη⋆=1/\(8NT\)\\eta\_\{\\star\}=1/\(8NT\)\);
3. 3\.Γ≜supxdπ/dμ0\(x\)<∞\\Gamma\\triangleq\\sup\_\{x\}d\\pi/d\\mu\_\{0\}\(x\)<\\infty\.

Then for every‖f‖∞≤1\\\|f\\\|\_\{\\infty\}\\leq 1andϵ\>0\\epsilon\>0, there exists a choice of\(N,T,K\)\(N,T,K\)such that the SMC empirical measureηTN\\eta\_\{T\}^\{N\}satisfiesℙ\(\|ηTN\(f\)−π\(f\)\|≤ϵ\)≥3/4\\mathbb\{P\}\(\|\\eta\_\{T\}^\{N\}\(f\)\-\\pi\(f\)\|\\leq\\epsilon\)\\geq 3/4, and the total kernel\-evaluation budget is bounded by

ℬ=NTK=𝒪∗\(\(ϵ−2∨γ2\)⋅τ⋅log⁡Γlog2⁡log⁡Γ\)\.\\mathcal\{B\}\\;=\\;NTK\\;=\\;\\mathcal\{O\}^\{\\ast\}\\\!\\left\(\(\\epsilon^\{\-2\}\\vee\\gamma^\{2\}\)\\cdot\\tau\\cdot\\log\\Gamma\\,\\log^\{2\}\\\!\\log\\Gamma\\right\)\.\(26\)The reversible\-kernel form\[[21](https://arxiv.org/html/2605.15308#bib.bib5), Cor\. 6\.1\]is recovered by the spectral\-gap\-to\-mixing\-time boundτMt\(η,ω\)≤\(1/ρgap\)\[log⁡\(2/η\)\+log⁡\(ω−1\)\]\\tau\_\{M\_\{t\}\}\(\\eta,\\omega\)\\leq\(1/\\rho^\{\\mathrm\{gap\}\}\)\\big\[\\log\(2/\\eta\)\+\\log\(\\omega\-1\)\\big\], which givesτ=𝒪\(1/ρgap\)\\tau=\\mathcal\{O\}\(1/\\rho^\{\\mathrm\{gap\}\}\)up to log factors absorbed into𝒪∗\\mathcal\{O\}^\{\\ast\}\.

#### Setup forSMCEvolve\.

The bridgespt\(x\)∝p0\(x∣q\)eβtR\(x\)p\_\{t\}\(x\)\\propto p\_\{0\}\(x\\mid q\)e^\{\\beta\_\{t\}R\(x\)\}with0=β0<⋯<βT=β0=\\beta\_\{0\}<\\cdots<\\beta\_\{T\}=\\betaform a geometric\-mixture path fromp0\(⋅∣q\)p\_\{0\}\(\\cdot\\mid q\)top∗\(⋅∣q\)p^\{\*\}\(\\cdot\\mid q\)\. The initialization condition holds because particles are iid fromp0\(⋅∣q\)p\_\{0\}\(\\cdot\\mid q\)\([Algorithm˜1](https://arxiv.org/html/2605.15308#alg1), line 2\), and the kernel correctness/mixing condition holds under the kernel hypothesis of[Theorem˜3\.1](https://arxiv.org/html/2605.15308#S3.Thmtheorem1)\. It remains to identify\(γ2,τ,log⁡Γ\)\(\\gamma^\{2\},\\tau,\\log\\Gamma\)in \([26](https://arxiv.org/html/2605.15308#A6.E26)\) withSMCEvolvequantities\. We do so in three steps\.

### \(S1\) Bridge regularityγ2≤κ−1\\gamma^\{2\}\\leq\\kappa^\{\-1\}

The imported bound \([26](https://arxiv.org/html/2605.15308#A6.E26)\) requires a deterministic upper bound on the adjacentL2L^\{2\}\-bridge ratio, which we adopt as part of the hypothesis of[Theorem˜3\.1](https://arxiv.org/html/2605.15308#S3.Thmtheorem1):

‖ptpt−1‖L2\(pt−1\)2≤κ−1,t=1,…,T\.\\left\\\|\\frac\{p\_\{t\}\}\{p\_\{t\-1\}\}\\right\\\|^\{2\}\_\{L^\{2\}\(p\_\{t\-1\}\)\}\\;\\leq\\;\\kappa^\{\-1\},\\qquad t=1,\\dots,T\.\(27\)The parameterization in terms ofκ\\kappais justified by two considerations\.

#### Worst\-case schedule\-dependent bound\.

The geometric reward\-tilted bridge \([6](https://arxiv.org/html/2605.15308#S2.E6)\) satisfies

‖ptpt−1‖∞≤eΔβtR\+pt−1\(eΔβtR\)≤eΔβt\(R\+−R−\)=eΔβtΔR,\\left\\\|\\frac\{p\_\{t\}\}\{p\_\{t\-1\}\}\\right\\\|\_\{\\infty\}\\leq\\frac\{e^\{\\Delta\\beta\_\{t\}R\_\{\+\}\}\}\{p\_\{t\-1\}\(e^\{\\Delta\\beta\_\{t\}R\}\)\}\\;\\leq\\;e^\{\\Delta\\beta\_\{t\}\(R\_\{\+\}\-R\_\{\-\}\)\}\\;=\\;e^\{\\Delta\\beta\_\{t\}\\Delta\_\{R\}\},\(28\)where the second inequality usespt−1\(eΔβtR\)≥eΔβtR−p\_\{t\-1\}\(e^\{\\Delta\\beta\_\{t\}R\}\)\\geq e^\{\\Delta\\beta\_\{t\}R\_\{\-\}\}\. Since theL2L^\{2\}norm is dominated by theL∞L^\{\\infty\}norm, this gives the deterministic certificateκ−1≤e2ΔβmaxΔR\\kappa^\{\-1\}\\leq e^\{2\\Delta\\beta\_\{\\max\}\\Delta\_\{R\}\}, withΔβmax≜maxt⁡Δβt\\Delta\\beta\_\{\\max\}\\triangleq\\max\_\{t\}\\Delta\\beta\_\{t\}\. Short bridges \(smallΔβt\\Delta\\beta\_\{t\}\) imply small bridge ratios\.

#### ESS bisection as a finite\-sample mechanism that targets \([27](https://arxiv.org/html/2605.15308#A6.E27)\)\.

The empirical ESS

ESS^t=\(∑n=1Nwt\(n\)\)2∑n=1N\(wt\(n\)\)2,wt\(n\)=pt\(xt−1\(n\)\)pt−1\(xt−1\(n\)\)\\widehat\{\\mathrm\{ESS\}\}\_\{t\}=\\frac\{\\big\(\\sum\_\{n=1\}^\{N\}w\_\{t\}^\{\(n\)\}\\big\)^\{2\}\}\{\\sum\_\{n=1\}^\{N\}\(w\_\{t\}^\{\(n\)\}\)^\{2\}\},\\qquad w\_\{t\}^\{\(n\)\}=\\frac\{p\_\{t\}\(x\_\{t\-1\}^\{\(n\)\}\)\}\{p\_\{t\-1\}\(x\_\{t\-1\}^\{\(n\)\}\)\}\(29\)is the finite\-sample analogue of‖pt/pt−1‖L2\(pt−1\)−2\\\|p\_\{t\}/p\_\{t\-1\}\\\|^\{\-2\}\_\{L^\{2\}\(p\_\{t\-1\}\)\}: when\{xt−1\(n\)\}n=1N\\\{x\_\{t\-1\}^\{\(n\)\}\\\}\_\{n=1\}^\{N\}is an iid sample frompt−1p\_\{t\-1\}, a law of large numbers givesESS^t/N→‖pt/pt−1‖L2\(pt−1\)−2\\widehat\{\\mathrm\{ESS\}\}\_\{t\}/N\\to\\\|p\_\{t\}/p\_\{t\-1\}\\\|^\{\-2\}\_\{L^\{2\}\(p\_\{t\-1\}\)\}asN→∞N\\to\\infty\. The ESS bisection rule \([11](https://arxiv.org/html/2605.15308#S2.E11)\) is the algorithmic mechanism that targets \([27](https://arxiv.org/html/2605.15308#A6.E27)\) by enforcing its empirical analogueESS^t/N≥κ\\widehat\{\\mathrm\{ESS\}\}\_\{t\}/N\\geq\\kappa\. We do not turn this targeting into a formal derivation, since the SMC particles at staget−1t\-1are not iid frompt−1p\_\{t\-1\}and the schedule\{λt\}t=0T\\\{\\lambda\_\{t\}\\\}\_\{t=0\}^\{T\}is data\-dependent; \([27](https://arxiv.org/html/2605.15308#A6.E27)\) is therefore treated as an explicit assumption \(with worst\-case certificatee2ΔβmaxΔRe^\{2\\Delta\\beta\_\{\\max\}\\Delta\_\{R\}\}\) rather than a consequence of the ESS rule\.

### \(S2\) Mixing\-time boundτ≤1/\(1−ρ\)\\tau\\leq 1/\(1\-\\rho\)via uniform ergodicity

The kernel hypothesis of[Theorem˜3\.1](https://arxiv.org/html/2605.15308#S3.Thmtheorem1)provides uniform ergodicitysupx‖MtK\(x,⋅\)−pt‖TV≤CρK\\sup\_\{x\}\\\|M\_\{t\}^\{K\}\(x,\\cdot\)\-p\_\{t\}\\\|\_\{\\mathrm\{TV\}\}\\leq C\\rho^\{K\}with rateρ∈\(0,1\)\\rho\\in\(0,1\)\. We show that this directly yields a warm\-start mixing\-time bound that feeds into \([26](https://arxiv.org/html/2605.15308#A6.E26)\) without any reversibility hypothesis\. The argument is in four steps\.

#### \(i\) TV convexity in the first argument\.

For any starting distributionν\\nuand any measurableAA, sinceν\\nuis a probability measure,

νMtK\(A\)−pt\(A\)=∫\[MtK\(x,A\)−pt\(A\)\]ν\(dx\)\.\\nu M\_\{t\}^\{K\}\(A\)\-p\_\{t\}\(A\)\\;=\\;\\int\[M\_\{t\}^\{K\}\(x,A\)\-p\_\{t\}\(A\)\]\\,\\nu\(dx\)\.Taking absolute values, thensupA\\sup\_\{A\}, and exchangingsupA∫≤∫supA\\sup\_\{A\}\\int\\leq\\int\\sup\_\{A\}gives

‖νMtK−pt‖TV≤∫‖δxMtK−pt‖TVν\(dx\)≤supx‖δxMtK−pt‖TV≤CρK\.\\\|\\nu M\_\{t\}^\{K\}\-p\_\{t\}\\\|\_\{\\mathrm\{TV\}\}\\;\\leq\\;\\int\\\|\\delta\_\{x\}M\_\{t\}^\{K\}\-p\_\{t\}\\\|\_\{\\mathrm\{TV\}\}\\,\\nu\(dx\)\\;\\leq\\;\\sup\_\{x\}\\\|\\delta\_\{x\}M\_\{t\}^\{K\}\-p\_\{t\}\\\|\_\{\\mathrm\{TV\}\}\\;\\leq\\;C\\rho^\{K\}\.In particular this holds for everyω\\omega\-warmν∈Pω\(pt\)\\nu\\in P\_\{\\omega\}\(p\_\{t\}\)\.

#### \(ii\) Inverting\.

SettingCρK≤ηC\\rho^\{K\}\\leq\\etayieldsK≥log⁡\(C/η\)/log⁡\(1/ρ\)K\\geq\\log\(C/\\eta\)/\\log\(1/\\rho\), so the warm\-start mixing time ofMtM\_\{t\}is bounded by

τMt\(η,ω\)≤log⁡\(C/η\)log⁡\(1/ρ\),∀ω≥1,η∈\(0,1\)\.\\tau\_\{M\_\{t\}\}\(\\eta,\\omega\)\\;\\leq\\;\\frac\{\\log\(C/\\eta\)\}\{\\log\(1/\\rho\)\},\\qquad\\forall\\,\\omega\\geq 1,\\;\\eta\\in\(0,1\)\.\(30\)

#### \(iii\) Elementary inequality\.

Forρ∈\(0,1\)\\rho\\in\(0,1\),log⁡\(1/ρ\)≥1−ρ\\log\(1/\\rho\)\\geq 1\-\\rho: the functionf\(ρ\)=−log⁡ρ−\(1−ρ\)f\(\\rho\)=\-\\log\\rho\-\(1\-\\rho\)hasf\(1\)=0f\(1\)=0andf′\(ρ\)=−1/ρ\+1<0f^\{\\prime\}\(\\rho\)=\-1/\\rho\+1<0on\(0,1\)\(0,1\), sof≥0f\\geq 0on\(0,1\)\(0,1\)\. Combining with \([30](https://arxiv.org/html/2605.15308#A6.E30)\) gives

τMt\(η,ω\)≤log⁡\(C/η\)1−ρ\.\\tau\_\{M\_\{t\}\}\(\\eta,\\omega\)\\;\\leq\\;\\frac\{\\log\(C/\\eta\)\}\{1\-\\rho\}\.\(31\)

#### \(iv\) Substitution into \([26](https://arxiv.org/html/2605.15308#A6.E26)\)\.

Theorem 1 of\[[21](https://arxiv.org/html/2605.15308#bib.bib5)\]sets\(η⋆,ω⋆\)=\(1/\(8NT\),2\)\(\\eta\_\{\\star\},\\omega\_\{\\star\}\)=\(1/\(8NT\),2\), so the mixing\-time factor in \([26](https://arxiv.org/html/2605.15308#A6.E26)\) satisfies

τ=maxt⁡τMt\(1/\(8NT\),2\)≤log⁡\(8NTC\)1−ρ=𝒪\(log⁡\(NT\)\)1−ρ\.\\tau\\;=\\;\\max\_\{t\}\\tau\_\{M\_\{t\}\}\(1/\(8NT\),2\)\\;\\leq\\;\\frac\{\\log\(8NTC\)\}\{1\-\\rho\}\\;=\\;\\frac\{\\mathcal\{O\}\(\\log\(NT\)\)\}\{1\-\\rho\}\.The𝒪\(log⁡\(NT\)\)\\mathcal\{O\}\(\\log\(NT\)\)factor is polylogarithmic and is absorbed into𝒪~\\widetilde\{\\mathcal\{O\}\}in the SMCEvolve rate \([13](https://arxiv.org/html/2605.15308#S3.E13)\)\.

### \(S3\)log⁡Γ≤βΔR\\log\\Gamma\\leq\\beta\\Delta\_\{R\}via reward oscillation

###### Lemma F\.1\(Path length for reward tilting\)\.

Under the kernel hypothesis of[Theorem˜3\.1](https://arxiv.org/html/2605.15308#S3.Thmtheorem1), supposeRRis bounded on the support ofp0\(⋅∣q\)p\_\{0\}\(\\cdot\\mid q\)with oscillationΔR\\Delta\_\{R\}\. Thenlog⁡Γ≤βΔR\\log\\Gamma\\leq\\beta\\Delta\_\{R\}\.

###### Proof\.

Sincep∗\(x∣q\)=Z\(q\)−1p0\(x∣q\)eβR\(x\)p^\{\*\}\(x\\mid q\)=Z\(q\)^\{\-1\}p\_\{0\}\(x\\mid q\)e^\{\\beta R\(x\)\},

Γ=supx:p0\(x∣q\)\>0p∗\(x∣q\)p0\(x∣q\)=supx:p0\(x∣q\)\>0eβR\(x\)Z\(q\)≤eβR\+Z\(q\)\.\\Gamma=\\sup\_\{x:p\_\{0\}\(x\\mid q\)\>0\}\\frac\{p^\{\*\}\(x\\mid q\)\}\{p\_\{0\}\(x\\mid q\)\}=\\sup\_\{x:p\_\{0\}\(x\\mid q\)\>0\}\\frac\{e^\{\\beta R\(x\)\}\}\{Z\(q\)\}\\leq\\frac\{e^\{\\beta R\_\{\+\}\}\}\{Z\(q\)\}\.Moreover,

Z\(q\)=𝔼p0⁡\[eβR\]≥eβR−Z\(q\)=\\operatorname\{\\mathbb\{E\}\}\_\{p\_\{0\}\}\[e^\{\\beta R\}\]\\;\\geq\\;e^\{\\beta R\_\{\-\}\}becauseR\(x\)≥R−R\(x\)\\geq R\_\{\-\}on the support ofp0p\_\{0\}\. HenceΓ≤eβ\(R\+−R−\)=eβΔR\\Gamma\\leq e^\{\\beta\(R\_\{\+\}\-R\_\{\-\}\)\}=e^\{\\beta\\Delta\_\{R\}\}, which giveslog⁡Γ≤βΔR\\log\\Gamma\\leq\\beta\\Delta\_\{R\}\. ∎

#### Combining \(S1\)–\(S3\)\.

Substituting \([27](https://arxiv.org/html/2605.15308#A6.E27)\), \([31](https://arxiv.org/html/2605.15308#A6.E31)\), and[Appendix˜F](https://arxiv.org/html/2605.15308#A6.SSx3)into the imported bound \([26](https://arxiv.org/html/2605.15308#A6.E26)\):

ℬ=𝒪∗\(ϵ−2∨κ−11−ρlog⁡Γlog2⁡log⁡Γ\)⊆𝒪~\(ϵ−2∨κ−11−ρβΔR\),\\mathcal\{B\}\\;=\\;\\mathcal\{O\}^\{\\ast\}\\\!\\left\(\\frac\{\\epsilon^\{\-2\}\\vee\\kappa^\{\-1\}\}\{1\-\\rho\}\\,\\log\\Gamma\\,\\log^\{2\}\\\!\\log\\Gamma\\right\)\\;\\subseteq\\;\\widetilde\{\\mathcal\{O\}\}\\\!\\left\(\\frac\{\\epsilon^\{\-2\}\\vee\\kappa^\{\-1\}\}\{1\-\\rho\}\\,\\beta\\Delta\_\{R\}\\right\),where the iterated\-logarithm factors are polylogarithmic inβΔR\\beta\\Delta\_\{R\}and absorbed into𝒪~\\widetilde\{\\mathcal\{O\}\}\. The concentration statementℙ\(\|ηTN\(f\)−p∗\(f\)\|≤ϵ\)≥3/4\\mathbb\{P\}\(\|\\eta\_\{T\}^\{N\}\(f\)\-p^\{\*\}\(f\)\|\\leq\\epsilon\)\\geq 3/4is inherited directly from the imported bound\. This is exactly \([12](https://arxiv.org/html/2605.15308#S3.E12)\)–\([13](https://arxiv.org/html/2605.15308#S3.E13)\) in[Theorem˜3\.1](https://arxiv.org/html/2605.15308#S3.Thmtheorem1)\. ∎

### F\.1Per\-factor requirements and degenerate regimes

[Theorem˜3\.1](https://arxiv.org/html/2605.15308#S3.Thmtheorem1)bounds the joint LLM\-call budgetℬ=NTK\\mathcal\{B\}=NTK, but each factor must individually be large enough for the SMC sampler to function\. Concretely, the theorem requires the particle count to satisfyN≳κ−1∨ϵ−2N\\gtrsim\\kappa^\{\-1\}\\vee\\epsilon^\{\-2\}, so the ESS constraint controls the bridge\-ratio variance and supports the Monte Carlo errorϵ\\epsilon, and the mutation count to satisfyK≳log⁡\(1/ϵ\)/\(1−ρ\)K\\gtrsim\\log\(1/\\epsilon\)/\(1\-\\rho\), so the uniform\-ergodicity TV errorCρKC\\rho^\{K\}falls belowϵ\\epsilonat every bridge; the annealing lengthT=𝒪~\(βΔR\)T=\\widetilde\{\\mathcal\{O\}\}\(\\beta\\Delta\_\{R\}\)is set adaptively by the ESS bisection rule and is not a free hyperparameter\. The extreme casesN=1N=1\(no resampling diversity\) andK=1K=1\(a single MH step per particle, insufficient mixing wheneverρ\\rhois close to11\) sit outside this regime, and the formal guarantee no longer applies\.

## Appendix GImplementation Details

### G\.1Island\-Based Parallelism

SMCEvolverunsIIindependent SMC chains \(*islands*\), each withNNparticles and its own temperature schedule\. Everyτ\\tauepochs, islands exchange top\-performing particles via a migration step: each island sends its bestmmparticles to a randomly assigned neighbor and retains the topNNafter merging\. This maintains diversity across subpopulations while enabling parallel execution\.

### G\.2Resampling and Numerical Stability

We use*systematic resampling*\[[2](https://arxiv.org/html/2605.15308#bib.bib24)\], which requires only a single uniform random number and is computationally convenient in practice; see\[[9](https://arxiv.org/html/2605.15308#bib.bib25)\]for a comparison with the stratified variant of\[[16](https://arxiv.org/html/2605.15308#bib.bib21)\]and a variance analysis\. For numerical stability, the log\-weightslog⁡wt\(n\)=Δβt⋅R\(xt−1\(n\)\)\\log w\_\{t\}^\{\(n\)\}=\\Delta\\beta\_\{t\}\\cdot R\(x\_\{t\-1\}^\{\(n\)\}\)are shifted by their maximum before exponentiation \(the log\-sum\-exp trick\), preventing overflow without affecting the normalized weights \([9](https://arxiv.org/html/2605.15308#S2.E9)\)\.

### G\.3Summary of Theoretical Guarantees

[Table˜5](https://arxiv.org/html/2605.15308#A7.T5)summarizes how each component ofSMCEvolvemaps to the SMC framework and which guarantees are preserved\.

Table 5:Mapping between the SMC framework and theSMCEvolveimplementation\.Exact: the theoretical guarantee is preserved\.Approx\.: a practical relaxation is introduced\.The target distribution, annealing path, and importance weights remain exact\. Mutation heuristics, adaptive kernel selection, adaptive temperature schedules, and island\-based parallel extensions enter the theory only through additional approximation terms or extra assumptions beyond the main theorem section\.

## Appendix HTask Details

This appendix collects natural\-language descriptions of every benchmark task evaluated in Tables[1](https://arxiv.org/html/2605.15308#S4.T1)–[2](https://arxiv.org/html/2605.15308#S4.T2)and Figure[4](https://arxiv.org/html/2605.15308#S4.F4)\. Formal interfaces \(input/output signatures, evaluators\) are documented in the per\-tasktask\.mdfiles released with our code\.

### H\.1Math domain \(Table[1](https://arxiv.org/html/2605.15308#S4.T1)\)

Ten open problems from analytic combinatorics and discrete geometry, ported from the public release accompanying DeepMind’s AlphaEvolve\. Each task asks the program to*construct*a concrete mathematical object \(a packing, a point set, or a step function\) that pushes a known numerical constant in the direction predicted by theory; a score of1\.01\.0ties the 2024 AlphaEvolve benchmark, and scores above1\.01\.0strictly improve on it\.

Table 6:Math domain — AlphaEvolve combinatorial geometry tasks\.
### H\.2Symbolic Regression domain \(Table[3](https://arxiv.org/html/2605.15308#S4.T3)\)

Sixteen scientific\-equation discovery tasks \(4 splits×\\times4 tasks\) drawn from LLM\-SRBench\[[30](https://arxiv.org/html/2605.15308#bib.bib16)\]\. Each task hides a ground\-truth analytical expression behind a small numeric dataset; the program must propose a closed\-form parametric expression \(with up to ten free coefficients\) whose mean\-squared error against the held\-out training data is as small as possible\. The reward−log10⁡\(MSE\)\-\\log\_\{10\}\(\\mathrm\{MSE\}\)rewards exponentially\-better fits, so a one\-decade improvement in MSE counts as\+1\.0\+1\.0in score\.

Table 7:Symbolic Regression domain — four LLM\-SRBench splits, four tasks per split\.
### H\.3AlgoTune domain \(Table[2](https://arxiv.org/html/2605.15308#S4.T2)\)

Eight numerical, signal\-processing, and linear\-algebra tasks from the public AlgoTune benchmark\[[25](https://arxiv.org/html/2605.15308#bib.bib26)\]\. Each task ships with a known*correct*reference implementation \(typically a thin wrapper around SciPy or LAPACK\); the program’s job is to produce a drop\-in replacement that is measurably faster on the benchmark’s input distribution while still passing AlgoTune’s correctness verifier\. The reward is the wall\-clock speedup over the reference \(so1\.01\.0==no speedup,10\.010\.0==ten\-times faster\); any incorrect output scores0\.

Table 8:AlgoTune domain — runtime\-optimization tasks\.
### H\.4AutoResearch domain \(Figure[4](https://arxiv.org/html/2605.15308#S4.F4)\)

A single high\-stakes systems task ported fromkarpathy/autoresearch\[[14](https://arxiv.org/html/2605.15308#bib.bib4)\]\. The program under evolution is a*complete*single\-file GPT pretraining script: model architecture, tokenizer setup, optimizer, learning\-rate schedule, batching, training loop, and final evaluation\. Each candidate trains for 60 s of wall\-clock time on a single RTX A5000 \(24 GB\) on the climbmix corpus, after which the script must reportval\_bpb\(bits\-per\-byte on a held\-out shard\)\. Lowerval\_bpbmeans a better language model; the rewardmax⁡\(0,2\.0−val\_bpb\)\\max\(0,\\,2\.0\-\\texttt\{val\\\_bpb\}\)is anchored so that a random baseline \(val\_bpb≈2\.6\\texttt\{val\\\_bpb\}\\approx 2\.6\) scores0and a well\-trained small GPT \(val\_bpb≈1\.0\\texttt\{val\\\_bpb\}\\approx 1\.0\) scores about1\.01\.0\.

Table 9:AutoResearch domain — single\-GPU GPT pretraining task\.

## Appendix IExisting Assets and Licenses

We use existing assets only as benchmark and evaluation resources\. The math tasks are ported from the public AlphaEvolve benchmark release; the symbolic\-regression tasks are drawn from LLM\-SRBench; the numerical optimization tasks are drawn from AlgoTune; and the AutoResearch task uses the TinyStories/climbmix pretraining setup fromkarpathy/autoresearch\. We credit the original sources in the main text and task descriptions\. We do not redistribute third\-party datasets, pretrained models, or benchmark code as new assets; our code release will include attribution, links, and the applicable license or terms\-of\-use information for each external resource\. Commercial LLM APIs are accessed under the providers’ terms, and users reproducing the experiments must supply their own API credentials\.

## Appendix JHyperparameters

[Table˜10](https://arxiv.org/html/2605.15308#A10.T10)lists the hyperparameters that materially affectSMCEvolve’s behaviour, together with the values used in our experiments\. The compute\-budget group sets the LLM\-call ceiling≤nislands×N×K×Tmax\\leq n\_\{\\text\{islands\}\}\\\!\\times\\\!N\\\!\\times\\\!K\\\!\\times\\\!T\_\{\\max\}; the SMC\-annealing group controls the adaptive temperature schedule; the remaining groups govern mixing, mutation, and LLM behaviour\.

Table 10:SMCEvolvehyperparameters\.ParameterValueMeaningCompute budgetn\_islands22Number of parallel islands\.particles\_per\_island88ParticlesNNper island\.n\_proposals22KKsequential LLM proposals per particle per step, used as aKK\-step MH chain\.min\_iterations33Lower bound on SMC steps; also capsΔλ≤1/min\_iterations\\Delta\\lambda\\\!\\leq\\\!1/\\text\{min\\\_iterations\}\.max\_iterations1515Hard upper boundTmaxT\_\{\\max\}on SMC steps\.SMC annealingbeta2020Target inverse temperatureβ\\beta\(annealing endpoint\)\.kappa0\.90\.9ESS threshold used by bisection to pickλt\\lambda\_\{t\}\.Island migrationmigration\_interval33SMC steps between successive migration events\.migration\_size11Particles exchanged per migration event\.Mutation kernelkernel\_selectionadaptiveThompson\-sampling mixture over the2×22\{\\times\}2kernel grid \(diff/rewrite×\\timeswith/without inspirations\)\.top\_k\_inspiration22Highest\-reward exemplars from the same island\.diverse\_inspirations22MAP\-Elites\-style diversity exemplars \(embedding\-distance\)\.LLM backendtemperature1\.01\.0LLM sampling temperature\.max\_tokens40964096Maximum tokens per LLM response\.models50/50Ensemble ofgpt\-5\-miniandgemini\-3\-flash\(gpt\-5\.4\+gemini\-3\-profor AutoResearch\)\.
## Appendix KBroader Impact

SMCEvolveis intended as a research framework for LLM\-driven program search\. Its main potential benefit is to make automated scientific discovery and program optimization more sample\-efficient and easier to monitor through explicit resampling and stopping criteria\. Our experiments are conducted on benchmark tasks and do not involve human subjects or sensitive data\. As with any automated code\-generation system, generated candidates should be treated as suggestions and validated by task\-specific tests before use, especially outside controlled benchmarks\. The method also relies on access to external LLM APIs, which may affect cost and reproducibility across users\.

## Appendix LMutation Kernel Prompts

SMCEvolvemutates particles through a2×22\{\\times\}2grid of LLM prompt templates that varies along two axes:

At each call, the system prompt fixes the kernel’s role and the expected output format; the user prompt injects the parent program \(rendered with the language tag\{\{language\}\}into\{\{code\_content\}\}\), its reward \(\{\{performance\_metrics\}\}\), and — forwith\_inspokernels — a block of reference programs \(\{\{inspiration\_section\}\}\)\. When no reference programs are available \(e\.g\. at initialization\),with\_inspokernels fall back to theirno\_inspocounterpart\.

### L\.1diff\_no\_inspo: single\-particle local kernel

`System Prompt User Prompt`

`L\.2 diff\_with\_inspo: interacting local kernel System Prompt User Prompt L\.3 rewrite\_no\_inspo: single\-particle global kernel System Prompt User Prompt L\.4 rewrite\_with\_inspo: interacting global kernel \(crossover\) System Prompt User Prompt Appendix M Visualizing an SMCEvolve Run Figure˜5 traces a real run of SMCEvolve on the Circle Packing in a Rectangle \(N=21N\{=\}21\) task with I=2I\{=\}2 islands, N=8N\{=\}8 particles per island, and K=2K\{=\}2 MH proposals per particle\. Each panel shows one SMC iteration on a single island, organized into the three sub\-steps of the kernel: reweight \(top\), resample \(middle\), and mutate \(bottom\)\. In the reweight row, node area is proportional to the normalized importance weight wi/maxj⁡wjw\_\{i\}/\\max\_\{j\}w\_\{j\}, and a thick colored ring marks particles that arrived via inter\-island migration in the previous epoch\. Resample arrows depict the multinomial replication map: arrows converging on a single parent indicate replication of a high\-weight ancestor, while parents with no outgoing arrow are discarded\. The mutate block stacks one row per MH proposal, with vertical dashed lines linking the successive proposals of the same particle\. Node fill color encodes reward through a global red–yellow–green colormap normalized over the full run\. On mutate nodes we additionally encode three pieces of kernel metadata: stroke color distinguishes the edit operator \(blue == local diff edit, orange == full\-program rewrite\); stroke width marks whether the proposal was conditioned on inspiration programs \(thick\) or not \(thin\); and line style with fill opacity encodes the MH outcome \(solid/opaque == accepted, dashed/translucent == rejected with the fill reverting to the parent reward, dotted/faded == skipped\)\. The header above each panel reports the annealing schedule statistics \(λt\\lambda\_\{t\}, Δβt\\Delta\\beta\_\{t\}, βt\\beta\_\{t\}, ESS at λt\\lambda\_\{t\}\) together with the best and mean reward of the population, so that the kernel\-induced dynamics can be cross\-referenced with the adaptive temperature schedule at a glance\. Figure 5: Per\-iteration SMC flow on Circle Packing N=21N\{=\}21 \(I=2I\{=\}2 islands, N=8N\{=\}8 particles per island, K=2K\{=\}2 MH proposals\)\. Rows of each panel correspond to reweight, resample, and mutate; node fill encodes reward \(red–yellow–green\); thick rings in the reweight row mark inter\-island migrants; resample arrows show the multinomial replication map; on mutate nodes, stroke color distinguishes the edit operator \(blue == local diff, orange == full rewrite\), stroke width flags inspiration\-conditioned proposals, and line style/opacity encodes the MH outcome \(solid == accepted, dashed == rejected, dotted == skipped\)\. Panel headers list the annealing schedule \(λt,Δβt,βt,ESS\\lambda\_\{t\},\\Delta\\beta\_\{t\},\\beta\_\{t\},\\mathrm\{ESS\}\) and the population best/mean reward\.`
SMCEvolve: Principled Scientific Discovery via Sequential Monte Carlo Evolution

Similar Articles

MLEvolve: A Self-Evolving Framework for Automated Machine Learning Algorithm Discovery

AlgoEvolve: LLM-driven Meta-evolution of Algorithmic Trading Programs

EvoSci: A Bio-Inspired Multi-Agent Framework for the Evolution of Scientific Discovery

EvoScientist: Towards Multi-Agent Evolving AI Scientists for End-to-End Scientific Discovery

PACEvolve++: Improving Test-time Learning for Evolutionary Search Agents

Submit Feedback

Similar Articles

MLEvolve: A Self-Evolving Framework for Automated Machine Learning Algorithm Discovery
AlgoEvolve: LLM-driven Meta-evolution of Algorithmic Trading Programs
EvoSci: A Bio-Inspired Multi-Agent Framework for the Evolution of Scientific Discovery
EvoScientist: Towards Multi-Agent Evolving AI Scientists for End-to-End Scientific Discovery
PACEvolve++: Improving Test-time Learning for Evolutionary Search Agents