# Christoffel-DPS: Optimal sensor placement in diffusion posterior sampling for arbitrary distributions
Source: [https://arxiv.org/html/2605.06861](https://arxiv.org/html/2605.06861)
James Rowbottom (Department of Applied Mathematics and Theoretical Physics, University of Cambridge, UK; jr908@cam.ac.uk), Nick Huang (Department of Mathematics, Simon Fraser University, Canada; nick_huang@sfu.ca), Carola-Bibiane Schönlieb (Department of Applied Mathematics and Theoretical Physics, University of Cambridge, UK; cbs31@cam.ac.uk), Ben Adcock (Department of Mathematics, Simon Fraser University, Canada; ben_adcock@sfu.ca)
###### Abstract
State estimation is a critical task in scientific, engineering and control applications. Since the reliability of reconstructions depends on the number and position of sensors, optimal sensor placement (OSP) is essential in scenarios where measurements are sparse and expensive. Classical OSP approaches rely on Gaussian assumptions and are consequently unable to account for the complex distributions encountered in many real-world systems. Generative-model-based reconstruction using sensor-guided diffusion posterior sampling (DPS) has emerged as a promising technique for reconstructing states from highly complex distributions. However, existing sensor-selection methods either require unrealistically many sensors or emulate classical OSP, creating a mismatch between modern recovery models and classical OSP tools, and motivating fundamentally new approaches to OSP that match the recent advances in powerful recovery models. We introduce a distribution-free sensor placement framework based on the Christoffel function: a mathematical formulation of optimal sampling and recovery guarantees for posterior sampling with arbitrary sensors and signal distributions, from which we derive a new OSP strategy with non-asymptotic bounds on the number of sensors needed for recovery. We develop Christoffel-DPS, with offline and online variants, instantiating Christoffel sampling for generative models. Christoffel-DPS outperforms Gaussian OSP baselines and existing generative-model placement methods, validating that distribution-free sensing is both theoretically principled and practically superior. The framework is model-agnostic; we demonstrate its application to a range of unconditional DPS and flow-matching models on structurally non-Gaussian benchmarks, showing the efficacy of Christoffel-DPS in low sensor-budget regimes.
## 1 Introduction

State reconstruction from sparse measurements is an important problem across science and engineering. Applications include ocean state estimation from networks of buoys, atmospheric data assimilation from weather stations, fluid-flow reconstruction around aerodynamic surfaces from pressure taps, seismic imaging from receiver arrays and biomedical imaging from compressive measurements. In such settings, sensors are expensive to deploy and operate, the signal lives in a high- or infinite-dimensional state space, and the number and location of sensors can be the dominant factors controlling reconstruction accuracy. Optimal sensor placement (OSP) asks how to choose these measurements so as to maximize reconstruction accuracy under a prescribed sensing budget.
Classical approaches to OSP are closely tied to linear recovery methods, reduced-order modelling and Gaussian assumptions. In Bayesian optimal experimental design (OED), standard A-, D- and E-optimality criteria optimize functionals of the posterior covariance matrix. Typical methods, including gappy POD (Everson and Sirovich, [1995](https://arxiv.org/html/2605.06861#bib.bib18); Willcox, [2006](https://arxiv.org/html/2605.06861#bib.bib47)), Gaussian-process and kriging regression (Krause et al., [2008](https://arxiv.org/html/2605.06861#bib.bib29)), ensemble Kalman filtering, reduced-order POD-Galerkin and DEIM/QDEIM models (Chaturantabut and Sorensen, [2010](https://arxiv.org/html/2605.06861#bib.bib10); Drmač and Gugercin, [2016](https://arxiv.org/html/2605.06861#bib.bib15)) and SSPOR (Manohar et al., [2018](https://arxiv.org/html/2605.06861#bib.bib35)), choose sensors so that a reduced basis is well conditioned when restricted to the sensor measurements. These methods have been highly successful and often come with algorithmic and theoretical guarantees. However, their placement criteria are typically derived from linearity and Gaussian assumptions, and as a result they do not directly exploit the geometry of complex signal distributions such as multi-modal, strongly non-Gaussian or manifold-supported priors.
This limitation has become increasingly important with the rapid growth of learned recovery operators. Supervised inverse maps from sparse measurements to full fields are the natural starting point: Voronoi-tessellation CNNs (Fukami et al., [2021](https://arxiv.org/html/2605.06861#bib.bib19)), transformer reconstructors such as the Senseiver (Santos et al., [2023](https://arxiv.org/html/2605.06861#bib.bib42)) and the Energy Transformer (Zhang et al., [2025](https://arxiv.org/html/2605.06861#bib.bib50)), DeepONet-style operator networks (Dang and Nguyen, [2025](https://arxiv.org/html/2605.06861#bib.bib13)), and graph-transformer reconstructors on unstructured meshes (Duthé et al., [2025](https://arxiv.org/html/2605.06861#bib.bib16)). Generative reconstructors take a probabilistic view: PhySense (Ma et al., [2025](https://arxiv.org/html/2605.06861#bib.bib34)) trains a conditional flow-matching reconstructor on randomised sensor layouts and optimises placement post-hoc by projected gradient descent against a reconstruction loss; SDIFT (Chen et al., [2025](https://arxiv.org/html/2605.06861#bib.bib11)) runs sequential diffusion in a learned functional-Tucker latent with message-passing posterior sampling for irregular sparse observations; DiffusionPDE (Huang et al., [2024](https://arxiv.org/html/2605.06861#bib.bib22)) and FunDPS keep an unconditional diffusion prior and add measurement and PDE-residual guidance at sampling time, with ConFIG (Amorós-Trepat et al., [2026](https://arxiv.org/html/2605.06861#bib.bib5)) introducing conflict-free gradient projections to stabilise multi-objective guidance, DDO (Lim et al., [2023](https://arxiv.org/html/2605.06861#bib.bib31)) working directly in function space via Cameron–Martin geometry, and DDIS (Lin et al., [2026](https://arxiv.org/html/2605.06861#bib.bib74)) decoupling joint-state training by enforcing the PDE constraint with a separately learned operator at sampling time.
Overall, these models are far from the setting of classical OSP. Generative models, and in particular diffusion models trained on full-state samples (Karras et al., [2022](https://arxiv.org/html/2605.06861#bib.bib26)), encode highly expressive non-Gaussian priors and admit recovery through techniques such as diffusion posterior sampling (DPS) (Chung et al., [2023](https://arxiv.org/html/2605.06861#bib.bib70)) and various others (Baldassari et al., [2026](https://arxiv.org/html/2605.06861#bib.bib66)). In the limit of a perfectly trained denoiser the prior acts, for our purposes, as an oracle, concentrating probability mass on the manifold of plausible states and rendering the posterior highly non-Gaussian. Despite this shift, OSP for generative recovery operators is comparatively under-developed. PhySense's projected-gradient stage (Ma et al., [2025](https://arxiv.org/html/2605.06861#bib.bib34)) is the only fully end-to-end placement loop, but its reconstruction-loss objective is mean-squared and its sensor optimisation reverts to a Gaussian-likelihood treatment under its modelling assumptions, rendering it essentially equivalent to A-optimal design. Ensemble-based approaches such as the per-pixel standard-deviation score of Chakraborty et al. ([2026](https://arxiv.org/html/2605.06861#bib.bib8)), the gradient-weighted class-activation map of Xu et al. ([2024](https://arxiv.org/html/2605.06861#bib.bib48)), the cartoonist-style uncertainty of Karczewski et al. ([2024](https://arxiv.org/html/2605.06861#bib.bib24)), and the EnKF-driven neural network of Deng et al. ([2021](https://arxiv.org/html/2605.06861#bib.bib14)) are also A-optimal in spirit, scoring sensors by a second-moment summary of an ensemble; attention-based placement (Zhao et al., [2025](https://arxiv.org/html/2605.06861#bib.bib51)) and the curriculum reformulation of Marcato et al. ([2024](https://arxiv.org/html/2605.06861#bib.bib36)) fall in the same broadly Gaussian family. None are distribution-free in the formal sense, and none come with non-asymptotic recovery guarantees that match the expressivity of the underlying generative prior.
This paper aims to close this gap. We introduce a theoretical framework for OSP which provides non-asymptotic recovery guarantees for arbitrary signal distributions. Using this, we then derive a novel strategy, *Christoffel-DPS*, for OSP in the setting of DPS. Our specific contributions are:
1. (i) A novel theoretical framework for recovery with *arbitrary* (in particular, generative) priors that does not require Gaussian assumptions and applies to *any* linear measurements.
2. (ii) A novel OSP strategy, *Christoffel sampling*, that is theoretically optimal for *any* given prior.
3. (iii) A practical implementation of Christoffel sampling in DPS, termed *Christoffel-DPS*, with both offline and online variants, for full state reconstruction from point measurements.
4. (iv) Experiments on a series of scientific datasets showing the benefit of Christoffel-DPS over other OSP strategies for generative priors. A typical experimental result is shown in Figure [1](https://arxiv.org/html/2605.06861#S1.F1).
Our OSP strategy is fundamentally different to classical OED criteria. Where A-, D- and E-optimal designs minimize the average size of the posterior covariance ellipsoid – in other words, they consider how strongly a measurement set collapses the posterior towards a mean state – Christoffel sampling controls the worst-case ratio between measurement energy and signal energy across the secant set of the support of the prior. It therefore asks how reliably a measurement set *identifies* between arbitrary candidate signals. The latter is well-defined for arbitrary distributions and does not require linearity or Gaussian assumptions. The shift in objective, from posterior collapse to identifiability under noise, is what allows our framework to remain meaningful when the posterior is non-Gaussian, multi-modal or supported on a learned manifold. Christoffel functions have recently emerged as powerful tools in deterministic (non-Bayesian) recovery, using both linear (Cohen and Migliorati, [2017](https://arxiv.org/html/2605.06861#bib.bib67)) and nonlinear estimators (Adcock et al., [2023](https://arxiv.org/html/2605.06861#bib.bib60)). In the former case, they are closely related to leverage scores (Chen et al., [2016](https://arxiv.org/html/2605.06861#bib.bib68); Ma et al., [2015](https://arxiv.org/html/2605.06861#bib.bib69)). However, their use for Bayesian posterior sampling appears to be new.
Figure 1: **Ensemble Christoffel-DPS.** Two DPS sampling trajectories on the Pinball dataset. *Row 1 (random):* 6 fixed sensors at random positions. *Row 2 (Christoffel-DPS):* 3 fixed anchor sensors (cyan $\bullet$) located via offline Christoffel-DPS, 3 mobile sensors (green $\bullet$) chosen online by the ensemble-greedy update. *Columns:* ground truth; intermediate noisy state $x_{t^*}$; row 1: ensemble standard deviation $\sigma(\hat{x}_0)$ of the Tweedie estimates, row 2: empirical Christoffel score driving mobile sensor drift; final reconstruction at $T=0$.
## 2 Setup and background on Christoffel sampling

State reconstruction involves recovering an unknown signal $f^*:\Theta\rightarrow\mathbb{R}$ defined on a domain $\Theta$ (typically a subset of $\mathbb{R}^d$) from a sparse set of noisy sensor measurements

$$y_i=f^*(\theta_i)+n_i,\quad i=1,\ldots,m,$$

where $n_i\sim_{\mathrm{i.i.d.}}\mathcal{N}(0,\sigma^2)$. OSP is the problem of choosing the sensor locations $\theta_1,\ldots,\theta_m\in\Theta$ so as to maximize the accuracy of the recovered state, which we denote as $\hat{f}$. In this work, we focus on *posterior sampling* techniques, especially DPS. Let $\mathcal{P}$ be a prior signal distribution, typically a generative prior. Then posterior sampling involves drawing $\hat{f}\sim\mathcal{P}(\cdot\,|\,y,\theta)$, where $y=(y_i)_{i=1}^m$ and $\theta=(\theta_i)_{i=1}^m$. Hence the OSP goal in posterior sampling is to maximize the fidelity of $\hat{f}$ to $f^*$ via the choice of $\theta$.
Consider the domain $\Theta$ equipped with some measure $\rho$. The key object in our work is the *Christoffel function* $K(\mathcal{P})$ of $\mathcal{P}$, defined as

$$K(\mathcal{P}):\theta\in\Theta\mapsto\sup_{f\in\mathbb{S}(\mathcal{P})}|f(\theta)|^2\in\mathbb{R}.$$

Here $\mathbb{S}(\mathcal{P})$ is the *secant set* of $\mathrm{supp}(\mathcal{P})$,

$$\mathbb{S}(\mathcal{P})=\left\{\frac{f_1-f_2}{\|f_1-f_2\|}:f_1\neq f_2\in\mathrm{supp}(\mathcal{P})\right\}\tag{2.1}$$

and $\|\cdot\|$ is the $L^2_\rho(\Theta)$-norm. This function is closely related to the *identifiability* of signals from $\mathcal{P}$ given their sensor values. If $K(\mathcal{P})(\theta)$ is large, the sensor reading at location $\theta$ can better identify potential signals from $\mathcal{P}$, while when $K(\mathcal{P})(\theta)$ is small, the sensor reading provides little information. *Christoffel sampling*, developed later in this work, uses this function to guide sensor placement by striving to maximize identifiability in the posterior sampling context.
Note that in practice, the state reconstruction problem is often formulated over a finite grid. Let $\{\xi_i\}_{i=1}^N\subset\Theta$ be a fixed grid of nodes and identify the vector $x^*\in\mathbb{R}^N$ with $f^*$ via $x^*_i=f^*(\xi_i)$. In this case, $K(\mathcal{P})$ becomes a score function over $\{1,\ldots,N\}$ with $K(\mathcal{P})(j)=\sup_{f\in\mathbb{S}(\mathcal{P})}|f(\xi_j)|^2$. Here, we also adopt the notation $S\in\mathbb{R}^{m\times N}$ for a row-selection matrix comprising $m$ rows of the identity $I_N$, corresponding to the selected sensors, and express the measurements as

$$y_S=Sx^*+n,\quad n\sim\mathcal{N}(0,\sigma^2 I_m).\tag{2.2}$$
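To make the discrete model concrete, here is a minimal sketch of the sensing process [(2.2)](https://arxiv.org/html/2605.06861#S2.E2) in Python/NumPy. The helper `measure` and its argument names are ours, for illustration only; the dense $S$ is built purely for clarity, since $Sx^*$ is just fancy indexing.

```python
import numpy as np

def measure(x_star, sensor_idx, sigma, rng=None):
    """Noisy point measurements y_S = S x* + n, cf. (2.2).

    x_star     : (N,) discretized state on the grid {xi_i}.
    sensor_idx : (m,) indices of the selected grid nodes (rows of I_N).
    sigma      : noise standard deviation.
    """
    rng = np.random.default_rng() if rng is None else rng
    S = np.eye(len(x_star))[sensor_idx]              # row-selection matrix
    n = rng.normal(0.0, sigma, size=len(sensor_idx))
    return S @ x_star + n                            # equals x_star[sensor_idx] + n

# Example: 5 sensors on a 100-node grid.
x_star = np.sin(np.linspace(0.0, 2.0 * np.pi, 100))
y_S = measure(x_star, sensor_idx=np.array([3, 20, 47, 71, 95]), sigma=1e-2)
```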
## 3 Theoretical setup and results

We now describe a general framework for theoretically optimal OSP. We base this on Adcock and Huang ([2025](https://arxiv.org/html/2605.06861#bib.bib59)); Jalal et al. ([2021](https://arxiv.org/html/2605.06861#bib.bib56)), which introduced a framework for Bayesian recovery, but one restricted to linear measurements between finite-dimensional vector spaces and, in the latter case, to Gaussian measurements (which are not applicable to state reconstruction). A key generalization we develop here is the extension from problems formulated in $\mathbb{R}^N$ to arbitrary Hilbert spaces. Further, for technical reasons we elaborate on in §[A](https://arxiv.org/html/2605.06861#A1), the framework in these prior works is not suitable for OSP, since it assumes independence between the measurements and the noise. We relax this assumption, leading to the novel theoretically optimal Christoffel sampling strategy discussed later. Finally, we note that our setup is broader than the state reconstruction problem defined in §[2](https://arxiv.org/html/2605.06861#S2), in that it allows for sensing with arbitrary linear functionals, rather than just pointwise evaluations of the state.
### 3.1 Setup

Let $\mathbb{X}$ be a Hilbert space and $\mathbb{X}_0\subseteq\mathbb{X}$ be a normed vector subspace, termed the *object space*. Let $\mathcal{R},\mathcal{P}$ be two probability distributions on $\mathbb{X}$ that take values in $\mathbb{X}_0$ almost surely, termed the *real* and *approximate* distributions, respectively. Our goal is to recover an unknown $f^*\sim\mathcal{R}$ from its measurements by sampling from the posterior based on $\mathcal{P}$. Note that, typically, $\mathcal{P}$ is a generative prior that has been trained on a dataset drawn from $\mathcal{R}$. However, this is not a requirement of our theory.
We now define our sensing model. Let $(\Theta,\mathcal{T},\rho)$ be a measure space. We term $\Theta$ the *sensor parametrization set*: in brief, $\theta_0\in\Theta$ parametrizes a sensor, telling it what measurement to take. Next, we define a *sensing operator* as an injective map

$$L:\Theta\rightarrow\mathcal{B}(\mathbb{X}_0),$$

where $\mathcal{B}(\mathbb{X}_0)$ is the set of bounded, linear functionals $\mathbb{X}_0\rightarrow\mathbb{R}$. Now let $\mu$ be such that $(\Theta,\mathcal{T},\mu)$ forms a probability space. We term $\mu$ the *sampling measure*. With this in hand, let $f^*\in\mathbb{X}_0$ be the object to recover. To measure $f^*$, we draw $\theta_1,\ldots,\theta_m\sim_{\mathrm{i.i.d.}}\mu$ and consider noisy measurements

$$y=(y_i)_{i=1}^m\in\mathbb{R}^m,\quad\text{where } y_i=L(\theta_i)(f^*)+n_i,\ i=1,\ldots,m,$$

and $n_i\sim\mathcal{N}(0,\sigma^2)$ is measurement noise that is independent of the $\theta_i$. For convenience, we write

$$\theta=(\theta_i)_{i=1}^m,\quad n=(n_i)_{i=1}^m,\quad L(\theta)(f^*)=\bigl(L(\theta_i)(f^*)\bigr)_{i=1}^m,\quad y=L(\theta)(f^*)+n.\tag{3.1}$$

Finally, we assume that $L$ is *nondegenerate with respect to $\mathcal{P}$*. Namely, there exist constants $0<\alpha\leq\beta<\infty$ such that

$$\alpha\|f\|^2\leq\int_\Theta|L(\theta)(f)|^2\,\mathrm{d}\rho(\theta)\leq\beta\|f\|^2,\quad\forall f\in\mathbb{S}(\mathcal{P}),\tag{3.2}$$

where $\mathbb{S}(\mathcal{P})$ is the secant set [(2.1)](https://arxiv.org/html/2605.06861#S2.E1). Here and elsewhere $\|\cdot\|$ is the norm on $\mathbb{X}$. Note that this assumption is mild, and essentially states that the energy of any signal $f$ should be approximately preserved, on average, by the sensing operator.
In this paper, we primarily consider state reconstruction from point samples. This problem can be formulated in the above framework by letting $\mathbb{X}=L^2_\rho(\Theta)$ be the space of $L^2$ functions on some domain $\Theta$, $\mathbb{X}_0=C(\overline{\Theta})$ and $L:\Theta\to\mathcal{B}(\mathbb{X}_0)$ with $L(\theta)(f)=f(\theta)$, $\forall f\in\mathbb{X}_0$. Notice that $L$ satisfies [(3.2)](https://arxiv.org/html/2605.06861#S3.E2) with $\alpha=\beta=1$. However, while not the focus of this work, we note in passing that our framework can easily handle other types of recovery problems. One straightforward modification is the gradient-augmented problem, where $L(\theta)(f)=(f(\theta),\nabla f(\theta))^\top$. This type of sampling arises in various scientific and engineering applications. Another application is to inverse PDE problems, where the function $f$ is some state corresponding to a PDE (e.g., an initial condition or inhomogeneity) and $L(\theta)(f)=u(\theta)$ evaluates the PDE solution $u$ corresponding to $f$ at $\theta$. Thus, our framework can be readily applied to (linear) inverse PDE problems.
### 3.2 Main theoretical result

With this in hand, we now consider how to choose $\mu$ optimally so as to minimize the number of measurements/sensors $m$ required to recover an unknown $f^*\sim\mathcal{R}$ accurately by sampling from the approximate posterior $\hat{f}\sim\mathcal{P}(\cdot\,|\,y,\theta)$, where $\theta=(\theta_i)_{i=1}^m$. Specifically, given $t>0$, we shall estimate $p=\mathbb{P}[\|f^*-\hat{f}\|\geq t]$, where the probability is taken with respect to all variables, i.e., $f^*\sim\mathcal{R}$, $\theta\sim\mu^{\otimes m}$, $e\sim\mathcal{N}(0,\sigma^2 I_m)$ and $\hat{f}\sim\mathcal{P}(\cdot\,|\,y,\theta)$, where $y=L(\theta)(f^*)+e$. To state our main result, we first require the following.
###### Definition 3.2 (Approximate covering number).

Let $(X,\mathcal{F},\mathcal{P})$ be a probability space and $\delta,\eta\geq 0$. The $(\eta,\delta)$-approximate covering number of $\mathcal{P}$ is defined as

$$\mathrm{Cov}_{\eta,\delta}(\mathcal{P})=\min\left\{k\in\mathbb{N}:\exists\{x_i\}_{i=1}^k\subseteq\mathrm{supp}(\mathcal{P}),\ \mathcal{P}\left(\bigcup_{i=1}^k B_\eta(x_i)\right)\geq 1-\delta\right\}.$$
Approximate covering numbers quantify the minimum number of balls of radius $\eta$ needed to cover a $1-\delta$ fraction of the mass of $\mathcal{P}$. At a high level, this characterizes the complexity of the prior $\mathcal{P}$ and therefore plays a key role in the number of measurements needed for recovery. For further discussion and examples of covering numbers for typical distributions, see Adcock and Huang ([2025](https://arxiv.org/html/2605.06861#bib.bib59)); Jalal et al. ([2021](https://arxiv.org/html/2605.06861#bib.bib56)).
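To make Definition 3.2 concrete, the covering number can be crudely estimated from i.i.d. samples of a distribution by a greedy set-cover heuristic. The sketch below is our own illustration (the helper name is hypothetical), and greedy covering only yields an upper-bound style estimate of $\mathrm{Cov}_{\eta,\delta}(\mathcal{P})$, not its exact value.

```python
import numpy as np

def approx_covering_number(samples, eta, delta):
    """Greedy empirical estimate of Cov_{eta,delta} from samples of P.

    Repeatedly picks the sample whose eta-ball covers the most
    still-uncovered points, until a (1 - delta) fraction is covered.
    """
    M = len(samples)
    dists = np.linalg.norm(samples[:, None, :] - samples[None, :, :], axis=-1)
    covered_by = dists <= eta            # covered_by[i, j]: j lies in the ball at i
    uncovered = np.ones(M, dtype=bool)
    k = 0
    while uncovered.sum() > delta * M:
        best = int(np.argmax((covered_by & uncovered[None, :]).sum(axis=1)))
        uncovered &= ~covered_by[best]   # mark the new ball's points covered
        k += 1
    return k

# Example: points near a one-parameter curve in R^50 need few balls.
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 1.0, size=(500, 1))
samples = np.cos(np.arange(50) * t) + 0.01 * rng.normal(size=(500, 50))
print(approx_covering_number(samples, eta=1.0, delta=0.05))
```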
###### Theorem 3.3.

Let $1\leq p\leq\infty$, $0<\delta\leq 1/4$, $\varepsilon,\eta>0$ and consider the above setup. Suppose that $\mu\ll\rho$, $\mathrm{d}\mu/\mathrm{d}\rho>0$ $\mu$-a.s., let $w(\theta)=\left(\mathrm{d}\mu(\theta)/\mathrm{d}\rho\right)^{-1}$ and define

$$w_{\min}=\operatorname*{ess\,inf}_{\theta\sim\rho}w(\theta),\qquad w_{\max}=\operatorname*{ess\,sup}_{\theta\sim\rho}w(\theta).$$

Suppose that

$$W_p(\mathcal{R},\mathcal{P})\leq\varepsilon\quad\text{and}\quad\sigma\geq\frac{m\varepsilon}{w_{\min}\delta^{1/p+1/2}}.\tag{3.3}$$

Then

$$\mathbb{P}\left[\|f^*-\hat{f}\|\geq\bigl(32\max\{w_{\max},1/\alpha,\beta\}+2\bigr)(\eta+\sigma)\right]\lesssim\delta,$$

provided

$$m\geq c(\alpha,\beta)\cdot\left(\kappa_w(\mathcal{P},L)+1\right)\cdot\left(\log\mathrm{Cov}_{\eta,\delta}(\mathcal{P})+\sqrt{w_{\max}}+\log(1/\delta)\right),\tag{3.4}$$

where $c(\alpha,\beta)>0$ depends on $\alpha,\beta$ only and

$$\kappa_w(\mathcal{P},L)=\operatorname*{ess\,sup}_{\theta\sim\rho}\Big\{w(\theta)\sup_{f\in\mathbb{S}(\mathcal{P})}|L(\theta)(f)|^2\Big\}.\tag{3.5}$$
This theorem provides a *nonasymptotic* guarantee for successful Bayesian recovery with high probability with *arbitrary* priors $\mathcal{P}$. Further, it addresses the realistic setting where the true signal $f^*$ is drawn from some unknown real distribution which is close to the (typically learned) distribution $\mathcal{P}$ that is used as the prior. In particular, Theorem [3.3](https://arxiv.org/html/2605.06861#S3.Thmtheorem3) states that the error $\|f^*-\hat{f}\|$ is within a constant of $\eta+\sigma$, where $\eta$ is an accuracy parameter and $\sigma$ is the noise level, with probability at least $1-\delta$, provided the number of sensors satisfies [(3.4)](https://arxiv.org/html/2605.06861#S3.E4). This latter estimate states that the sample complexity is dictated by $\log\mathrm{Cov}_{\eta,\delta}(\mathcal{P})$ multiplied by $\kappa_w(\mathcal{P},L)$. The former is a measure of the complexity of $\mathcal{P}$: more complex priors require more measurements for successful recovery. The latter dictates how the prior interacts with the sensing operator $L$ and the sampling distribution $\mu$ (see the term $w(\theta)$).
Theorem [3.3](https://arxiv.org/html/2605.06861#S3.Thmtheorem3) holds for almost arbitrary sampling measures $\mu$. In order to identify a theoretically optimal choice $\mu^\star$, we choose $w$ that minimizes the right-hand side of [(3.4)](https://arxiv.org/html/2605.06861#S3.E4). Assuming measurability and recalling that $w$ must satisfy $\int w(\theta)^{-1}\,\mathrm{d}\rho(\theta)=1$ since $\mu$ is a probability measure, the optimal choice is given by

$$w(\theta)^{-1}=\frac{K(\mathcal{P},L)(\theta)}{C(\mathcal{P},L)},$$

where $K(\mathcal{P},L)$ is the *(generalized) Christoffel function* (Adcock et al., [2023](https://arxiv.org/html/2605.06861#bib.bib60))

$$K(\mathcal{P},L)(\theta)=\sup_{f\in\mathbb{S}(\mathcal{P})}|L(\theta)(f)|^2,\qquad C(\mathcal{P},L)=\int_\Theta K(\mathcal{P},L)(\theta)\,\mathrm{d}\rho(\theta).$$

This choice of $w$ minimizes [(3.5)](https://arxiv.org/html/2605.06861#S3.E5), yielding the minimal value $\kappa_w(\mathcal{P},L)=C(\mathcal{P},L)$. However, it is also convenient to choose $w(\theta)$ that is not too large, which allows one to remove the dependence on $w_{\max}$ in the above bounds. Fortunately, this can easily be done, by choosing

$$w(\theta)^{-1}=\frac{K(\mathcal{P},L)(\theta)}{2C(\mathcal{P},L)}+\frac{1}{2}.$$

In this case, one has $w_{\max}\leq 2$, while $\kappa_w(\mathcal{P},L)\leq 2C(\mathcal{P},L)$ is within a factor of $2$ of being optimal. Using this, we now choose $\mu^\star$ as follows:

$$\mathrm{d}\mu^\star(\theta)=\left(\frac{K(\mathcal{P},L)(\theta)}{2C(\mathcal{P},L)}+\frac{1}{2}\right)\mathrm{d}\rho(\theta).\tag{3.6}$$

We summarize this discussion in the following result.
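In the discrete setting of §[2](https://arxiv.org/html/2605.06861#S2), [(3.6)](https://arxiv.org/html/2605.06861#S3.E6) is immediate to implement once Christoffel scores are available. A minimal sketch, assuming the scores $K(\mathcal{P},L)(\xi_j)$ and reference weights $\rho_j$ are given as arrays (function names are ours, for illustration):

```python
import numpy as np

def christoffel_sampling_measure(K, rho):
    """Discrete analogue of (3.6): dmu* = (K / (2C) + 1/2) drho.

    K   : (N,) Christoffel scores K(P, L) at the grid nodes.
    rho : (N,) weights of the reference measure (summing to 1).
    """
    C = np.sum(K * rho)                 # C(P, L) = integral of K against rho
    mu = (K / (2.0 * C) + 0.5) * rho
    return mu / mu.sum()                # renormalize against round-off

def draw_sensors(mu, m, rng=None):
    """Draw m sensor indices i.i.d. from mu (with replacement)."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.choice(len(mu), size=m, p=mu)
```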
###### Corollary 3.4 (Theoretically-optimal sampling).

Assume the setup of the previous theorem, with $\mu=\mu^\star$ given by [(3.6)](https://arxiv.org/html/2605.06861#S3.E6). Then

$$\mathbb{P}\left[\|f^*-\hat{f}\|\geq\bigl(32\max\{2,1/\alpha,\beta\}+2\bigr)(\eta+\sigma)\right]\lesssim\delta,$$

provided

$$m\geq c(\alpha,\beta)\cdot\left(C(\mathcal{P},L)+1\right)\cdot\left(\log\mathrm{Cov}_{\eta,\delta}(\mathcal{P})+\log(1/\delta)\right).\tag{3.7}$$
This result establishes a theoretically optimal (random) sensor placement strategy at a very high level of generality, in that it can be applied to arbitrary distributions $\mathcal{P}$ and sampling operators $L$. We term the choice $\mu=\mu^\star$ *Christoffel sampling*.
In general, it is difficult to find an explicit expression or upper bound for the constant $C(\mathcal{P},L)$, as it depends critically on $\mathcal{P}$ and $L$. An exception is the case where $\mathcal{P}$ is supported on certain intrinsically low-dimensional manifolds. For example, if $\mathcal{P}$ is supported in a $k$-dimensional subspace of $\mathbb{X}_0$, then a short argument shows that $C(\mathcal{P},L)\leq\beta k$. Indeed, let $\{\phi_i\}_{i=1}^k$ be an orthonormal basis for such a subspace. Then $K(\mathcal{P},L)\leq\sum_{i=1}^k|L(\theta)(\phi_i)|^2$ and, due to [(3.2)](https://arxiv.org/html/2605.06861#S3.E2), it follows that $C(\mathcal{P},L)\leq\beta k$. More generally, if $\mathcal{P}$ is supported in a union of $d$ subspaces of dimension at most $k$, then one has $C(\mathcal{P},L)\leq\beta kd$.
## 4 Christoffel-DPS

We now describe our algorithms for practical implementation of Christoffel sampling in the setting of DPS, which we term Christoffel-DPS, with both online and offline variants. Throughout, we work in the discrete setting (see §[2](https://arxiv.org/html/2605.06861#S2)) and we consider $L$ as the pointwise evaluation operator, i.e., $L(j)(x)=x_j$ for $x\in\mathbb{R}^N$. Notably, computing $K(\mathcal{P},L)$ for a generative prior involves a supremum over $\mathbb{S}(\mathcal{P})$, which is typically intractable. We propose two practical instantiations to handle this.

- **Offline** (§[4.2](https://arxiv.org/html/2605.06861#S4.SS2)): estimate $K$ from a finite secant set drawn once, before sampling.
- **Online** (§[4.3](https://arxiv.org/html/2605.06861#S4.SS3)): recompute $K$ on a live DPS ensemble at scheduled *drift events*, allowing for adaptive sensor placements based on measurements obtained during prior drift events.
### 4.1 Diffusion posterior sampling (DPS)

We first briefly review DPS (Chung et al., [2023](https://arxiv.org/html/2605.06861#bib.bib70)) and introduce notation we will use later. For convenience we present the construction in finite dimensions and with sensor measurements given by [(2.2)](https://arxiv.org/html/2605.06861#S2.E2); the extension to function-space diffusion models follows (Pidstrigach et al., [2024](https://arxiv.org/html/2605.06861#bib.bib71)) via a trace-class covariance operator $C$ replacing the identity. Let $\mathcal{P}$ be a prior induced by a diffusion model on $\mathbb{X}=\mathbb{R}^N$ and let $D_\theta(x,\sigma)\approx\mathbb{E}[x_0\mid x_t=x]$ denote the corresponding denoiser trained by score matching on samples, with associated score $s(t,x)=-\sigma^{-2}(t)\bigl(x-D_\theta(x,\sigma(t))\bigr)$. The variance-exploding forward SDE (Karras et al., [2022](https://arxiv.org/html/2605.06861#bib.bib26)) is

$$\mathrm{d}x_t=g_t\sqrt{C}\,\mathrm{d}W_t,\qquad x_0\sim\mathcal{P},\qquad g_t=\sqrt{\mathrm{d}\sigma^2/\mathrm{d}t},$$

with transition law $\mathcal{N}(x_0,\sigma^2(t)C)$, and the reverse SDE $\mathrm{d}z_t=g_{T-t}^2\,s(T-t,z_t)\,\mathrm{d}t+g_{T-t}\sqrt{C}\,\mathrm{d}W_t$ starting from $z_0\sim\mathcal{N}(0,\sigma^2(T)C)$ yields $z_T\sim\mathcal{P}$. DPS samples from the posterior given $y_S=Sx^*+\eta$ by augmenting the unconditional score with a Tweedie-approximated measurement gradient,

$$\nabla\log h^y(t,x)\approx-\nabla_x\Phi\bigl(D_\theta(x,\sigma(t)),y_S\bigr),\qquad\Phi(x_0,y)=\tfrac{1}{2\sigma_\eta^2}\|Sx_0-y\|^2.$$

In the $\sigma(t)\to 0$ limit this reduces to the Kalman update with gain set by the local denoiser Jacobian – in particular, no assumption on $\mathcal{P}$ beyond access to $D_\theta$ is required.
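For concreteness, here is a minimal sketch of one guided reverse step under this approximation, written for flattened states and identity $C$. It uses a plain Euler–Maruyama update, whereas the experiments in §[5](https://arxiv.org/html/2605.06861#S5) use Heun-type samplers; `denoiser(x, sigma)` and all other names here are illustrative assumptions, not the released code.

```python
import torch

def dps_step(x_t, sigma_t, sigma_next, denoiser, sensor_idx, y_S, sigma_y, zeta=1.0):
    """One guided Euler step of the variance-exploding reverse SDE.

    denoiser(x, sigma) approximates E[x_0 | x_t = x] (Tweedie);
    sigma_t > sigma_next are floats; zeta is a guidance scale.
    """
    x_t = x_t.detach().requires_grad_(True)
    x0_hat = denoiser(x_t, sigma_t)                      # Tweedie estimate
    score = (x0_hat - x_t) / sigma_t**2                  # unconditional score
    misfit = 0.5 * ((x0_hat[sensor_idx] - y_S) ** 2).sum() / sigma_y**2
    grad_phi = torch.autograd.grad(misfit, x_t)[0]       # measurement gradient
    drift = score - zeta * grad_phi                      # approximate conditional score
    dt = sigma_t**2 - sigma_next**2
    noise = max(dt, 0.0) ** 0.5 * torch.randn_like(x_t)
    return (x_t + dt * drift + noise).detach()
```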
### 4.2 Offline Christoffel-DPS

We first describe two offline Christoffel-DPS variants. In both cases, we commence with a set of snapshots $\{x^{(n)}\}_{n=1}^M\subset\mathrm{supp}(\mathcal{P})$, either training data or unconditional samples drawn once from $\mathcal{P}$.
**Christoffel-DPS.** This approach is a near-direct application of the theory in §[3](https://arxiv.org/html/2605.06861#S3). We define the *empirical secant set*, the finite-$M$ analogue of the secant set $\mathbb{S}(\mathcal{P})$ from §[3](https://arxiv.org/html/2605.06861#S3), as

$$\mathbb{S}_M=\left\{\frac{x^{(n)}-x^{(n')}}{\|x^{(n)}-x^{(n')}\|}:n\neq n'\right\}\subset\mathbb{R}^N$$

and then compute the empirical Christoffel function $\widehat{K}_M$ by replacing the supremum over $\mathbb{S}(\mathcal{P})$ with the maximum over $\mathbb{S}_M$, giving

$$\widehat{K}_M(j)=\max_{x\in\mathbb{S}_M}|x_j|^2,\quad j\in\{1,\ldots,N\}.\tag{4.1}$$

Note that $\widehat{K}_M\to K$ as $M\rightarrow\infty$. With [(4.1)](https://arxiv.org/html/2605.06861#S4.E1) in hand, Christoffel-DPS proceeds by selecting $m$ sensors i.i.d. according to the discrete probability distribution on $\{1,\ldots,N\}$ with weights [(4.1)](https://arxiv.org/html/2605.06861#S4.E1).
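A minimal sketch of this variant, assuming the snapshots are stacked into an $(M,N)$ array (our own illustrative code; for large $M$ one would subsample the $M(M-1)$ secants rather than enumerate them all):

```python
import numpy as np

def empirical_christoffel(snapshots, eps=1e-12):
    """Empirical Christoffel score (4.1) from snapshots {x^(n)} in R^N."""
    X = np.asarray(snapshots, dtype=float)          # (M, N)
    diffs = X[:, None, :] - X[None, :, :]           # pairwise x^(n) - x^(n')
    norms = np.linalg.norm(diffs, axis=-1, keepdims=True)
    mask = norms[..., 0] > eps                      # drop n == n'
    secants = diffs[mask] / norms[mask]             # empirical secant set S_M
    return (secants**2).max(axis=0)                 # K_hat_M(j), shape (N,)

def christoffel_dps_sensors(snapshots, m, rng=None):
    """Select m sensors i.i.d. with weights proportional to K_hat_M."""
    rng = np.random.default_rng() if rng is None else rng
    K = empirical_christoffel(snapshots)
    return rng.choice(K.size, size=m, p=K / K.sum())
```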
**Greedy Christoffel-DPS.** The theoretical analysis in §[3](https://arxiv.org/html/2605.06861#S3) demonstrates the fundamental nature of the secant set $\mathbb{S}_M$. In this variant, we implement an alternative to i.i.d. sampling that builds an OSP in a greedy manner. This is reminiscent of classical OED design criteria, but differs starkly, as it works directly on the empirical secant set $\mathbb{S}_M$ corresponding to a highly nonlinear prior $\mathcal{P}$.
Let $X\in\mathbb{R}^{N\times M}$ be the matrix of mean-adjusted snapshots $x^{(n)}-x^{(\mathrm{avg})}$, where $x^{(\mathrm{avg})}=\frac{1}{M}\sum_{n=1}^M x^{(n)}$. Let $S\in\mathbb{R}^{m\times N}$ be a row-selection matrix and notice that

$$SX(SX)^\top=\frac{1}{M^2}\sum_{n,n',n''}\left(S\bigl(x^{(n)}-x^{(n')}\bigr)\right)\left(S\bigl(x^{(n)}-x^{(n'')}\bigr)\right)^\top.\tag{4.2}$$

Using ideas from Karnik et al. ([2025](https://arxiv.org/html/2605.06861#bib.bib25)); Manohar et al. ([2018](https://arxiv.org/html/2605.06861#bib.bib35)), we now construct $S$ in a greedy manner to maximize the information gain in this matrix. In practice, this is done by column-pivoted QR factorization; see App. [C.1](https://arxiv.org/html/2605.06861#A3.SS1) and the sketch below.

Note that this approach does not work directly with the empirical secant set $\mathbb{S}_M$, due to computational resources ($|\mathbb{S}_M|=M(M-1)$ in general). Instead, it uses the empirical average $x^{(\mathrm{avg})}$ to effect a cheap approximation to pairwise differences, as seen in [(4.2)](https://arxiv.org/html/2605.06861#S4.E2), retaining the original dataset size $M$. Normalization is also ignored, as the mean-adjusted snapshots $x^{(n)}-x^{(\mathrm{avg})}$ have similar norms in practice.
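A sketch of the greedy variant using SciPy's column-pivoted QR, under the assumption that the $m$ sensors are read off as the first $m$ pivots on the mean-adjusted snapshot matrix (the full procedure is given in App. [C.1](https://arxiv.org/html/2605.06861#A3.SS1); this code is illustrative):

```python
import numpy as np
from scipy.linalg import qr

def greedy_christoffel_sensors(snapshots, m):
    """Greedy sensor selection via column-pivoted QR, cf. (4.2).

    snapshots : (M, N) array of states x^(n) on the grid; assumes m <= M.
    Returns the indices of the m selected grid nodes.
    """
    X = np.asarray(snapshots, dtype=float)
    Xc = (X - X.mean(axis=0)).T        # (N, M): mean-adjusted snapshots as columns
    # Pivoted QR on Xc^T: each pivot greedily picks the grid node (column)
    # with the largest residual norm, growing the information in SX(SX)^T.
    _, _, pivots = qr(Xc.T, mode="economic", pivoting=True)
    return pivots[:m]
```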
A standard approach to Bayesian OED for state estimation involves first computing a POD (proper orthogonal decomposition). Here one computes an orthonormal basis $\{v^{(1)},\ldots,v^{(r)}\}\subset\mathbb{R}^N$ from the set of snapshots $\{x^{(n)}\}_{n=1}^M$, then forms the design matrix $V\in\mathbb{R}^{N\times r}$ from this basis and applies an OED criterion (e.g., A- or D-optimality, or some heuristic) on $V$, potentially with a data-driven regularization, as in Manohar et al. ([2018](https://arxiv.org/html/2605.06861#bib.bib35)); Chaturantabut and Sorensen ([2010](https://arxiv.org/html/2605.06861#bib.bib10)); Drmač and Gugercin ([2016](https://arxiv.org/html/2605.06861#bib.bib15)); Karnik et al. ([2025](https://arxiv.org/html/2605.06861#bib.bib25)); Klishin et al. ([2025](https://arxiv.org/html/2605.06861#bib.bib27)). However, this differs significantly from greedy Christoffel-DPS, as the POD basis is, in effect, approximating the support of $\mathcal{P}$ by a linear $r$-dimensional subspace. Thus it may fail to capture the geometry of $\mathcal{P}$ when using complex, highly non-Gaussian priors. By contrast, greedy empirical Christoffel works on the secant set, which better captures the geometry of $\mathcal{P}$.
### 4.3 Online ensemble Christoffel-DPS

A benefit of generative recovery is that the sensor placement need not be fixed in advance in an offline fashion, but can rather be updated online, using the fact that the Christoffel function can be repeatedly re-estimated during the diffusion process conditioned on the current measurements. To achieve this, we run an ensemble of $N_e$ DPS chains $\{z_t^{(i)}\}_{i=1}^{N_e}$ sharing the current row-selector matrix $S$ and measurement vector $y_S$, and schedule $D$ *drift events* at noise levels $\sigma(t_1)>\cdots>\sigma(t_D)$ along the reverse diffusion. At drift event $t_d$ each chain emits its Tweedie point estimate $\hat{x}_0^{(i)}=D_\theta(z_{t_d}^{(i)},\sigma(t_d))$, and the empirical Christoffel function is evaluated on the secant set of the resulting ensemble:

$$\widehat{K}_{t_d}(j)=\max_{i\neq i'}\frac{\bigl|\hat{x}_{0,j}^{(i)}-\hat{x}_{0,j}^{(i')}\bigr|^2}{\|\hat{x}_0^{(i)}-\hat{x}_0^{(i')}\|^2},\qquad j=1,\ldots,N.\tag{4.3}$$

We then use this to relocate the sensors by sampling i.i.d. with probabilities proportional to $\widehat{K}_{t_d}(j)$. The new entries of $y_S$ are then read, chains whose measurement residual exceeds a quantile threshold are pruned, and the survivors resume under the new conditional score until the next drift event. After the $D$th event the ensemble is collapsed, either by selecting the chain with the smallest measurement residual or by averaging over survivors. We detail this algorithm in App. [C.2](https://arxiv.org/html/2605.06861#A3.SS2).
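A sketch of the score [(4.3)](https://arxiv.org/html/2605.06861#S4.E3) and the sensor relocation at a drift event, assuming the $N_e$ Tweedie estimates are stacked into an $(N_e,N)$ array (names are illustrative; the pruning and collapse steps are omitted, see App. [C.2](https://arxiv.org/html/2605.06861#A3.SS2)):

```python
import numpy as np

def ensemble_christoffel(x0_hats, eps=1e-12):
    """Empirical Christoffel score (4.3) on a live DPS ensemble.

    x0_hats : (Ne, N) Tweedie estimates at a drift event.
    """
    diffs = x0_hats[:, None, :] - x0_hats[None, :, :]    # (Ne, Ne, N)
    norms2 = (diffs**2).sum(axis=-1, keepdims=True)      # ||x0^(i) - x0^(i')||^2
    iu = np.triu_indices(len(x0_hats), k=1)              # distinct pairs i < i'
    ratios = diffs[iu] ** 2 / np.maximum(norms2[iu], eps)
    return ratios.max(axis=0)                            # K_hat_{t_d}(j)

def drift_event(x0_hats, m, rng=None):
    """Relocate m mobile sensors i.i.d. proportional to the ensemble score."""
    rng = np.random.default_rng() if rng is None else rng
    K = ensemble_christoffel(np.asarray(x0_hats, dtype=float))
    return rng.choice(K.size, size=m, p=K / K.sum())
```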
## 5 Experimental Results

We evaluate Christoffel-DPS on three DPS benchmarks: the fluidic pinball (GRIFDIR; Rowbottom et al., [2026](https://arxiv.org/html/2605.06861#bib.bib41)), the Darcy forward problem (DiffusionPDE; Huang et al., [2024](https://arxiv.org/html/2605.06861#bib.bib22)) and Kolmogorov flow (Amorós-Trepat et al., [2026](https://arxiv.org/html/2605.06861#bib.bib5)). For each, we take a trained denoiser, run 10 random seeds of DPS guidance with the reverse-diffusion sampler, and generate the row-selector $S$ once offline using the following placement strategies: random; A-, D- and E-optimal POD (Joshi and Boyd, [2009](https://arxiv.org/html/2605.06861#bib.bib23)); SSPOR using PySensors 2.0 (Karnik et al., [2025](https://arxiv.org/html/2605.06861#bib.bib25)); the offline Christoffel-DPS variants of §[4.2](https://arxiv.org/html/2605.06861#S4.SS2); and the online ensemble Christoffel-DPS method of §[4.3](https://arxiv.org/html/2605.06861#S4.SS3). Each (strategy, number of sensors $m$) cell is averaged over the random seeds and reported as mean $\pm$ std of the relative $L^2$ error $\|\hat{x}-x^*\|_2/\|x^*\|_2$.
### 5.1 Pinball problem with GRIFDIR

Figure 2: **Pinball problem: $m$-convergence.** Relative $L^2$ error vs sensor budget $m$.

Our first experiment is on the Pinball problem (Tomasetto et al., [2025](https://arxiv.org/html/2605.06861#bib.bib72)). We instantiate Christoffel-DPS in the infinite-dimensional setting via GRIFDIR (Rowbottom et al., [2026](https://arxiv.org/html/2605.06861#bib.bib41)), a function-space EDM diffusion model on an unstructured domain with FEM-continuous graph kernel layers and a multiscale graph and latent-space transformer backbone. The forward noise is a Gaussian random field with covariance operator $C$, and the same $C$ broadcasts the DPS measurement gradient at inference: with likelihood potential $\Phi(x_0,y_S)=\tfrac{1}{2\sigma_\eta^2}\|Sx_0-y_S\|^2$, the conditional-score correction (cf. §[4.1](https://arxiv.org/html/2605.06861#S4.SS1)) is approximated by $\nabla\log h^y(t,x)\approx-\nabla_x\Phi\bigl(D_\theta(x,\sigma(t)),y_S\bigr)$.

The denoiser is trained on snapshots of the scalar field $c$ governed by the advection–diffusion equation $\partial_t c+v(\mu;x)\cdot\nabla c-D\Delta c=0$, where $v(\mu;x)$ is the steady RANS solution around three rotating pinball cylinders with three rotation rates $\mu$ selected uniformly from $[-5,5]$. Snapshots live on the unstructured FEM mesh of the simulation, so $S$ is implemented to select mesh nodes. Figure [2](https://arxiv.org/html/2605.06861#S5.F2) shows that the online ensemble Christoffel-DPS reaches the error of the random and classical baselines (A-, D-, E-optimal POD, SSPOR) at roughly half the sensor budget, and the offline Christoffel-DPS variants match or exceed the strongest classical baseline across all budgets. Full experimental details are given in App. [B.1](https://arxiv.org/html/2605.06861#A2.SS1).
### 5.2 Darcy flow with DiffusionPDE

Our second experiment is full state reconstruction on Darcy flow (Huang et al., [2024](https://arxiv.org/html/2605.06861#bib.bib22); Li et al., [2021](https://arxiv.org/html/2605.06861#bib.bib73)). We use the joint $(a,u)$ DhariwalUNet (Karras et al., [2022](https://arxiv.org/html/2605.06861#bib.bib26)) trained on the steady Darcy equation $-\nabla\cdot(a\nabla u)=f$ on a uniform $128\times 128$ grid with binary conductivity $a\in\{3,12\}$ and pressure $u$. Recovery uses the Karras-EDM DPS sampler: at every reverse Heun step the measurement gradient $\nabla_{x_t}\|y_S-S\hat{x}_0\|^2$ is taken via auto-grad through the denoiser and added to the unconditional score.
With $128\times 128=16{,}384$ pixels and 500 Heun steps as our default hyper-parameters, this dataset was comparatively expensive: the pinball problem has roughly 7,500 mesh nodes and used 50 Heun denoising steps. For practical reasons we reduced the ensemble size from $N_e=20$ to 10 and the reverse-diffusion budget from $2000$ to $500$ Heun steps. This may explain the slight underperformance of ensemble Christoffel in this experiment compared to Pinball and Kolmogorov flow, which we discuss in the following section.
Figure 3: **Darcy flow: $m$-convergence.** Relative $L^2$ error vs sensor budget $m$.

Finally, OSP methods that require a POD basis are implemented with independent $a,u$ placement strategies, and the D-/E-optimal methods collapse due to rank-deficiency of the low-frequency field $u$ at small $k$ (Fig. [3](https://arxiv.org/html/2605.06861#S5.F3)); we additionally report Tikhonov-regularized variants (D-/E-opt-reg) with prior $\Sigma_0=\mathrm{diag}(\lambda)+\epsilon I$, $\epsilon=10^{-4}$. Full experimental details are given in App. [B.2](https://arxiv.org/html/2605.06861#A2.SS2).
### 5.3 Kolmogorov flow with physics-constrained masked diffusion

For our third benchmark we use the U-Net DDPM-style denoiser of Amorós-Trepat et al. ([2026](https://arxiv.org/html/2605.06861#bib.bib5)), trained on vorticity snapshots of the 2D Kolmogorov flow. The model uses a novel ConFIG gradient-projection step to form consensus between score and guidance gradients. Rather than backpropagating a measurement gradient at every reverse step, recovery uses a latent-mask-blending DPS sampler: the denoiser's prediction at observed indices is overwritten with the RBF-broadcast measurements. The reverse loop runs entirely in `torch.no_grad`, so the per-step cost is dominated by a single forward denoiser pass per chain and the ensemble overhead is modest; the online ensemble Christoffel-DPS therefore runs at the full $N_e=20$ chains and the unmodified reverse-diffusion budget (in contrast to the slimmed Darcy configuration of §[5.2](https://arxiv.org/html/2605.06861#S5.SS2)). Experimental results and full experimental details are given in App. [B.3](https://arxiv.org/html/2605.06861#A2.SS3).
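As a sketch of this gradient-free guidance style, the following illustrative step overwrites the Tweedie prediction at observed indices before continuing the reverse update. We show a simplified VE-style Euler update; the actual sampler of Amorós-Trepat et al. ([2026](https://arxiv.org/html/2605.06861#bib.bib5)) is DDPM-based with ConFIG projections, and all names here are our own.

```python
import torch

@torch.no_grad()
def mask_blend_step(x_t, sigma_t, sigma_next, denoiser, obs_mask, y_broadcast):
    """One gradient-free reverse step with latent mask blending.

    obs_mask    : boolean tensor, True at observed indices.
    y_broadcast : RBF-broadcast measurement field, same shape as x_t.
    """
    x0_hat = denoiser(x_t, sigma_t)                      # single forward pass
    x0_hat = torch.where(obs_mask, y_broadcast, x0_hat)  # overwrite observations
    score = (x0_hat - x_t) / sigma_t**2                  # blended score estimate
    dt = sigma_t**2 - sigma_next**2
    return x_t + dt * score + max(dt, 0.0) ** 0.5 * torch.randn_like(x_t)
```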
## 6 Discussion and limitations

We presented a framework for theoretically optimal OSP based on Christoffel functions in the setting of posterior sampling. Using this theory, we introduced Christoffel-DPS, a novel algorithm for OSP in sparse sensor reconstruction tasks using DPS. We achieved this by empirically estimating the Christoffel function, in both online and offline fashions, and exploiting it in both random and greedy sensor placement strategies. Our methods differ from classical OED methods, as our sensor placement strategies are specifically designed for non-linear, non-Gaussian priors such as those arising in DPS. On datasets such as Pinball and Kolmogorov flow, Christoffel-DPS methods, particularly greedy Christoffel-DPS, reach the error of classical baselines at approximately half the sensor budget. Even in the Darcy flow experiment, greedy Christoffel-DPS reaches a lower error floor and outperforms classical methods at low sensor budgets. Non-greedy Christoffel-DPS tends not to perform as well, which we believe is due to the expensive and difficult nature of estimating the empirical secant set.

We finish by discussing limitations and avenues for future work. First, while our theory allows for arbitrary types of linear measurements, we have only considered state reconstruction from point evaluations in our experiments. Future work on, for instance, gradient-augmented data (Adcock et al., [2023](https://arxiv.org/html/2605.06861#bib.bib60)) has yet to be explored. Outside of state estimation, inverse PDE problems are another important class of problems not addressed in this work. While our theory can be applied directly to linear PDEs, our algorithms have not yet been applied to such problems. For both theory and algorithms, however, nonlinear PDEs remain a more substantial challenge. Next, it is notable that while Christoffel-DPS outperforms classical OED methods in many cases, its benefits can be mild on datasets/problems which are approximately Gaussian and linear. We intend to explore other highly non-linear, non-Gaussian datasets in future work. It will also be possible to apply our approach to conditional generative models, as in Ma et al. ([2025](https://arxiv.org/html/2605.06861#bib.bib34)). On the algorithmic side, our methods require approximation of the secant set, which can be expensive. We mitigated this issue with greedy Christoffel-DPS, but it remains to be seen whether there are more efficient ways to estimate this set. Finally, our online algorithm is expensive, requiring ensembles of DPS chains. In particular, the guidance term of DPS requires automatic differentiation at every reverse step, which is expensive at inference. This problem is intrinsic to DPS strategies; in experiment 3 we found a solution using masked diffusion. Overall, the landscape of generative modelling is rapidly evolving, and the state estimation techniques explored in this paper are well poised to benefit from algorithmic or technological advances in the wider field. Our theory provides a principled and complementary partner for future OED research that is flexible to any new posterior sampling method.
## Acknowledgements
We thank Alexander Denker for helpful discussion and feedback. JR acknowledges support from the EU Horizon MSCA-SE under project REMODEL, "Research Exchanges in the Mathematics of Deep Learning with Applications" (grant agreement no. 101131557). CBS acknowledges support from the Royal Society Wolfson Fellowship, the EPSRC advanced career fellowship EP/V029428/1, the EPSRC programme grant EP/V026259/1, the Wellcome Innovator Awards 215733/Z/19/Z and 221633/Z/20/Z, the EPSRC-funded ProbAI hub EP/Y028783/1, and the European Union Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement REMODEL. BA acknowledges support from the Natural Sciences and Engineering Research Council of Canada (NSERC) through grant RGPIN/2470-2021 and FRQ (Fonds de recherche du Québec) – Nature et Technologies through grant 359708.
## References
- B. Adcock, J. M. Cardenas, and N. Dexter (2023). CS4ML: A general framework for active learning with arbitrary data based on Christoffel functions. In *Advances in Neural Information Processing Systems*, Vol. 36, pp. 19990–20037.
- B. Adcock and Z. Y. Huang (2025). How many measurements are enough? Bayesian recovery in inverse problems with general distributions. In *Advances in Neural Information Processing Systems*, Vol. 38, pp. 97285–97326.
- M. Amorós-Trepat, L. Medrano-Navarro, Q. Liu, L. Guastoni, and N. Thuerey (2026). Guiding diffusion models to reconstruct flow fields from sparse data. *Physics of Fluids* 38(1), 015112. [doi:10.1063/5.0304492](https://dx.doi.org/10.1063/5.0304492).
- L. Baldassari, J. Garnier, K. Solna, and M. V. de Hoop (2026). Preconditioned Langevin dynamics with score-based generative models for infinite-dimensional linear Bayesian inverse problems. In *The Thirty-ninth Annual Conference on Neural Information Processing Systems*. [Link](https://openreview.net/forum?id=rVyBrD8h2b).
- P. Businger and G. H. Golub (1965). Linear least squares solutions by Householder transformations. *Numerische Mathematik* 7(3), pp. 269–276. [doi:10.1007/BF01436084](https://dx.doi.org/10.1007/BF01436084).
- D. Chakraborty, H. Kim, and R. Maulik (2026). Adaptive diffusion posterior sampling for data and model fusion of complex nonlinear dynamical systems. arXiv:2603.12635.
- S. Chaturantabut and D. C. Sorensen (2010). Nonlinear model reduction via discrete empirical interpolation. *SIAM Journal on Scientific Computing* 32(5), pp. 2737–2764. [doi:10.1137/090766498](https://dx.doi.org/10.1137/090766498).
- P. Chen, Y. Sun, L. Cheng, Y. Yang, W. Li, Y. Liu, W. Liu, J. Bian, and S. Fang (2025). Generating full-field evolution of physical dynamics from irregular sparse observations. In *The Thirty-ninth Annual Conference on Neural Information Processing Systems*.
- S. Chen, R. Varma, A. Singh, and J. Kovačević (2016). A statistical perspective of sampling scores for linear regression. In *2016 IEEE International Symposium on Information Theory (ISIT)*, pp. 1556–1560.
- H. Chung, J. Kim, M. T. McCann, M. L. Klasky, and J. C. Ye (2023). Diffusion posterior sampling for general noisy inverse problems. In *International Conference on Learning Representations*.
- A. Cohen and G. Migliorati (2017). Optimal weighted least-squares methods. *SMAI Journal of Computational Mathematics* 3, pp. 181–203.
- H. V. Dang and P. C. H. Nguyen (2025). Deep operator learning for high-fidelity fluid flow field reconstruction from sparse sensor measurements. *Journal of Computing and Information Science in Engineering* 26(011007). [doi:10.1115/1.4070332](https://dx.doi.org/10.1115/1.4070332).
- Z. Deng, C. He, and Y. Liu (2021). Deep neural network-based strategy for optimal sensor placement in data assimilation of turbulent flow. *Physics of Fluids* 33(2), 025119. [doi:10.1063/5.0035230](https://dx.doi.org/10.1063/5.0035230).
- Z. Drmač and S. Gugercin (2016). A new selection operator for the discrete empirical interpolation method – improved a priori error bound and extensions. *SIAM Journal on Scientific Computing* 38(2), pp. A631–A648. [doi:10.1137/15M1019271](https://dx.doi.org/10.1137/15M1019271).
- G. Duthé, I. Abdallah, and E. Chatzi (2025). Graph Transformers for inverse physics: reconstructing flows around arbitrary 2D airfoils. arXiv:2501.17081. [doi:10.48550/ARXIV.2501.17081](https://dx.doi.org/10.48550/ARXIV.2501.17081).
- R. Everson and L. Sirovich (1995). Karhunen–Loève procedure for gappy data. *JOSA A* 12(8), pp. 1657–1664. [doi:10.1364/JOSAA.12.001657](https://dx.doi.org/10.1364/JOSAA.12.001657).
- K. Fukami, R. Maulik, N. Ramachandra, K. Fukagata, and K. Taira (2021). Global field reconstruction from sparse sensors with Voronoi tessellation-assisted deep learning. *Nature Machine Intelligence* 3(11), pp. 945–951. [doi:10.1038/s42256-021-00402-2](https://dx.doi.org/10.1038/s42256-021-00402-2).
- J. Huang, G. Yang, Z. Wang, and J. J. Park (2024). DiffusionPDE: generative PDE-solving under partial observation. In *The Thirty-eighth Annual Conference on Neural Information Processing Systems*.
- A. Jalal, S. Karmalkar, A. Dimakis, and E. Price (2021). Instance-optimal compressed sensing via posterior sampling. In *38th International Conference on Machine Learning*, pp. 4709–4720.
- S. Joshi and S. Boyd (2009). Sensor selection via convex optimization. *IEEE Transactions on Signal Processing* 57(2), pp. 451–462. [doi:10.1109/TSP.2008.2007095](https://dx.doi.org/10.1109/TSP.2008.2007095).
- R. Karczewski, M. Heinonen, and V. Garg (2024). Diffusion models as cartoonists: the curious case of high density regions. In *The Thirteenth International Conference on Learning Representations*.
- N. Karnik, Y. Bhangale, M. G. Abdo, A. A. Klishin, J. J. Cogliati, B. W. Brunton, J. N. Kutz, S. L. Brunton, and K. Manohar (2025). PySensors 2.0: A Python package for sparse sensor placement. arXiv:2509.08017.
- T. Karras, M. Aittala, T. Aila, and S. Laine (2022). Elucidating the design space of diffusion-based generative models. In *Advances in Neural Information Processing Systems*.
- A. A. Klishin, J. N. Kutz, and K. Manohar (2025). Data-induced interactions of sparse sensors using statistical physics. arXiv:2307.11838.
- A. Krause, A. Singh, and C. Guestrin (2008). Near-optimal sensor placements in Gaussian processes: theory, efficient algorithms and empirical studies. *Journal of Machine Learning Research* 9(8), pp. 235–284.
- Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandkumar (2021). Fourier neural operator for parametric partial differential equations. In *ICLR*.
- J. H. Lim, N. B. Kovachki, R. Baptista, C. Beckham, K. Azizzadenesheli, J. Kossaifi, V. Voleti, J. Song, K. Kreis, J. Kautz, C. Pal, A. Vahdat, and A. Anandkumar (2023). Score-based diffusion models in function space. arXiv:2302.07400.
- T. Y. L. Lin, J. Yao, L. Chiang, J. Berner, and A. Anandkumar (2026). Decoupled diffusion sampling for inverse problems on function spaces. arXiv:2601.23280.
- P\. Ma, M\. W\. Mahoney, and B\. Yu \(2015\)A statistical perspective on algorithmic leveraging\.J\. Mach\. Learn\. Res\.16,pp\. 861–911\.Cited by:[§1](https://arxiv.org/html/2605.06861#S1.p6.1)\.
- Y\. Ma, H\. Wu, H\. Zhou, H\. Weng, J\. Wang, and M\. Long \(2025\)PhySense: Sensor Placement Optimization for Accurate Physics Sensing\.InThe Thirty\-ninth Annual Conference on Neural Information Processing Systems,Cited by:[§1](https://arxiv.org/html/2605.06861#S1.p3.1),[§1](https://arxiv.org/html/2605.06861#S1.p4.1),[§6](https://arxiv.org/html/2605.06861#S6.p2.1)\.
- K\. Manohar, B\. W\. Brunton, J\. N\. Kutz, and S\. L\. Brunton \(2018\)Data\-Driven Sparse Sensor Placement for Reconstruction: Demonstrating the Benefits of Exploiting Known Patterns\.IEEE Control Systems Magazine38\(3\),pp\. 63–86\.External Links:ISSN 1941\-000X,[Document](https://dx.doi.org/10.1109/MCS.2018.2810460)Cited by:[§1](https://arxiv.org/html/2605.06861#S1.p2.1),[§4\.2](https://arxiv.org/html/2605.06861#S4.SS2.p4.5),[§4\.2](https://arxiv.org/html/2605.06861#S4.SS2.p7.8)\.
- A\. Marcato, E\. Guiltinan, H\. Viswanathan, D\. O’Malley, N\. Lubbers, and J\. E\. Santos \(2024\)Journey over destination: dynamic sensor placement enhances generalization\.Machine Learning: Science and Technology5\(2\),pp\. 025070\.External Links:ISSN 2632\-2153,[Document](https://dx.doi.org/10.1088/2632-2153/ad4e06)Cited by:[§1](https://arxiv.org/html/2605.06861#S1.p4.1)\.
- J\. Pidstrigach, Y\. Marzouk, S\. Reich, and S\. Wang \(2024\)Infinite\-dimensional diffusion models\.J\. Mach\. Learn\. Res\.25\(414\),pp\. 1–52\.Cited by:[§4\.1](https://arxiv.org/html/2605.06861#S4.SS1.p1.5)\.
- J\. Rowbottom, E\. L\. Baker, N\. Huang, B\. Adcock, C\. Schönlieb, and A\. Denker \(2026\)GRIFDIR: Graph Resolution\-Invariant FEM Diffusion Models in Function Spaces over Irregular Domains\.arXiv\.External Links:2605\.03497,[Document](https://dx.doi.org/10.48550/arXiv.2605.03497)Cited by:[§B\.1](https://arxiv.org/html/2605.06861#A2.SS1.p1.12),[§5\.1](https://arxiv.org/html/2605.06861#S5.SS1.p1.4),[§5](https://arxiv.org/html/2605.06861#S5.p1.5)\.
- J\. E\. Santos, Z\. R\. Fox, A\. Mohan, D\. O’Malley, H\. Viswanathan, and N\. Lubbers \(2023\)Development of the Senseiver for efficient field reconstruction from sparse observations\.Nature Machine Intelligence5\(11\),pp\. 1317–1325\.External Links:ISSN 2522\-5839,[Document](https://dx.doi.org/10.1038/s42256-023-00746-x)Cited by:[§1](https://arxiv.org/html/2605.06861#S1.p3.1)\.
- M\. Tomasetto, J\. P\. Williams, F\. Braghin, A\. Manzoni, and J\. N\. Kutz \(2025\)Reduced order modeling with shallow recurrent decoder networks\.Nature Communications16\(1\),pp\. 10260\.Cited by:[§B\.1](https://arxiv.org/html/2605.06861#A2.SS1.p1.12),[§5\.1](https://arxiv.org/html/2605.06861#S5.SS1.p1.4)\.
- K\. Willcox \(2006\)Unsteady flow sensing and estimation via the gappy proper orthogonal decomposition\.Computers & Fluids35\(2\),pp\. 208–226\.External Links:ISSN 0045\-7930,[Document](https://dx.doi.org/10.1016/j.compfluid.2004.11.006)Cited by:[§1](https://arxiv.org/html/2605.06861#S1.p2.1)\.
- Z\. Xu, S\. Wang, X\. Zhang, and G\. He \(2024\)Optimal sensor placement for ensemble\-based data assimilation using gradient\-weighted class activation mapping\.Journal of Computational Physics514,pp\. 113224\.External Links:ISSN 0021\-9991,[Document](https://dx.doi.org/10.1016/j.jcp.2024.113224)Cited by:[§1](https://arxiv.org/html/2605.06861#S1.p4.1)\.
- Q\. Zhang, D\. Krotov, and G\. E\. Karniadakis \(2025\)Operator learning for reconstructing flow fields from sparse measurements: An energy transformer approach\.Journal of Computational Physics538,pp\. 114148\.External Links:ISSN 0021\-9991,[Document](https://dx.doi.org/10.1016/j.jcp.2025.114148)Cited by:[§1](https://arxiv.org/html/2605.06861#S1.p3.1)\.
- S\. Zhao, F\. Wang, Y\. Tang, and Y\. Liu \(2025\)Optimal Sensor Placement Based on Attention Mechanism for Minimizing Lift Fluctuations Over an Airfoil with Deep Reinforcement Learning\.InProceedings of the 7th China Aeronautical Science and Technology Conference,Singapore,pp\. 379–393\.External Links:[Document](https://dx.doi.org/10.1007/978-981-97-9771-4%5F30),ISBN 978\-981\-97\-9771\-4Cited by:[§1](https://arxiv.org/html/2605.06861#S1.p4.1)\.
## Appendix A Proofs
We now prove the main results of the paper. We do this by first presenting an abstract theoretical framework for accuracy and stability when solving Bayesian inverse problems in general Hilbert spaces. Theorem 3.3 then follows as a corollary of this abstract framework. As mentioned, our approach is based primarily on Adcock and Huang [2025] and Jalal et al. [2021], but with substantial generalizations: firstly, to handle infinite-dimensional Hilbert spaces and, secondly, to provide a more flexible framework from which we can derive the theoretically optimal sensor placement strategy, i.e., Christoffel sampling. Specifically, these works considered only discrete problems in $\mathbb{R}^N$, which we now generalize to problems in arbitrary separable Hilbert spaces $\mathbb{X}$. Further, the abstract framework in Adcock and Huang [2025] and Jalal et al. [2021] assumed independence between the measurement distribution and the noise distribution. For reasons we describe in §A.4, this assumption is violated in the OSP setting of Theorem 3.3. In this work, we generalize these prior works by allowing the sensing operator and the noise to be dependent.
### A.1 Setup and objective
Let $\mathbb{X}$ be a separable Hilbert space. This implies that $\mathbb{X}$ is a Polish space, i.e., a separable, completely metrizable space. Let $\mathcal{R},\mathcal{P}$ be Borel probability distributions on $\mathbb{X}$. Let $\mathcal{L}(\mathbb{X}_0,\mathbb{R}^m)$ denote the space of continuous linear operators $\mathbb{X}_0\rightarrow\mathbb{R}^m$. Now let $\mathcal{M}$ be a probability distribution on $\mathcal{L}(\mathbb{X}_0,\mathbb{R}^m)\times\mathbb{R}^m$ with marginals $\mathcal{A}$ and $\mathcal{E}$, and let $(A,e)\sim\mathcal{M}$ and $f^*\sim\mathcal{R}$ independently. Given measurements $y=A(f^*)+e$, our aim is to find conditions on $\mathcal{M},\mathcal{P},\mathcal{R}$ such that, for small $t>0$,
$$p:=\mathbb{P}_{f^*\sim\mathcal{R},\,(A,e)\sim\mathcal{M},\,\hat{f}\sim\mathcal{P}(\cdot|y,A)}\{\|f^*-\hat{f}\|\geq t\}\quad(\text{A.1})$$
is small. When understood, we use $\|\cdot\|$ to denote either the Hilbert space norm on $\mathbb{X}$ or the Euclidean norm on $\mathbb{R}^m$.
### A.2 Key definitions
We require several definitions. Let $\mathcal{A}$ be as above and $D\subseteq\mathbb{X}$. Then a *lower concentration bound* for $\mathcal{A}$ is any constant $C_{\mathsf{low}}(t)=C_{\mathsf{low}}(t;\mathcal{A},D)\geq 0$ such that
$$\mathbb{P}_{A\sim\mathcal{A}}\{\|A(f)\|\leq t\|f\|\}\leq C_{\mathsf{low}}(t;\mathcal{A},D),\quad\forall f\in D.$$
Similarly, an *upper concentration bound* for $\mathcal{A}$ is any constant $C_{\mathsf{upp}}(t)=C_{\mathsf{upp}}(t;\mathcal{A},D)\geq 0$ such that
$$\mathbb{P}_{A\sim\mathcal{A}}\{\|A(f)\|\geq t\|f\|\}\leq C_{\mathsf{upp}}(t;\mathcal{A},D),\quad\forall f\in D.$$
Next, given $t,s\geq 0$, an *(upper) absolute concentration bound* for $\mathcal{A}$ is any constant $C_{\mathsf{abs}}(s,t;\mathcal{A},D)$ such that
$$\mathbb{P}_{A\sim\mathcal{A}}(\|A(f)\|>t)\leq C_{\mathsf{abs}}(s,t;\mathcal{A},D),\quad\forall f\in D,\ \|f\|\leq s.$$
Similarly, an *(upper) concentration bound* for $\mathcal{E}$ is any constant $D_{\mathsf{upp}}(t)=D_{\mathsf{upp}}(t;\mathcal{E})\geq 0$ such that
$$\mathcal{E}(B^c_t)=\mathbb{P}_{e\sim\mathcal{E}}(\|e\|\geq t)\leq D_{\mathsf{upp}}(t;\mathcal{E}).$$
The four previous quantities are similar to those in Adcock and Huang [2025] (the main difference being the extension to infinite-dimensional Hilbert spaces). We also introduce the following concept, which differs from that in Adcock and Huang [2025] and is better suited to the infinite-dimensional setting. Given $\tau,\varepsilon\geq 0$, a *density shift bound* for $\mathcal{M}$ is any constant $D_{\mathsf{shift}}(\varepsilon,\tau)=D_{\mathsf{shift}}(\varepsilon,\tau;\mathcal{M})\geq 0$ (possibly $+\infty$) such that
$$\frac{\mathrm{d}\mathcal{E}_A}{\mathrm{d}T_v\sharp\mathcal{E}_A}(e)\leq D_{\mathsf{shift}}(\varepsilon,\tau;\mathcal{M}),\quad\forall e,v\in\mathbb{R}^m,\ \|e\|\leq\tau,\ \|v\|\leq\varepsilon,\quad\text{a.s. }A\sim\mathcal{A}.$$
Here $\mathcal{E}_A$ is the conditional distribution of $e$ given $A$, and $T_v:\mathbb{R}^m\rightarrow\mathbb{R}^m$, $u\mapsto u+v$ is the translation map. Note we set $D_{\mathsf{shift}}(\varepsilon,\tau)=+\infty$ if $\mathcal{E}_A$ is not absolutely continuous with respect to the pushforward $T_v\sharp\mathcal{E}_A$ for some $v$ and $A$. A probability measure for which $\mathcal{E}_A$ is equivalent to $T_v\sharp\mathcal{E}_A$ for all $v\in\mathbb{R}^m$ is called *quasi-invariant*. Hence $D_{\mathsf{shift}}(\varepsilon,\tau)<\infty$ whenever $\mathcal{E}_A$ is quasi-invariant *and* the Radon–Nikodym derivative $\mathrm{d}\mathcal{E}_A/\mathrm{d}T_v\sharp\mathcal{E}_A$ is bounded almost surely for $A\sim\mathcal{A}$. This will be the case in this work whenever this constant is used.
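As a simple illustration of this definition (an example of ours, assuming isotropic Gaussian noise that is independent of $A$): if $\mathcal{E}_A=\mathcal{N}(0,s^2 I_m)$ for every $A$, then the ratio of the two Gaussian densities gives
$$\frac{\mathrm{d}\mathcal{E}_A}{\mathrm{d}T_v\sharp\mathcal{E}_A}(e)=\exp\left(\frac{\|e-v\|^2-\|e\|^2}{2s^2}\right)=\exp\left(\frac{\|v\|^2-2\langle e,v\rangle}{2s^2}\right)\leq\exp\left(\frac{\varepsilon(\varepsilon+2\tau)}{2s^2}\right)$$
for $\|e\|\leq\tau$ and $\|v\|\leq\varepsilon$, so one may take $D_{\mathsf{shift}}(\varepsilon,\tau)=\exp(\varepsilon(\varepsilon+2\tau)/(2s^2))$. Lemma A.10 below generalizes this computation to the weighted, $A$-dependent noise arising in our OSP construction.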
### A.3 Abstract result
We now establish an abstract result that bounds (A.1) at a high level of generality; namely, it makes very few assumptions on the distributions $\mathcal{R}$, $\mathcal{P}$, $\mathcal{A}$ and $\mathcal{E}$. We do this with a series of lemmas that extend those in Adcock and Huang [2025] to the Hilbert space setting.
###### Lemma A.1 (Separation lemma).
Let $\mathcal{H}_1,\ldots,\mathcal{H}_k$ be Borel probability measures on $\mathbb{X}$ and consider the mixture $\mathcal{H}=\sum_{i=1}^{k}a_i\mathcal{H}_i$. Let $g^*\sim\mathcal{H}$ and $\hat{g}\sim\sum_{i=1}^{k}\mathbb{P}(g^*\sim\mathcal{H}_i\,|\,g^*)\,\mathcal{H}_i(\cdot\,|\,g^*)$, where $\mathbb{P}(g^*\sim\mathcal{H}_i\,|\,g^*)$ are the posterior weights. Then
$$\mathbb{P}[g^*\sim\mathcal{H}_i,\ \hat{g}\sim\mathcal{H}_j(\cdot\,|\,g^*)]\leq 1-\mathrm{TV}(\mathcal{H}_i,\mathcal{H}_j).$$
The proof of this lemma is identical to [Adcock and Huang, 2025, Lem. C.4], as it makes no assumptions on the domain of the probability measures.
###### Lemma A.2 (Disjointly-supported measures induce well-separated measurement distributions).
Let $\tilde{f}\in\mathbb{X}$, $\sigma\geq 0$, $\eta\geq 0$, $c\geq 1$, $\mathcal{P}_{\mathsf{ext}}$ be a distribution supported in the set
$$S_{\tilde{f},\mathsf{ext}}=\{f\in\mathbb{X}:\|f-\tilde{f}\|\geq c(\eta+\sigma)\}$$
and $\mathcal{P}_{\mathsf{int}}$ be a distribution supported in the set
$$S_{\tilde{f},\mathsf{int}}=\{f\in\mathbb{X}:\|f-\tilde{f}\|\leq\eta\}.$$
Given $A\in\mathcal{L}(\mathbb{X}_0,\mathbb{R}^m)$, let $\mathcal{H}_{\mathsf{int},A}$ be the distribution of $y=A(f^*)+e$ where $f^*\sim\mathcal{P}_{\mathsf{int}}$ and $e\sim\mathcal{E}_A$ independently, and define $\mathcal{H}_{\mathsf{ext},A}$ in a similar way. Then
$$\mathbb{E}_{A\sim\mathcal{A}}[\mathrm{TV}(\mathcal{H}_{\mathsf{int},A},\mathcal{H}_{\mathsf{ext},A})]\geq 1-\left[C_{\mathsf{low}}\left(\frac{2}{\sqrt{c}};\mathcal{A},D_{\mathsf{ext}}\right)+C_{\mathsf{upp}}\left(\frac{\sqrt{c}}{2};\mathcal{A},D_{\mathsf{int}}\right)+2D_{\mathsf{upp}}\left(\frac{\sqrt{c}\sigma}{2};\mathcal{E}\right)\right],$$
where $D_{\mathsf{ext}}=\{f-\tilde{f}:f\in\mathrm{supp}(\mathcal{P}_{\mathsf{ext}})\}$ and $D_{\mathsf{int}}=\{f-\tilde{f}:f\in\mathrm{supp}(\mathcal{P}_{\mathsf{int}})\}$.
###### Proof.
The proof is similar to that of [Adcock and Huang, 2025, Lem. C.5], except with the changes that come from working in general Hilbert spaces and not assuming independence of $A$ and $e$. We detail these changes.

We first define the set
$$B_A=\{y\in\mathbb{R}^m:\|y-A\tilde{f}\|\leq\sqrt{c}(\eta+\sigma)\}$$
and observe that
$$\mathbb{E}_{A\sim\mathcal{A}}[\mathrm{TV}(\mathcal{H}_{\mathsf{int},A},\mathcal{H}_{\mathsf{ext},A})]\geq\mathbb{E}_{A\sim\mathcal{A}}[\mathcal{H}_{\mathsf{int},A}(B_A)]-\mathbb{E}_{A\sim\mathcal{A}}[\mathcal{H}_{\mathsf{ext},A}(B_A)].\quad(\text{A.2})$$
Consider the second term. We write
$$\mathbb{E}_{A\sim\mathcal{A}}[\mathcal{H}_{\mathsf{ext},A}(B_A)]=I_1+I_2.$$
Given $f\in\mathbb{X}$, let $C_f=\{A:\|A(f)-A(\tilde{f})\|<2\sqrt{c}(\eta+\sigma)\}\subseteq\mathcal{L}(\mathbb{X}_0,\mathbb{R}^m)$ and write
$$I_1=\mathbb{E}_{f\sim\mathcal{P}_{\mathsf{ext}}}\left[\mathbb{E}_{A\sim\mathcal{A}}\,\mathcal{E}_A(B_A-A(f))1_{C_f}\right],\qquad I_2=\mathbb{E}_{f\sim\mathcal{P}_{\mathsf{ext}}}\left[\mathbb{E}_{A\sim\mathcal{A}}\,\mathcal{E}_A(B_A-A(f))1_{C_f^c}\right].$$
Identically to [Adcock and Huang, 2025, Lem. C.5], we have $I_1\leq C_{\mathsf{low}}(2/\sqrt{c};\mathcal{A},D_{\mathsf{ext}})$. For $I_2$, we write
$$I_2\leq\mathbb{E}_{f\sim\mathcal{P}_{\mathsf{ext}}}\left[\mathbb{E}_{A\sim\mathcal{A}}[\mathcal{E}(B_A-A(f)\,|\,A)]\right].$$
Observe that $B_A-A(f)\subseteq B^c_{\sqrt{c}(\eta+\sigma)}$ on $C_f^c$. Hence, by the tower property,
$$I_2\leq\mathbb{E}_{f\sim\mathcal{P}_{\mathsf{ext}}}\,\mathcal{E}(B^c_{\sqrt{c}(\eta+\sigma)})\leq D_{\mathsf{upp}}(\sigma\sqrt{c};\mathcal{E}).$$
We deduce that
$$\mathbb{E}_{A\sim\mathcal{A}}[\mathcal{H}_{\mathsf{ext},A}(B_A)]\leq C_{\mathsf{low}}\left(\frac{2}{\sqrt{c}};\mathcal{A},D_{\mathsf{ext}}\right)+D_{\mathsf{upp}}\left(\frac{\sqrt{c}}{2}\sigma;\mathcal{E}\right).$$
The other term of (A.2) is handled by a similar argument. We omit the details. ∎
###### Lemma A.3 (Replacing the real distribution with the approximate distribution).
Let $\varepsilon,\sigma,d,t\geq 0$, $c\geq 1$, $\Pi$ be a $W_\infty$-optimal coupling of $\mathcal{R}$ and $\mathcal{P}$, and define the set $D=\{f^*-g^*:(f^*,g^*)\in\mathrm{supp}(\Pi)\}$. Let
$$p=\mathbb{P}_{f^*\sim\mathcal{R},\,(A,e)\sim\mathcal{M},\,\hat{f}\sim\mathcal{P}(\cdot|A(f^*)+e,A)}[\|f^*-\hat{f}\|\geq d+\varepsilon]$$
and
$$q=\mathbb{P}_{g^*\sim\mathcal{P},\,(A,e)\sim\mathcal{M},\,\hat{g}\sim\mathcal{P}(\cdot|A(g^*)+e,A)}[\|g^*-\hat{g}\|\geq d].$$
Then
$$p\leq C_{\mathsf{abs}}(\varepsilon,t\varepsilon;\mathcal{A},D)+D_{\mathsf{upp}}(c\sigma;\mathcal{E})+D_{\mathsf{shift}}(t\varepsilon,c\sigma;\mathcal{M})\,q.$$
###### Proof.
As in the previous proofs, the argument is broadly similar to that of [Adcock and Huang, 2025, Lem. C.6], with the necessary changes to account for the more general setup. Define the events
$$B_{1,\hat{f}}=\{f^*:\|f^*-\hat{f}\|\geq d+\varepsilon\},\qquad B_{2,\hat{g}}=\{g^*:\|\hat{g}-g^*\|\geq d\},$$
so that
$$p=\mathbb{P}_{f^*\sim\mathcal{R},\,(A,e)\sim\mathcal{M},\,\hat{f}\sim\mathcal{P}(\cdot|A(f^*)+e,A)}[f^*\in B_{1,\hat{f}}],\qquad q=\mathbb{P}_{g^*\sim\mathcal{P},\,(A,e)\sim\mathcal{M},\,\hat{g}\sim\mathcal{P}(\cdot|A(g^*)+e,A)}[g^*\in B_{2,\hat{g}}].$$
Since $W_\infty(\mathcal{R},\mathcal{P})\leq\varepsilon$, there exists a coupling $\Pi$ such that $\Pi(\|f^*-g^*\|\leq\varepsilon)=1$. This gives
$$p=\int\!\!\int\!\!\int 1_{B_{1,\hat{f}}}(f^*)\,\mathrm{d}\mathcal{P}(\cdot|A(f^*)+e,A)(\hat{f})\,\mathrm{d}\mathcal{M}(A,e)\,\mathrm{d}\Pi(f^*,g^*).$$
Define $E=\{(f^*,g^*):\|f^*-g^*\|\leq\varepsilon\}$ and observe that $\Pi(E)=1$. Then, for fixed $A,e$ and $(f^*,g^*)\in E$, we have
$$\int 1_{B_{1,\hat{f}}}(f^*)\,\mathrm{d}\mathcal{P}(\cdot|A(f^*)+e,A)(\hat{f})\leq\int 1_{B_{2,\hat{f}}}(g^*)\,\mathrm{d}\mathcal{P}(\cdot|A(f^*)+e,A)(\hat{f}),$$
since $\|f^*-\hat{f}\|\geq d+\varepsilon$ and $\|f^*-g^*\|\leq\varepsilon$ together imply $\|\hat{f}-g^*\|\geq d$. By Fubini's theorem, which we can invoke by non-negativity of indicator functions, we get
$$p\leq\int\!\!\int\!\!\int 1_{B_{2,\hat{f}}}(g^*)\,\mathrm{d}\mathcal{P}(\cdot|A(f^*)+e,A)(\hat{f})\,\mathrm{d}\mathcal{M}(A,e)\,\mathrm{d}\Pi(f^*,g^*).$$
Define $C_{f^*,g^*}=\{A:\|A(f^*-g^*)\|>t\varepsilon\}$ and
$$I_1=\int\!\!\int 1_{C_{f^*,g^*}}(A)\int 1_{B_{2,\hat{f}}}(g^*)\,\mathrm{d}\mathcal{P}(\cdot|A(f^*)+e,A)(\hat{f})\,\mathrm{d}\mathcal{M}(A,e)\,\mathrm{d}\Pi(f^*,g^*),$$
$$I_2=\int\!\!\int 1_{C^c_{f^*,g^*}}(A)\int 1_{B_{2,\hat{f}}}(g^*)\,\mathrm{d}\mathcal{P}(\cdot|A(f^*)+e,A)(\hat{f})\,\mathrm{d}\mathcal{M}(A,e)\,\mathrm{d}\Pi(f^*,g^*),$$
so that $p\leq I_1+I_2$. The bound for $I_1$ goes identically to [Adcock and Huang, 2025, Lem. C.3] and gives $I_1\leq C_{\mathsf{abs}}(\varepsilon,t\varepsilon;\mathcal{A},D)$. To bound $I_2$, we further split the integral as $I_2=I_{2_1}+I_{2_2}$, where
$$I_{2_1}=\int\!\!\int 1_{C^c_{f^*,g^*}}(A)\int 1_{B^c_{c\sigma}}(e)\int 1_{B_{2,\hat{f}}}(g^*)\,\mathrm{d}\mathcal{P}(\cdot|A(f^*)+e,A)(\hat{f})\,\mathrm{d}\mathcal{E}_A(e)\,\mathrm{d}\mathcal{A}(A)\,\mathrm{d}\Pi(f^*,g^*),$$
$$I_{2_2}=\int\!\!\int 1_{C^c_{f^*,g^*}}(A)\int 1_{B_{c\sigma}}(e)\int 1_{B_{2,\hat{f}}}(g^*)\,\mathrm{d}\mathcal{P}(\cdot|A(f^*)+e,A)(\hat{f})\,\mathrm{d}\mathcal{E}_A(e)\,\mathrm{d}\mathcal{A}(A)\,\mathrm{d}\Pi(f^*,g^*).$$
To bound $I_{2_1}$, we use a similar argument to Lemma A.2. We have
$$I_{2_1}\leq\int\!\!\int 1_{C^c_{f^*,g^*}}(A)\int 1_{B^c_{c\sigma}}(e)\,\mathrm{d}\mathcal{E}_A(e)\,\mathrm{d}\mathcal{A}(A)\,\mathrm{d}\Pi(f^*,g^*)\leq\int\!\!\int 1_{B^c_{c\sigma}}(e)\,\mathrm{d}\mathcal{E}_A(e)\,\mathrm{d}\mathcal{A}(A),$$
and therefore, by the tower property,
$$I_{2_1}\leq\mathbb{E}_{A\sim\mathcal{A}}[\mathcal{E}_A(B^c_{c\sigma})]=\mathcal{E}(B^c_{c\sigma})\leq D_{\mathsf{upp}}(c\sigma;\mathcal{E}).$$
Now consider $I_{2_2}$. For fixed $A\in C^c_{f^*,g^*}$, define the random variable $e'=e-A(f^*-g^*)$ and let $\mathcal{E}'_A$ denote its conditional distribution given $A$, so that $\mathcal{E}'_A=T_v\sharp\mathcal{E}_A$ with $v=e'-e=-A(f^*-g^*)$ satisfying $\|v\|\leq t\varepsilon$. Suppose, without loss of generality, that $D_{\mathsf{shift}}(t\varepsilon,c\sigma;\mathcal{M})<+\infty$. Then the Radon–Nikodym derivative $\mathrm{d}\mathcal{E}_A/\mathrm{d}\mathcal{E}'_A(e)$ exists and is bounded by $D_{\mathsf{shift}}(t\varepsilon,c\sigma;\mathcal{M})$ for $e\in B_{c\sigma}$. Therefore
$$I_{2_2}\leq D_{\mathsf{shift}}(t\varepsilon,c\sigma;\mathcal{M})\int\!\!\int 1_{C^c_{f^*,g^*}}(A)\int 1_{B_{c\sigma}}(e)\int 1_{B_{2,\hat{f}}}(g^*)\,\mathrm{d}\mathcal{P}(\cdot|A(f^*)+e,A)(\hat{f})\,\mathrm{d}\mathcal{E}'_A(e)\,\mathrm{d}\mathcal{A}(A)\,\mathrm{d}\Pi(f^*,g^*).$$
We now change variables back via the pushforward identity, noting that $A(f^*)+e'-A(f^*-g^*)=A(g^*)+e'$, to get
$$I_{2_2}\leq D_{\mathsf{shift}}(t\varepsilon,c\sigma;\mathcal{M})\int\!\!\int 1_{C^c_{f^*,g^*}}(A)\int 1_{B_{c\sigma}}(e'-A(f^*-g^*))\int 1_{B_{2,\hat{f}}}(g^*)\,\mathrm{d}\mathcal{P}(\cdot|A(g^*)+e',A)(\hat{f})\,\mathrm{d}\mathcal{E}_A(e')\,\mathrm{d}\mathcal{A}(A)\,\mathrm{d}\Pi(f^*,g^*).$$
Next, we relabel the variables $\hat{f}$ and $e'$ as $\hat{g}$ and $e$ respectively, and bound the indicator $1_{B_{c\sigma}}(e-A(f^*-g^*))\leq 1$, to get
$$I_{2_2}\leq D_{\mathsf{shift}}(t\varepsilon,c\sigma;\mathcal{M})\int\!\!\int 1_{C^c_{f^*,g^*}}(A)\int\!\!\int 1_{B_{2,\hat{g}}}(g^*)\,\mathrm{d}\mathcal{P}(\cdot|A(g^*)+e,A)(\hat{g})\,\mathrm{d}\mathcal{E}_A(e)\,\mathrm{d}\mathcal{A}(A)\,\mathrm{d}\Pi(f^*,g^*),$$
and finally, recognizing the remaining integral as (a bound for) $q$, we obtain
$$I_{2_2}\leq D_{\mathsf{shift}}(t\varepsilon,c\sigma;\mathcal{M})\,q.$$
Combining this with the estimates for $I_1$ and $I_{2_1}$ yields the result. ∎
The following lemma is virtually identical to [Adcock and Huang, 2025, Lem. C.7], the only difference being that the distributions are now defined on $\mathbb{X}$. The proof is identical, as it only requires $\mathbb{X}$ to be a Polish space, which holds by assumption since $\mathbb{X}$ is a separable Hilbert space.
###### Lemma A.4 (Decomposing distributions).
Let $\mathcal{R},\mathcal{P}$ be arbitrary distributions on a separable Hilbert space $\mathbb{X}$, $p\geq 1$ and $\eta,\rho,\delta>0$. If $W_p(\mathcal{R},\mathcal{P})\leq\rho$ and $k\in\mathbb{N}$ is such that
$$\min\{\log\mathrm{Cov}_{\eta,\delta}(\mathcal{P}),\log\mathrm{Cov}_{\eta,\delta}(\mathcal{R})\}\leq k,\quad(\text{A.3})$$
then there exist distributions $\mathcal{R}',\mathcal{R}'',\mathcal{P}',\mathcal{P}''$, a constant $0<\delta'\leq\delta$ and a discrete distribution $\mathcal{Q}$ with $\mathrm{supp}(\mathcal{Q})=S$ satisfying
1. $\min\{W_\infty(\mathcal{P}',\mathcal{Q}),W_\infty(\mathcal{R}',\mathcal{Q})\}\leq\eta$,
2. $W_\infty(\mathcal{R}',\mathcal{P}')\leq\rho/\delta^{1/p}$,
3. $\mathcal{P}=(1-2\delta')\mathcal{P}'+(2\delta')\mathcal{P}''$ and $\mathcal{R}=(1-2\delta')\mathcal{R}'+(2\delta')\mathcal{R}''$,
4. $|S|\leq\mathrm{e}^k$,
5. and $S\subseteq\mathrm{supp}(\mathcal{P})$ if $\mathcal{P}$ attains the minimum in (A.3), with $S\subseteq\mathrm{supp}(\mathcal{R})$ otherwise.
We can now establish the following abstract result, which extends [Adcock and Huang, 2025, Thm. 3.1] to the separable Hilbert space setting.
###### Theorem A.5.
Let $1\leq p\leq\infty$, $0\leq\delta\leq 1/4$, $\varepsilon,\eta,t>0$, $c,c'\geq 1$ and $\sigma\geq\varepsilon/\delta^{1/p}$. Let $\mathcal{M}$ be a distribution on $\mathcal{L}(\mathbb{X}_0,\mathbb{R}^m)\times\mathbb{R}^m$ as in §A.1, and let $\mathcal{R},\mathcal{P}$ be distributions on $\mathbb{X}$ satisfying $W_p(\mathcal{R},\mathcal{P})\leq\varepsilon$ and
$$\min(\log\mathrm{Cov}_{\eta,\delta}(\mathcal{R}),\log\mathrm{Cov}_{\eta,\delta}(\mathcal{P}))\leq k\quad(\text{A.4})$$
for some $k\in\mathbb{N}$. Suppose that $f^*\sim\mathcal{R}$ and $(A,e)\sim\mathcal{M}$ independently, and $\hat{f}\sim\mathcal{P}(\cdot|y,A)$, where $y=A(f^*)+e$. Then $p:=\mathbb{P}[\|f^*-\hat{f}\|\geq(c+2)(\eta+\sigma)]$ satisfies
$$p\leq 2\delta+C_{\mathsf{abs}}(\varepsilon/\delta^{1/p},t\varepsilon/\delta^{1/p};\mathcal{A},D_1)+D_{\mathsf{upp}}(c'\sigma;\mathcal{E})+2D_{\mathsf{shift}}(t\varepsilon/\delta^{1/p},c'\sigma;\mathcal{M})\,\mathrm{e}^{k}\left[C_{\mathsf{low}}\left(\frac{2\sqrt{2}}{\sqrt{c}};\mathcal{A},D_2\right)+C_{\mathsf{upp}}\left(\frac{\sqrt{c}}{2\sqrt{2}};\mathcal{A},D_2\right)+2D_{\mathsf{upp}}\left(\frac{\sqrt{c}\sigma}{2\sqrt{2}};\mathcal{E}\right)\right],$$
where
$$D_1=B_{\varepsilon/\delta^{1/p}}(\mathrm{supp}(\mathcal{P}))\cap\mathrm{supp}(\mathcal{R})-\mathrm{supp}(\mathcal{P})\quad(\text{A.5})$$
and
$$D_2=\begin{cases}\mathrm{supp}(\mathcal{P})-\mathrm{supp}(\mathcal{P})&\text{if }\mathcal{P}\text{ attains the minimum in (A.4)},\\\mathrm{supp}(\mathcal{R})-\mathrm{supp}(\mathcal{P})&\text{otherwise}.\end{cases}\quad(\text{A.6})$$
###### Proof.
The proof is virtually identical to that of [Adcock and Huang, 2025, Thm. 3.1], except for a couple of changes. First, throughout the proof, we replace $A\sim\mathcal{A}$, $e\sim\mathcal{E}$ with $(A,e)\sim\mathcal{M}$ to account for the joint model considered in this work. Second, before [Adcock and Huang, 2025, Eqn. (C.23)], we fix $A\in\mathcal{L}(\mathbb{X}_0,\mathbb{R}^m)$ and replace $e\sim\mathcal{E}$ with $e\sim\mathcal{E}_A$. ∎
### A.4 Proof of Theorem 3.3
The proof of Theorem 3.3 is based on Theorem A.5, after making a suitable choice for the distribution $\mathcal{M}$ and estimating the various constants. Specifically, consider the setup of §3, suppose that the sampling distribution $\mu$ satisfies the conditions in Theorem 3.3, i.e., $\mu\ll\rho$ and $\mathrm{d}\mu/\mathrm{d}\rho>0$ $\mu$-a.s., and let $w(\theta)=(\mathrm{d}\mu(\theta)/\mathrm{d}\rho)^{-1}$. Then we define $(A,e)\sim\mathcal{M}$ by
$$A(f)=\left(\sqrt{\frac{w(\theta_i)}{m}}\,L(\theta_i)(f)\right)_{i=1}^{m},\quad\forall f\in\mathbb{X}_0,$$
and
$$e=\left(\sqrt{\frac{w(\theta_i)}{m}}\,n_i\right)_{i=1}^{m},$$
where $\theta=(\theta_i)_{i=1}^{m}\sim\mu^{\otimes m}$ and $n\sim\mathcal{N}(0,\sigma^2 I)$ independently. Given $f^*\in\mathbb{X}_0$, we now consider its recovery via posterior sampling $\mathcal{P}(\cdot|b,A)$, where $b=A(f^*)+e$. Note that this posterior is identical to that considered in Theorem 3.3, i.e., $\mathcal{P}(\cdot|b,A)=\mathcal{P}(\cdot|y,\theta)$ with $\theta$ and $y$ given by (3.1). This can be shown as follows. Notice first that $b=W(\theta)y$ and $A=W(\theta)L(\theta)$, where $W(\theta)=\mathrm{diag}(\sqrt{w(\theta_1)/m},\ldots,\sqrt{w(\theta_m)/m})$. Recall that $w>0$ a.s., so $W(\theta)$ is measurable and invertible a.s. Moreover, the map $(y,\theta)\mapsto(b,A)$ is injective, since $\theta\in\Theta\mapsto L(\theta)\in\mathcal{B}(\mathbb{X}_0)$ is injective by assumption. The claim now follows. Hence, for the remainder of the proof, we may consider $\mathcal{P}(\cdot|b,A)$.
A critical observation is that the sensing operator $A$ and the noise $e$ are now dependent random variables, due to the weighting $w(\theta_i)$. Note that this weighting is essential, as it ensures that
$$\mathbb{E}\|A(f)\|^2=\sum_{i=1}^{m}\mathbb{E}\,\frac{w(\theta_i)}{m}|L(\theta_i)(f)|^2=\int_\Theta w(\theta)|L(\theta)(f)|^2\,\mathrm{d}\mu(\theta)=\int_\Theta|L(\theta)(f)|^2\,\mathrm{d}\rho(\theta).$$
Therefore, by (3.2), $\mathbb{E}\|A(f)\|^2$ is equivalent, up to the constants $\alpha$ and $\beta$, to $\|f\|^2$. This crucial observation allows us to derive concentration bounds $C_{\mathsf{low}}$ and $C_{\mathsf{upp}}$ that are exponentially small in $m$ (Lemma A.7), which is a key step in proving Theorem 3.3.
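To make the role of the weighting concrete, the following is a minimal numerical sketch (ours, with made-up names, a toy discrete parameter set $\Theta=\{0,\ldots,N-1\}$, $\rho$ uniform and $L(\theta)(f)=f_\theta$) verifying that $\mathbb{E}\|A(f)\|^2$ matches $\int_\Theta|L(\theta)(f)|^2\,\mathrm{d}\rho(\theta)$ regardless of the sampling measure $\mu$:

```python
# Toy Monte Carlo check of the weighting identity E||A(f)||^2 = int |L(theta)(f)|^2 d rho.
# All names are illustrative; Theta is discrete, rho is uniform, mu is arbitrary.
import numpy as np

rng = np.random.default_rng(0)
N, m, trials = 50, 20, 20000
rho = np.full(N, 1.0 / N)                  # reference measure on Theta
mu = rng.random(N); mu /= mu.sum()         # sampling measure (made up)
w = rho / mu                               # w(theta) = (d mu / d rho)^{-1}

f = rng.standard_normal(N)
target = np.sum(rho * f**2)                # int |L(theta)(f)|^2 d rho

estimates = []
for _ in range(trials):
    idx = rng.choice(N, size=m, p=mu)      # theta_i ~ mu, i.i.d.
    Af = np.sqrt(w[idx] / m) * f[idx]      # A(f) as defined above
    estimates.append(np.sum(Af**2))
print(np.mean(estimates), target)          # agree up to Monte Carlo error
```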
The dependence between $A$ and $e$ means, in particular, that the framework of Adcock and Huang [2025] cannot be used directly, as it assumes independence between these two terms. Fortunately, Theorem A.5 extends Adcock and Huang [2025] by allowing dependence and considering only the joint distribution $\mathcal{M}$. This key generalization facilitates OSP and the proof of Theorem 3.3.

Having defined the distribution $\mathcal{M}$, the next step is to estimate the various constants appearing in Theorem A.5.
###### Lemma A.7 (Relative concentration bounds for $\mathcal{A}$).
Let $D=\mathbb{S}(\mathcal{P})$ and suppose that (3.2) holds with constants $0<\alpha\leq\beta<\infty$. Then the relative lower and upper concentration bounds for $\mathcal{A}$ can be taken as
$$C_{\mathsf{upp}}(t;\mathcal{A},D)=2\exp\left(-\frac{m\,c_{\mathsf{upp}}(t,\beta)}{\kappa_w(\mathcal{P},L)}\right),\quad\forall t>\sqrt{\beta},$$
$$C_{\mathsf{low}}(1/t;\mathcal{A},D)=2\exp\left(-\frac{m\,c_{\mathsf{low}}(t,\alpha)}{\kappa_w(\mathcal{P},L)}\right),\quad\forall t>1/\sqrt{\alpha},$$
where $c_{\mathsf{upp}}(t,\beta)$ depends on $t$ and $\beta$ only, $c_{\mathsf{low}}(t,\alpha)$ depends on $t$ and $\alpha$ only, and $\kappa_w(\mathcal{P},L)$ is as in (3.5).
###### Proof.
Let $f\in D$ be arbitrary and write
$$\|A(f)\|^2=\frac{1}{m}\sum_{i=1}^{m}w(\theta_i)|L(\theta_i)(f)|^2=\sum_{i=1}^{m}X_i,$$
where the $X_i$ are i.i.d. random variables with $X_i\geq 0$. Notice that
$$\sum_{i=1}^{m}\mathbb{E}(X_i)=\int_\Theta w(\theta)|L(\theta)(f)|^2\,\mathrm{d}\mu(\theta)=\int_\Theta|L(\theta)(f)|^2\,\mathrm{d}\rho(\theta)$$
and therefore
$$\alpha\|f\|^2\leq\sum_{i=1}^{m}\mathbb{E}(X_i)\leq\beta\|f\|^2$$
by the nondegeneracy assumption (3.2). Hence
$$\mathbb{P}(\|A(f)\|\geq t\|f\|)\leq\mathbb{P}\left(\sum_{i=1}^{m}X_i\geq\frac{t^2}{\beta}\sum_{i=1}^{m}\mathbb{E}(X_i)\right)$$
and
$$\mathbb{P}(\|A(f)\|\leq t^{-1}\|f\|)\leq\mathbb{P}\left(\sum_{i=1}^{m}X_i\leq\frac{1}{t^2\alpha}\sum_{i=1}^{m}\mathbb{E}(X_i)\right).$$
Now observe that
$$X_i\leq\kappa_w(\mathcal{P},L)/m.$$
We now recall the standard Chernoff bound, which gives
$$\mathbb{P}\left(\sum_{i=1}^{m}X_i\geq(1+\delta)\sum_{i=1}^{m}\mathbb{E}(X_i)\right)\leq\exp\left(-\frac{m\,c_{\mathsf{upp}}(\delta)}{\kappa_w(\mathcal{P},L)}\right),\quad\forall\delta>0,$$
and
$$\mathbb{P}\left(\sum_{i=1}^{m}X_i\leq(1-\delta)\sum_{i=1}^{m}\mathbb{E}(X_i)\right)\leq\exp\left(-\frac{m\,c_{\mathsf{low}}(\delta)}{\kappa_w(\mathcal{P},L)}\right),\quad\forall\,0<\delta<1.$$
Setting $\delta=t^2/\beta-1$ and $\delta=1-1/(\alpha t^2)$, respectively, now gives the result. ∎
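For the reader's convenience, the form of Chernoff bound we have in mind is the standard multiplicative bound for sums of i.i.d. random variables $X_i\in[0,B]$ with $\mu=\sum_i\mathbb{E}(X_i)$ (a classical fact; the explicit rate function below is one standard choice):
$$\mathbb{P}\Big(\sum_i X_i\geq(1+\delta)\mu\Big)\leq\exp\Big(-\frac{\mu\,h(\delta)}{B}\Big),\qquad\mathbb{P}\Big(\sum_i X_i\leq(1-\delta)\mu\Big)\leq\exp\Big(-\frac{\mu\,h(-\delta)}{B}\Big),$$
where $h(\delta)=(1+\delta)\log(1+\delta)-\delta$. Applied with $B=\kappa_w(\mathcal{P},L)/m$ and, say, unit-norm $f$ (so that $\mu\geq\alpha$), this yields constants of the stated form.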
###### Lemma A.8 (Absolute concentration bound for $\mathcal{A}$).
Let $D=\mathbb{X}_0$ and suppose that (3.2) holds with constants $0<\alpha\leq\beta<\infty$. Then the absolute concentration bound for $\mathcal{A}$ can be taken as
$$C_{\mathsf{abs}}(s,t;\mathcal{A},D)=\frac{\beta s^2}{t^2}.$$
###### Proof.
By Markov's inequality, the definition of $A$ and (3.2), we have
$$\mathbb{P}_{A\sim\mathcal{A}}(\|A(f)\|>t)\leq\frac{\mathbb{E}\|A(f)\|^2}{t^2}\leq\frac{\beta\|f\|^2}{t^2}.$$
Therefore, if $\|f\|\leq s$, we obtain $\mathbb{P}_{A\sim\mathcal{A}}(\|A(f)\|>t)\leq\beta s^2/t^2$, as required. ∎
###### Lemma A.9 (Concentration bound for $\mathcal{E}$).
The upper concentration bound $D_{\mathsf{upp}}(t;\mathcal{E})$ can be taken as
$$D_{\mathsf{upp}}(t;\mathcal{E})=\left(\frac{t^2}{w_{\max}\sigma^2}\,\mathrm{e}^{1-\frac{t^2}{w_{\max}\sigma^2}}\right)^{\frac{m}{2}},\quad\forall t>\sigma\sqrt{w_{\max}},$$
where $w_{\max}=\operatorname{ess\,sup}_{\theta\sim\rho}w(\theta)$.
###### Proof.
Recall that $\mathcal{E}$ is the marginal distribution, i.e., the distribution of $e$ when $(A,e)\sim\mathcal{M}$. Observe that
$$\mathcal{E}(B^c_t)=\mathbb{E}_{A\sim\mathcal{A}}[\mathbb{P}_{e\sim\mathcal{E}_A}(\|e\|>t)].$$
Now recall that $\mathcal{E}_A=\mathcal{N}(0,\frac{\sigma^2}{m}\mathrm{diag}(w(\theta_1),\ldots,w(\theta_m)))$. Hence
$$\mathbb{P}_{e\sim\mathcal{E}_A}(\|e\|>t)=\mathbb{P}_{n\sim\mathcal{N}(0,\frac{\sigma^2}{m}I)}\left(\sum_{i=1}^{m}w(\theta_i)n_i^2>t^2\right)\leq\mathbb{P}_{n\sim\mathcal{N}(0,\frac{\sigma^2}{m}I)}\left(\|n\|^2>\frac{t^2}{w_{\max}}\right).$$
The result now follows from [Adcock and Huang, 2025, Lem. B.1]. ∎
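For completeness, the estimate being invoked is (up to notation) the standard chi-square tail bound: if $n\sim\mathcal{N}(0,\frac{\sigma^2}{m}I_m)$, then $\frac{m}{\sigma^2}\|n\|^2\sim\chi^2_m$ and, for any $u>1$,
$$\mathbb{P}(\|n\|^2>u\sigma^2)=\mathbb{P}(\chi^2_m>um)\leq\left(u\,\mathrm{e}^{1-u}\right)^{m/2}.$$
Taking $u=t^2/(w_{\max}\sigma^2)$ gives the stated form of $D_{\mathsf{upp}}$.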
###### Lemma A.10 (Density shift bound for $\mathcal{M}$).
The density shift bound $D_{\mathsf{shift}}(\varepsilon,\tau;\mathcal{M})$ can be taken as
$$D_{\mathsf{shift}}(\varepsilon,\tau;\mathcal{M})=\exp\left(\frac{m(2\tau+\varepsilon)}{2\sigma^2 w_{\min}}\,\varepsilon\right),\quad\forall\varepsilon,\tau\geq 0,$$
where $w_{\min}=\operatorname{ess\,inf}_{\theta\sim\rho}w(\theta)$.
###### Proof.
Given $\theta$, let $A$ be as defined above and $e,v\in\mathbb{R}^m$ with $\|e\|\leq\tau$ and $\|v\|\leq\varepsilon$. Recall from the previous proof that $\mathcal{E}_A=\mathcal{N}(0,\frac{\sigma^2}{m}\mathrm{diag}(w(\theta_1),\ldots,w(\theta_m)))$ and therefore $T_v\sharp\mathcal{E}_A=\mathcal{N}(v,\frac{\sigma^2}{m}\mathrm{diag}(w(\theta_1),\ldots,w(\theta_m)))$. Both distributions are absolutely continuous with respect to the Lebesgue measure. Hence the Radon–Nikodym derivative $\mathrm{d}\mathcal{E}_A/\mathrm{d}T_v\sharp\mathcal{E}_A$ is just the ratio of their densities, i.e.,
$$\frac{\mathrm{d}\mathcal{E}_A}{\mathrm{d}T_v\sharp\mathcal{E}_A}(e)=\exp\left[\frac{m}{2\sigma^2}\sum_{i=1}^{m}\left(\frac{(e_i-v_i)^2}{w(\theta_i)}-\frac{e_i^2}{w(\theta_i)}\right)\right]=\exp\left[\frac{m}{2\sigma^2}\sum_{i=1}^{m}\frac{v_i(v_i-2e_i)}{w(\theta_i)}\right].$$
By the Cauchy–Schwarz inequality and the assumptions on $v$ and $e$, we have
$$\sum_{i=1}^{m}\frac{v_i(v_i-2e_i)}{w(\theta_i)}\leq\frac{\|v\|\|v-2e\|}{w_{\min}}\leq\frac{(\varepsilon+2\tau)\varepsilon}{w_{\min}}.$$
The result now follows immediately. ∎
###### Proof of Theorem 3.3.
Let $p=\mathbb{P}[\|f^*-\hat{f}\|\geq(8d^2+2)(\eta+\sigma)]$ for some $d\geq 1$ to be chosen later. Now apply Theorem A.5 with $c=8d^2$ to get
$$p\lesssim\delta+C_{\mathsf{abs}}(\tilde{\varepsilon},t\tilde{\varepsilon};\mathcal{A},\mathbb{X}_0)+D_{\mathsf{upp}}(c'\sigma;\mathcal{E})+D_{\mathsf{shift}}(t\tilde{\varepsilon},c'\sigma;\mathcal{M})\,\mathrm{e}^{k}\left[C_{\mathsf{low}}\left(\frac{1}{d};\mathcal{A},D\right)+C_{\mathsf{upp}}(d;\mathcal{A},D)+D_{\mathsf{upp}}(d\sigma;\mathcal{E})\right]$$
for any $c'\geq 1$ and $t>0$, where $\tilde{\varepsilon}=\varepsilon/\delta^{1/p}$, $D=\mathrm{supp}(\mathcal{P})-\mathrm{supp}(\mathcal{P})$ and $k\in\mathbb{N}$ is any integer such that
$$\log\mathrm{Cov}_{\eta,\delta}(\mathcal{P})\leq k.$$
Set
$$c'=2\sqrt{w_{\max}},\qquad d=2\max\{\sqrt{w_{\max}},1/\sqrt{\alpha},\sqrt{\beta}\}.$$
Then Lemma A.9 gives
$$D_{\mathsf{upp}}(d\sigma;\mathcal{E})\leq D_{\mathsf{upp}}(c'\sigma;\mathcal{E})\leq\left(2\mathrm{e}^{-1}\right)^{\frac{m}{2}}\leq\exp(-m/16).$$
Now set $t=1/\sqrt{\delta}$. Then Lemma A.8 gives
$$C_{\mathsf{abs}}(\tilde{\varepsilon},t\tilde{\varepsilon};\mathcal{A},\mathbb{X}_0)=\frac{\beta\tilde{\varepsilon}^2}{t^2\tilde{\varepsilon}^2}=\beta\delta.$$
Next, Lemma A.7 and the choice of $d$ give
$$C_{\mathsf{low}}\left(\frac{1}{d};\mathcal{A},D\right)+C_{\mathsf{upp}}(d;\mathcal{A},D)\leq 2\exp\left(-\frac{m\,c(\alpha,\beta)}{\kappa_w(\mathcal{P})}\right)$$
for some constant $c(\alpha,\beta)$ depending on $\alpha,\beta$ only. Substituting these three bounds into the previous expression (absorbing the fixed constant $\beta$ into $\lesssim$), we deduce that
$$p\lesssim\delta+\exp\left(-\frac{m}{16}\right)+D_{\mathsf{shift}}(t\tilde{\varepsilon},c'\sigma;\mathcal{M})\,\mathrm{e}^{k}\left[\exp\left(-\frac{m\,c(\alpha,\beta)}{\kappa_w(\mathcal{P})}\right)+\exp\left(-\frac{m}{16}\right)\right].$$
Finally, we estimate the density shift bound. Using Lemma A.10, we get
$$D_{\mathsf{shift}}(t\tilde{\varepsilon},c'\sigma;\mathcal{M})\leq\exp\left(\frac{m(2c'\sigma+t\tilde{\varepsilon})}{2\sigma^2 w_{\min}}\,t\tilde{\varepsilon}\right)=\exp\left(\frac{2m\sqrt{w_{\max}}\,\tilde{\varepsilon}}{\sigma w_{\min}\sqrt{\delta}}+\frac{m\tilde{\varepsilon}^2}{2\sigma^2 w_{\min}\delta}\right).$$
Now suppose that
$$\tilde{\varepsilon}\leq\frac{w_{\min}\sqrt{\delta}}{m}\,\sigma.\quad(\text{A.7})$$
Notice that this choice satisfies $\tilde{\varepsilon}\leq\sigma$, since $\delta\leq 1$ by assumption and $w_{\min}\leq 1$. The latter observation follows from the fact that $1=\int_\Theta\mathrm{d}\mu(\theta)=\int_\Theta w(\theta)^{-1}\,\mathrm{d}\rho(\theta)$. This yields
$$D_{\mathsf{shift}}(t\tilde{\varepsilon},c'\sigma;\mathcal{M})\leq\exp(2\sqrt{w_{\max}}+1)\leq\exp(3\sqrt{w_{\max}}),$$
where in the final step we use the observation that $w_{\max}\geq 1$, which follows from the same fact as used directly above. To conclude, we have shown that
$$\mathbb{P}[\|f^*-\hat{f}\|\geq(8d^2+2)(\eta+\sigma)]\lesssim\delta+\exp\left(3\sqrt{w_{\max}}+k-m\min\left\{\frac{c(\alpha,\beta)}{\kappa_w(\mathcal{P})},\frac{1}{16}\right\}\right),$$
provided that (A.7) holds, where $d=2\max\{\sqrt{w_{\max}},1/\sqrt{\alpha},\sqrt{\beta}\}$. Since $\tilde{\varepsilon}=\varepsilon/\delta^{1/p}$, we see that (A.7) is equivalent to
$$\sigma\geq\frac{m\varepsilon}{w_{\min}\delta^{1/p+1/2}},$$
which holds by assumption. To complete the proof, we use (3.3). ∎
## Appendix B Experimental details
In this appendix, we provide further details for the three main experiments shown in the paper. We disclose that LLMs were used to assist in the coding of these experiments. All experiments and code, however, have been checked and verified, to the best of our ability, to be accurate and not misleading.
### B.1 Pinball problem
As a primary example, we consider the pinball dataset found in Tomasetto et al. [2025]. The setup and details for this experiment are very similar to Rowbottom et al. [2026]; we restate them here for clarity. Mathematically, this dataset consists of samples from an advection-diffusion problem with an implicit parametric dependence on the square domain $[-1,1]\times[-1,1]$ with three cylinders of radius $r=0.15$ centred at the points $(-0.5,-0.5)$, $(0.5,-0.5)$ and $(0.0,0.5)$. We denote this square domain with the cylinders removed by $\Theta$. Each cylinder rotates at a constant velocity $v_i$, which we parameterize by $\bm{\mu}=[v_1,v_2,v_3]$. The rotation of the cylinders with velocities $\bm{\mu}$ induces a velocity $v:\Theta\to\mathbb{R}^2$ and a pressure $p:\Theta\to\mathbb{R}$ at each point in the domain. Both the velocity and the pressure are determined by the steady-state Navier–Stokes equations. Tomasetto et al. [2025] use $\nu=1$, no-slip boundary conditions on the external walls and Dirichlet boundary conditions on the cylinders. They then consider a quantity (e.g., mass or particle density) $\rho:\Theta\times[0,T]\to\mathbb{R}$ that spreads inside the domain according to
$$\rho_t+\nabla\cdot(-\eta\nabla\rho+v(\bm{\mu})\rho)=0,\quad(\text{B.1})$$
with homogeneous Neumann boundary conditions, $\eta=0.001$, $T=3$ and $v(\bm{\mu})$ the solution of the Navier–Stokes equations given the cylinder velocities $\bm{\mu}$. For the fixed initial condition
$$\rho(x,0)=\frac{10}{\pi}\exp(-10x_1^2-10x_2^2),\quad(\text{B.2})$$
the solution $\rho$ depends directly on the fluid velocity $v(\bm{\mu})$. We collect roll-outs of $\rho_t$ for 400/50/50 varying $\bm{\mu}$ as our training/validation/test sets and aim to learn this distribution (independent of $t$).
Example reconstructions at $m=10$ are shown in Fig. 4.
Figure 4: Pinball reconstructions with $m=10$: sensor locations on the unstructured 7525-node mesh (cf. Fig. 1). Sensor positions overlaid; reconstruction relative $L_2$ error annotated.
### B.2 Darcy flow problem
We use the Darcy benchmark of Huang et al. [2024]: the steady-state PDE
$$-\nabla\cdot(a\nabla u)=f\qquad\text{on }[0,1]^2,\quad(\text{B.3})$$
with fixed source $f$, no-flux Dirichlet boundary conditions, and a binary permeability field $a\in\{3,12\}$ that is piecewise constant on a Voronoi tessellation. The pressure $u$ is the diffusion-smoothed solution. We use the official $128\times 128$ test set (10,000 fields) and the pretrained EDM denoiser from Huang et al. [2024]; reconstruction follows their DPS posterior-sampling variant with 500 Heun steps and the published $\zeta$ schedule. Per-strategy reconstructions appear in Fig. 5.
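For concreteness, the following is a minimal sketch of the kind of guidance step used in DPS-style reconstruction, in the style of Chung et al. [2023]. It is a simplification of ours with a single Euler (rather than Heun) update; `denoiser` and `zeta_t` are placeholders and not the exact published configuration of Huang et al. [2024]:

```python
# Hedged sketch of one DPS guidance step on a flattened field x_t in R^N.
import torch

def dps_step(x_t, t_cur, t_next, y_S, S, denoiser, zeta_t):
    """Euler step of the probability-flow ODE plus data-consistency guidance.
    S: (m, N) row selector; y_S: (m,) sparse measurements."""
    x_t = x_t.detach().requires_grad_(True)
    x0_hat = denoiser(x_t, t_cur)                  # Tweedie estimate of x_0
    d = (x_t - x0_hat) / t_cur                     # EDM-style ODE direction
    residual = torch.linalg.vector_norm(y_S - S @ x0_hat)
    grad, = torch.autograd.grad(residual, x_t)     # gradient through the denoiser
    return (x_t + (t_next - t_cur) * d - zeta_t * grad).detach()
```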
Figure 5: Darcy reconstructions at $m=25$: two test samples; rows alternate the permeability channel $a$ and the pressure channel $u$. Per-cell error annotated.
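To make the permeability model concrete, here is a hedged sketch of a binary Voronoi field in the spirit of (B.3); the number of Voronoi cells, the RNG, and the nearest-site construction are illustrative assumptions, not the benchmark's exact generator:

```python
import numpy as np

def voronoi_permeability(n=128, n_cells=10, seed=0):
    rng = np.random.default_rng(seed)
    sites = rng.uniform(0.0, 1.0, size=(n_cells, 2))   # Voronoi sites
    values = rng.choice([3.0, 12.0], size=n_cells)     # binary label per cell
    xs = (np.arange(n) + 0.5) / n                      # cell-centered grid
    gx, gy = np.meshgrid(xs, xs)
    pts = np.stack([gx.ravel(), gy.ravel()], axis=1)   # (n*n, 2) points
    # Nearest-site assignment yields the piecewise-constant field a(x).
    d2 = ((pts[:, None, :] - sites[None, :, :])**2).sum(axis=-1)
    return values[d2.argmin(axis=1)].reshape(n, n)
```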
### B.3 Kolmogorov flow problem
We use the 2D Kolmogorov turbulence benchmark of Amorós-Trepat et al. [[2026](https://arxiv.org/html/2605.06861#bib.bib5)]: incompressible Navier–Stokes at $\mathrm{Re}=1000$ with sinusoidal forcing $\mathbf{f}=(\sin(8y),0)$ on a periodic $256\times 256$ domain, simulated to a stationary regime. Each sample is a three-step temporal stack of vorticity. We use their pretrained DDIM checkpoint with the masked-blending sparse-reconstruction sampler (200 reverse-diffusion steps). Per-strategy reconstructions appear in Fig. [7](https://arxiv.org/html/2605.06861#A2.F7).
Figure 6: Kolmogorov flow, $m$-convergence: mean relative $L_2$ error for the Kolmogorov turbulence ($\mathrm{Re}=1000$, $256^2$) reconstruction task using masked-diffusion guidance.

Figure 7: Kolmogorov-flow reconstructions at $m=25$: mid-trajectory vorticity slice on the $256\times 256$ grid. Two test samples; columns mirror Fig. [5](https://arxiv.org/html/2605.06861#A2.F5).
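As a small illustration of the forcing term above, the following sketch constructs $\mathbf{f}=(\sin(8y),0)$ on the periodic grid; the $[0,2\pi)$ parameterization of the domain is an assumption on our part:

```python
import numpy as np

n = 256
# Periodic grid; we assume the domain is parameterized as [0, 2*pi).
y = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
fx = np.tile(np.sin(8.0 * y)[:, None], (1, n))  # x-component, varies in y
fy = np.zeros((n, n))                           # y-component is zero
```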
## Appendix C Christoffel-DPS algorithms
This appendix expands the offline and online algorithms summarized in §[4](https://arxiv.org/html/2605.06861#S4) into explicit pseudocode. All algorithms are stated in the finite grid setting of §[2](https://arxiv.org/html/2605.06861#S2). The unknown signal $f^*$ is identified with $x^*\in\mathbb{R}^N$, sensors with rows of $I_N$ collected into the row-selector matrix $S$, and measurements with $y_S = Sx^* + \eta$.
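In code, the row-selector notation amounts to indexing an identity matrix. The following toy snippet (sizes, sensor indices, and noise level are illustrative) fixes the convention:

```python
import numpy as np

rng = np.random.default_rng(0)
N, sensors = 1000, [3, 141, 592]      # grid size and sensor indices (toy)
S = np.eye(N)[sensors]                # row-selector: rows of I_N
x_star = rng.standard_normal(N)       # stand-in ground-truth signal
eta = 0.1 * rng.standard_normal(len(sensors))  # sensor noise
y_S = S @ x_star + eta                # measurements y_S = S x* + eta
```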
### C.1 Offline Christoffel-DPS
The greedy variant requires further discussion, specifically its implementation via QR with column pivoting. Algorithm [1](https://arxiv.org/html/2605.06861#alg1) computes the rows of $S$ by greedy Gram–Schmidt deflation on the matrix $X$ from §[4.2](https://arxiv.org/html/2605.06861#S4.SS2). At each iteration, the location with the largest residual norm is selected, and the value at every remaining location is deflated along the just-picked direction. This is equivalent to column-pivoted QR [Businger and Golub, [1965](https://arxiv.org/html/2605.06861#bib.bib6)] on $X$, which SciPy implements as `scipy.linalg.qr(F.T, pivoting=True)`; we state the deflation form for readability.
Algorithm 1: Greedy Christoffel-DPS

Require: mean-adjusted snapshot matrix $X\in\mathbb{R}^{N\times M}$; sensor budget $m$
Ensure: sensor index sequence $(s_1,\ldots,s_m)$
1: $R \leftarrow X$ ▷ initialize residual; $R_j$ is row $j$ of $R$
2: for $i=1,\ldots,m$ do
3:  $s_i \leftarrow \arg\max_{j\notin\{s_1,\ldots,s_{i-1}\}} \|R_j\|^2$ ▷ pixel with largest residual norm
4:  $q_i \leftarrow R_{s_i}/\|R_{s_i}\|$
5:  for $j=1,\ldots,N$ do
6:   $R_j \leftarrow R_j - \langle R_j, q_i\rangle\, q_i$ ▷ deflate along $q_i$
7:  end for
8: end for
9: return $(s_1,\ldots,s_m)$
Complexity. Each iteration involves $O(NM)$ flops for the deflation and $O(N)$ for the argmax, giving a total of $O(mNM)$ flops. Householder pivoted QR has the same scaling.
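The following sketch transcribes Algorithm 1 directly and cross-checks it against SciPy's column-pivoted QR applied to $X^\top$ (the `F.T` in the call quoted above plays the same role); for generic data the two orderings should agree up to ties and floating-point effects:

```python
import numpy as np
from scipy.linalg import qr

def greedy_christoffel(X, m):
    """Algorithm 1: greedy Gram-Schmidt deflation over the rows of X."""
    R = np.array(X, dtype=float)        # residual matrix, rows = locations
    selected = []
    for _ in range(m):
        norms = (R**2).sum(axis=1)
        if selected:
            norms[selected] = -np.inf   # exclude already-chosen locations
        s = int(norms.argmax())         # pixel with largest residual norm
        selected.append(s)
        q = R[s] / np.linalg.norm(R[s])
        R -= np.outer(R @ q, q)         # deflate every row along q
    return selected

# Cross-check on toy data: column-pivoted QR on X^T picks the same pivots.
X = np.random.default_rng(0).standard_normal((200, 40))
_, _, piv = qr(X.T, mode='economic', pivoting=True)
print(greedy_christoffel(X, 5), piv[:5])
```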
### C.2 Online ensemble Christoffel-DPS
Algorithm [2](https://arxiv.org/html/2605.06861#alg2) runs an ensemble of $N_e$ DPS chains through a standard reverse-diffusion schedule. The $m$ sensors are split into $m_0$ *anchor* sensors at fixed positions, typically chosen by the offline greedy Christoffel ordering of §[4.2](https://arxiv.org/html/2605.06861#S4.SS2), and $m_1 = m - m_0$ *mobile* sensors that drift through the domain at a sequence of *drift events*. At a drift event, the empirical Christoffel function ([4.3](https://arxiv.org/html/2605.06861#S4.E3)) is evaluated on the live ensemble's Tweedie point estimates, and each mobile sensor moves to the highest-scoring unoccupied node within a fixed-radius $r_{\rm drift}$-neighbourhood of its current position; the per-event displacement is bounded by $r_{\rm drift}$, so the total mobile-sensor travel over a reverse pass is at most $D\cdot r_{\rm drift}$. Chains whose measurement residual is too large are pruned at the same events.
For convenience, we write ReverseStep for one step of the underlying diffusion sampler: a Heun or Euler predictor–corrector update of the form $z^{(i)}_{t_{k+1}} = \mathrm{Pred}\bigl(z^{(i)}_{t_k}, \hat{x}^{(i)}_0; \sigma(t_k), \sigma(t_{k+1}), C\bigr) - \alpha_k\, g^{(i)}$ [Karras et al., [2022](https://arxiv.org/html/2605.06861#bib.bib26), Chung et al., [2023](https://arxiv.org/html/2605.06861#bib.bib70)], where $\alpha_k$ is the DPS guidance weight and $g^{(i)}$ is the measurement gradient defined at line 7 of Algorithm [2](https://arxiv.org/html/2605.06861#alg2).
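As a minimal illustration of ReverseStep, here is an Euler-style predictor in the Karras parameterization followed by the DPS guidance correction; the Heun corrector and a non-identity covariance $C$ are omitted, and all shapes and values are illustrative:

```python
import torch

def reverse_step(z, x0_hat, g, sigma_k, sigma_k1, alpha_k):
    """Euler predictor step, then subtract the weighted DPS guidance."""
    d = (z - x0_hat) / sigma_k              # probability-flow ODE direction
    return z + (sigma_k1 - sigma_k) * d - alpha_k * g

# Toy usage with stand-in tensors.
z = torch.randn(1, 1, 64, 64)
x0_hat, g = torch.randn_like(z), torch.randn_like(z)
z_next = reverse_step(z, x0_hat, g, sigma_k=10.0, sigma_k1=9.0, alpha_k=0.5)
```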
Algorithm 2: Online ensemble Christoffel-DPS

Require: denoiser $D_\theta$; covariance $C$; reverse-time schedule $T = t_0 > \cdots > t_K = 0$ with noise levels $\sigma(t_k)$; drift schedule $\mathcal{T} = \{t_{d_1},\ldots,t_{d_D}\}$; number of anchor sensors $m_0$ and drift sensors $m_1$, total number of sensors $m$; initial selector $S$ (first $m_0$ rows correspond to the anchor sensor locations); node coordinates $\xi_1,\ldots,\xi_N\in\mathbb{R}^d$; drift radius $r_{\rm drift}$; ensemble size $N_e$; pruning gap $\Delta\ell\geq 0$ with floor $N_{\min}\geq 1$; ground-truth signal $x^*$
Ensure: reconstruction $\hat{x}\in\mathbb{R}^N$
1: $y_S \leftarrow Sx^*$ ▷ measurements at initial $m$ sensor locations
2: Sample $z^{(i)}_T \sim \mathcal{N}(0,\sigma^2(T)\,C)$ for $i=1,\ldots,N_e$
3: $A \leftarrow \{1,\ldots,N_e\}$ ▷ surviving chain indices
4: for $k=0,\ldots,K-1$ do
5:  for $i\in A$ in parallel do
6:   $\hat{x}^{(i)}_0 \leftarrow D_\theta(z^{(i)}_{t_k},\sigma(t_k))$ ▷ Tweedie point estimate
7:   $g^{(i)} \leftarrow \nabla_{z^{(i)}_{t_k}} \tfrac{1}{2\sigma_\eta^2}\|S\hat{x}^{(i)}_0 - y_S\|^2$ ▷ measurement gradient
8:   $z^{(i)}_{t_{k+1}} \leftarrow \textsc{ReverseStep}\bigl(z^{(i)}_{t_k}, \hat{x}^{(i)}_0, g^{(i)}; \sigma(t_k), \sigma(t_{k+1}), C\bigr)$
9:  end for
10:  if $t_{k+1}\in\mathcal{T}$ (drift event) then
11:   // Score, drift mobile sensors, observe, prune
12:   $J \leftarrow \{1,\ldots,N\}\setminus\mathrm{rows}(S_{1:m_0})$ ▷ drift-sensor candidate indices
13:   for $j\in J$ do
14:    $\widehat{K}(j) \leftarrow \max_{l\neq l'\in A} \dfrac{(\hat{x}^{(l)}_{0,j} - \hat{x}^{(l')}_{0,j})^2}{\|\hat{x}^{(l)}_0 - \hat{x}^{(l')}_0\|^2}$
15:   end for
16:   for $l=1,\ldots,m_1$ do
17:    Draw $j_l \propto \widehat{K}$ with $\|\xi_l - \xi_{j_l}\| \leq r_{\rm drift}$ ▷ sample drift location within radius
18:   end for
19:   Replace the last $m_1$ rows of $S$ with rows $j_1,\ldots,j_{m_1}$ of $I_N$
20:   $y_S \leftarrow Sx^*$ ▷ read newly observed entries
21:   $\ell_i \leftarrow -\tfrac{1}{2\sigma_\eta^2}\|S\hat{x}^{(i)}_0 - y_S\|^2$ for $i\in A$ ▷ measurement log-likelihood
22:   $\ell^\star \leftarrow \max_{i\in A}\ell_i$
23:   $A \leftarrow \{i\in A : \ell^\star - \ell_i \leq \Delta\ell\,|S|\}$ ▷ prune chains
24:   if $|A| < N_{\min}$ then
25:    retain the $N_{\min}$ chains with largest $\ell_i$ ▷ floor on alive ensemble size
26:   end if
27:  end if
28: end for
29: // Collapse the ensemble
30: $i^\star \leftarrow \arg\max_{i\in A}\ell_i$
31: return $\hat{x} \leftarrow \hat{x}^{(i^\star)}_0$ or $\hat{x} \leftarrow \tfrac{1}{|A|}\sum_{i\in A}\hat{x}^{(i)}_0$
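The drift-event score at line 14 is a pairwise maximum over the surviving ensemble. A direct, unoptimized sketch, assuming the Tweedie estimates of the live chains are stacked into a matrix, is:

```python
import numpy as np

def christoffel_scores(X0):
    """Empirical Christoffel score K(j) at every node j.

    X0: (N_e, N) array; row i is the Tweedie estimate of surviving chain i.
    """
    Ne, N = X0.shape
    scores = np.zeros(N)
    for l in range(Ne):                 # all unordered chain pairs l < l'
        for lp in range(l + 1, Ne):
            diff = X0[l] - X0[lp]
            # Squared secant at each node, normalized by the full secant
            # norm; the small epsilon guards against identical chains.
            scores = np.maximum(scores, diff**2 / (diff @ diff + 1e-12))
    return scores
```

The double loop makes the $O(N_e^2|J|)$ cost of the score evaluation stated below explicit.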
Hyperparameters used in the experiments. We use $N_e=20$ chains and a logarithmic drift schedule with $D=20$ events on the DiffusionPDE benchmarks (Algorithm [2](https://arxiv.org/html/2605.06861#alg2) with $\Delta\ell=\infty$, i.e., no pruning) and $D=10$ events on the pinball benchmark with $\Delta\ell=1$, $N_{\min}=1$, and $m_0=3$ anchor sensors taken from the offline empirical-Christoffel ordering. The allocation $(n_1,\ldots,n_D)$ is uniform: $n_d = (m-m_0)/D$. The measurement noise is $\sigma_\eta=0.1$; the DPS guidance weight $\alpha_k$ inside ReverseStep follows Chung et al. [[2023](https://arxiv.org/html/2605.06861#bib.bib70)] with a linear schedule and unit weight at $\sigma(t)=\sigma_\eta$.
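A logarithmic drift schedule can be realized, for instance, by log-spacing the $D$ event indices among the $K$ reverse-diffusion steps; the exact spacing used in the experiments is an assumption here:

```python
import numpy as np

def log_drift_schedule(K=500, D=20):
    """D log-spaced drift-event indices among K reverse-diffusion steps."""
    idx = np.geomspace(1, K - 1, num=D).astype(int)
    return sorted(set(idx.tolist()))   # duplicates merge at small indices
```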
Limits. Setting $D=0$ (no drift events) reduces Algorithm [2](https://arxiv.org/html/2605.06861#alg2) to a fixed-sensor DPS sampler with the offline empirical-Christoffel selection of Algorithm [1](https://arxiv.org/html/2605.06861#alg1) fed in through $S$. Replacing the secant numerator at line 14 by the per-pixel ensemble standard deviation $\mathrm{Var}_{i\in A}\bigl(\hat{x}^{(i)}_{0,j}\bigr)^{1/2}$ recovers Chakraborty et al. [[2026](https://arxiv.org/html/2605.06861#bib.bib8)]: a Gaussian, offline analogue with the sup over secants replaced by a second-moment summary.
Complexity. Each reverse-diffusion step costs $O(N_e\cdot\mathrm{cost}(D_\theta))$. Each drift event costs $O(N_e^2|J|)$ for the pairwise score plus $O(N_e^2 m)$ for residual evaluation, with $|J|\leq N$ the unobserved budget. Score evaluation is typically dominated by the denoiser cost.