Neural Fields for NV-Center Inverse Sensing

arXiv cs.LG Papers

Summary

This paper proposes NeTMY, an amortization-free coordinate neural field for inverse problems in NV-center quantum sensing, using a corrected forward model and sparse reconstruction losses to overcome center-collapse pathologies.

arXiv:2605.13988v1 Announce Type: new

Cached at: 05/15/26, 06:26 AM

# Neural Fields for NV-Center Inverse Sensing
Source: [https://arxiv.org/html/2605.13988](https://arxiv.org/html/2605.13988)
Zhixuan Zhao∗¹,², Tao Zhong¹, Yixun Hu¹, Nathalie P. de Leon¹, Christine Allen-Blanchette¹

¹Princeton University, ²Tsinghua University

{tzhong, ca15}@princeton.edu

###### Abstract

Inverse problems in scientific sensing are often solved with either hand\-designed regularizers or supervised networks trained on simulated labels, yet both can fail when the forward model is nonlinear, spectrally coupled, and physically delicate\. We study this issue for noise sensing based on nitrogen\-vacancy \(NV\) centers in diamond, where a quantum sensor measures magnetic\-noise spectra generated by sparse spin sources\. We show that replacing a common scalar/coherent forward approximation with a tensor power\-summed dipolar operator changes the inverse landscape and exposes a center\-collapse failure mode in free\-density optimization\. We propose NeTMY, an amortization\-free coordinate neural field coupled to the differentiable NV forward model, with annealed positional encoding, multiscale optimization, sparsity/gating, and spectrum\-fidelity losses\. Across sparse synthetic reconstructions generated by the corrected operator, NeTMY achieves the best localization and distributional metrics in the tested benchmark\. Mechanism experiments show that NeTMY does not directly execute the raw density\-space gradient; its parameterization smooths and redistributes updates, mitigating the center\-collapse pathology\. These results position NV quantum sensing as a useful testbed for physics\-faithful neural inverse problems\.

![Refer to caption](https://arxiv.org/html/2605.13988v1/figures/teaser.png)

Figure 1: NV relaxometry maps sparse fluctuating spins to a noisy frequency-domain measurement. The inverse problem aims to recover the spin density and local Larmor response from the spectrum.

## 1 Introduction

Scientific machine learning is increasingly being used in settings where the quantity of interest is never observed directly\. Instead, one observes the output of an instrument: a low\-dimensional, noisy, spatially blurred, and often spectrally transformed measurement generated by a known but imperfect physical process\[[5](https://arxiv.org/html/2605.13988#bib.bib1),[37](https://arxiv.org/html/2605.13988#bib.bib2)\]\. This setting is fundamentally different from standard supervised prediction\. Paired labels are expensive or impossible to obtain, synthetic data may not match experimental conditions, and seemingly minor approximations in the forward model can change not only the measurement distribution, but also the geometry of the inverse objective\. As a result, a method that performs well under a convenient simulator may fail when the physical operator is made more faithful\[[36](https://arxiv.org/html/2605.13988#bib.bib3)\]\. This interaction between forward\-model fidelity, representation, and optimization is a central challenge for scientific ML\.

We study this issue in nitrogen\-vacancy \(NV\) center noise sensing\[[18](https://arxiv.org/html/2605.13988#bib.bib4),[16](https://arxiv.org/html/2605.13988#bib.bib5)\]\. This problem is closely related to the better\-established task of physical reconstruction from static magnetic\-field images, where NV magnetic maps have been inverted to recover underlying current or magnetization distributions\[[51](https://arxiv.org/html/2605.13988#bib.bib23)\]\. Here, however, the target is not a static field map but a sparse distribution of fluctuating spin sources and their local Larmor response inferred from frequency\-dependent magnetic\-noise spectra\. As illustrated in Figure[1](https://arxiv.org/html/2605.13988#S0.F1), in this toy model, NV centers measure spectra induced by nearby fluctuating spins, while the scientific target is the spatial and spectral structure of the spin sources\[[11](https://arxiv.org/html/2605.13988#bib.bib6),[55](https://arxiv.org/html/2605.13988#bib.bib7),[73](https://arxiv.org/html/2605.13988#bib.bib8)\]\. Recent non\-ML NV noise/correlation measurements further motivate this reconstruction setting by showing that multiplexed NV platforms can access noise spectra and magnetic\-field correlations at multiple spatial locations\[[13](https://arxiv.org/html/2605.13988#bib.bib85)\]\. The resulting inverse problem is highly ill\-conditioned\[[20](https://arxiv.org/html/2605.13988#bib.bib9)\]: the dipolar response is low\-pass and rapidly decaying\[[55](https://arxiv.org/html/2605.13988#bib.bib7)\], nearby sources can merge below the effective point\-spread width\[[23](https://arxiv.org/html/2605.13988#bib.bib10)\], finite\-window effects create spatial bias, and spectrum normalization introduces scale ambiguity and nonlocal gradients\[[83](https://arxiv.org/html/2605.13988#bib.bib11)\]\. 
Thus, even with a differentiable forward model, direct reconstruction from magnetic noise spectra can have misleading descent directions and physically implausible local optima\.

Existing approaches leave an important gap. Classical sparse inverse methods such as Tikhonov regularization\[[32](https://arxiv.org/html/2605.13988#bib.bib12),[31](https://arxiv.org/html/2605.13988#bib.bib13)\] or ADMM\[[56](https://arxiv.org/html/2605.13988#bib.bib14)\] optimize the density field directly, but in our setting, this free-density parameterization can follow pathological gradients and collapse to central artifacts under the physical objective. Supervised image-to-image predictors, such as U-Nets\[[65](https://arxiv.org/html/2605.13988#bib.bib15)\] or GANs\[[22](https://arxiv.org/html/2605.13988#bib.bib16),[59](https://arxiv.org/html/2605.13988#bib.bib17)\], can be strong when paired training data match the test distribution, but paired spin-density labels are scarce, and synthetic dense training distributions may not generalize to sparse scenes. Generic untrained neural priors\[[28](https://arxiv.org/html/2605.13988#bib.bib18),[75](https://arxiv.org/html/2605.13988#bib.bib19)\] avoid supervision, but do not by themselves address the tensor relaxometry operator, Larmor-spectrum coupling, or scale ambiguity of this sensing problem.

To this end, we propose NeTMY, an amortization-free neural-field solver for NV relaxometry inversion. NeTMY represents the unknown density and spectral fields with a coordinate MLP and optimizes its parameters separately for each measurement through a differentiable tensor power-summed forward operator. The method uses annealed positional encoding\[[72](https://arxiv.org/html/2605.13988#bib.bib20)\] and multiscale optimization to recover sparse high-frequency structure while maintaining stable descent, together with spectrum consistency, sparsity, TV regularization\[[67](https://arxiv.org/html/2605.13988#bib.bib21)\], density gating, and measurement-derived scale correction. Crucially, the forward operator sums tensor-channel magnetic-noise powers rather than squaring a coherently summed scalar field, avoiding nonphysical cross terms that appear in the simplified solver\[[24](https://arxiv.org/html/2605.13988#bib.bib22),[51](https://arxiv.org/html/2605.13988#bib.bib23)\]. This produces a more faithful inverse objective and exposes failure modes that are hidden or distorted under the scalar approximation.

Our experiments suggest that NeTMY’s advantage comes from optimization geometry as much as expressivity\. Direct density optimization follows the raw density\-space gradient, whereas a neural\-field update induces an image\-space step filtered by the representation Jacobian\. Our contributions are summarized as follows:

- We formulate NV noise sensing inversion as a physics-faithful differentiable inverse problem and introduce a tensor power-summed forward operator that avoids nonphysical cross terms in simplified scalar solvers.
- We propose NeTMY, an amortization-free coordinate neural-field solver that reconstructs sparse density and spectral fields from individual measurements without paired density labels.
- We show empirically and mechanistically that forward-model fidelity changes inverse-problem landscapes, and that NeTMY's parameterization mitigates center-collapse while isolating the contributing components and the remaining failure modes.

## 2 Related Work

NV centers have become a standard tool for nanoscale magnetic sensing, mapping local magnetic noise into optical/$T_1$ and $T_2$ measurements with single-defect resolution\[[11](https://arxiv.org/html/2605.13988#bib.bib6),[16](https://arxiv.org/html/2605.13988#bib.bib5),[18](https://arxiv.org/html/2605.13988#bib.bib4),[73](https://arxiv.org/html/2605.13988#bib.bib8),[76](https://arxiv.org/html/2605.13988#bib.bib24)\]. Recent applications include spin-bath sensing\[[55](https://arxiv.org/html/2605.13988#bib.bib7),[1](https://arxiv.org/html/2605.13988#bib.bib25),[33](https://arxiv.org/html/2605.13988#bib.bib26)\], nanoscale sensing for condensed matter physics and materials science\[[66](https://arxiv.org/html/2605.13988#bib.bib86)\], and high-resolution nanoscale measurements\[[46](https://arxiv.org/html/2605.13988#bib.bib28),[47](https://arxiv.org/html/2605.13988#bib.bib29),[6](https://arxiv.org/html/2605.13988#bib.bib30),[71](https://arxiv.org/html/2605.13988#bib.bib31),[74](https://arxiv.org/html/2605.13988#bib.bib32),[51](https://arxiv.org/html/2605.13988#bib.bib23)\]. These works establish the sensing physics and forward models but do not study learning-centric inverse algorithms or the way operator fidelity changes optimization geometry; we instead treat NV noise sensing as a differentiable inverse-problem benchmark.

Sparse optimization algorithms in classical inverse problems, such as Tikhonov and total-variation regularization\[[67](https://arxiv.org/html/2605.13988#bib.bib21),[9](https://arxiv.org/html/2605.13988#bib.bib33)\], ADMM\[[56](https://arxiv.org/html/2605.13988#bib.bib14)\], and proximal methods\[[12](https://arxiv.org/html/2605.13988#bib.bib34),[14](https://arxiv.org/html/2605.13988#bib.bib35)\] are standard for ill-posed reconstruction\[[5](https://arxiv.org/html/2605.13988#bib.bib1),[36](https://arxiv.org/html/2605.13988#bib.bib3),[78](https://arxiv.org/html/2605.13988#bib.bib36)\], with a long line of compressed-sensing\[[10](https://arxiv.org/html/2605.13988#bib.bib39),[19](https://arxiv.org/html/2605.13988#bib.bib40)\] and sparse-recovery results\[[2](https://arxiv.org/html/2605.13988#bib.bib37),[35](https://arxiv.org/html/2605.13988#bib.bib41),[8](https://arxiv.org/html/2605.13988#bib.bib38),[44](https://arxiv.org/html/2605.13988#bib.bib43),[42](https://arxiv.org/html/2605.13988#bib.bib42)\]. These tools have strong guarantees in convex / well-conditioned linear inverse settings, but the NV inverse objective is nonlinear, max-normalized, finite-windowed, tensorial, and spectrally coupled. Standard convexity-based stability analysis does not transfer, and free-density local methods can be attracted to center-biased gradients.

Data-driven inverse solvers, such as supervised image-to-image networks\[[65](https://arxiv.org/html/2605.13988#bib.bib15),[50](https://arxiv.org/html/2605.13988#bib.bib48)\], learned unrolling and primal-dual networks\[[54](https://arxiv.org/html/2605.13988#bib.bib49),[27](https://arxiv.org/html/2605.13988#bib.bib46),[81](https://arxiv.org/html/2605.13988#bib.bib53),[25](https://arxiv.org/html/2605.13988#bib.bib45)\], plug-and-play priors\[[77](https://arxiv.org/html/2605.13988#bib.bib52),[64](https://arxiv.org/html/2605.13988#bib.bib51),[49](https://arxiv.org/html/2605.13988#bib.bib47),[82](https://arxiv.org/html/2605.13988#bib.bib54)\], and GAN-based reconstruction\[[57](https://arxiv.org/html/2605.13988#bib.bib50),[21](https://arxiv.org/html/2605.13988#bib.bib44)\] provide strong baselines when paired training data match the test data. The field has separately documented reconstruction instabilities under perturbation and model shift\[[4](https://arxiv.org/html/2605.13988#bib.bib55)\], robustness gaps\[[30](https://arxiv.org/html/2605.13988#bib.bib57),[62](https://arxiv.org/html/2605.13988#bib.bib60)\], and benchmarking pitfalls\[[43](https://arxiv.org/html/2605.13988#bib.bib58),[61](https://arxiv.org/html/2605.13988#bib.bib59),[17](https://arxiv.org/html/2605.13988#bib.bib56)\]. In NV relaxometry, paired spin-density labels are scarce\[[55](https://arxiv.org/html/2605.13988#bib.bib7)\]. We document train-on-dense / test-on-sparse failure modes and use per-instance physics-driven optimization to sidestep them.

Untrained neural priors, neural fields, and physics-informed differentiable scientific ML. On the parameterization-as-prior side, Deep Image Prior\[[75](https://arxiv.org/html/2605.13988#bib.bib19)\], Deep Decoder\[[28](https://arxiv.org/html/2605.13988#bib.bib18),[29](https://arxiv.org/html/2605.13988#bib.bib63)\], Fourier features / SIREN\[[72](https://arxiv.org/html/2605.13988#bib.bib20),[70](https://arxiv.org/html/2605.13988#bib.bib72)\], NeRF\[[52](https://arxiv.org/html/2605.13988#bib.bib66)\], INR-based reconstruction\[[58](https://arxiv.org/html/2605.13988#bib.bib68),[48](https://arxiv.org/html/2605.13988#bib.bib65),[53](https://arxiv.org/html/2605.13988#bib.bib67),[69](https://arxiv.org/html/2605.13988#bib.bib71),[63](https://arxiv.org/html/2605.13988#bib.bib69),[68](https://arxiv.org/html/2605.13988#bib.bib70),[3](https://arxiv.org/html/2605.13988#bib.bib61),[26](https://arxiv.org/html/2605.13988#bib.bib62),[84](https://arxiv.org/html/2605.13988#bib.bib73)\], and 3D Gaussian Splatting\[[38](https://arxiv.org/html/2605.13988#bib.bib64)\] demonstrate that the structure of a coordinate or generator network itself biases reconstruction. On the physics-in-loop side, physics-informed neural networks\[[60](https://arxiv.org/html/2605.13988#bib.bib76),[37](https://arxiv.org/html/2605.13988#bib.bib2),[15](https://arxiv.org/html/2605.13988#bib.bib77),[34](https://arxiv.org/html/2605.13988#bib.bib78)\] and differentiable simulators\[[86](https://arxiv.org/html/2605.13988#bib.bib75),[85](https://arxiv.org/html/2605.13988#bib.bib74)\] incorporate known physical laws as losses or as differentiable forward models. Generic neural priors do not encode NV tensor physics, Larmor-spectrum coupling, or scale ambiguity, and the physics-informed thread has rarely studied how forward-operator fidelity reshapes inverse-problem optimization geometry. 
We connect the two threads on a concrete differentiable scientific inverse problem and ground the parameterization\-geometry effect in measurable optimization signatures\.

## 3 Preliminaries and Problem Formulation

We fix notation, the forward measurement model, the canonical inverse objective, and the structural sources of ill-posedness that the Method (Section [4](https://arxiv.org/html/2605.13988#S4)) is designed to address. Detailed derivations and diagnostics are deferred to Appendices [A](https://arxiv.org/html/2605.13988#A1), [B](https://arxiv.org/html/2605.13988#A2), and [C](https://arxiv.org/html/2605.13988#A3).

### 3.1 Measurement Model

Let $\Omega \subset \mathbb{R}^2$ be a finite spatial grid and let $\rho: \Omega \to \mathbb{R}_{\geq 0}$ denote the unknown spin-source density. A widefield NV array sits at standoff height $z_0$ above $\Omega$ and reads out a frequency-dependent noise spectrum $S_{\mathrm{obs}}(\omega, \mathbf{r}) = \mathcal{F}(\rho, \omega_L)(\omega, \mathbf{r}) + \varepsilon$ on a discrete grid $W \subset \mathbb{R}_+$, where $\omega_L: \Omega \to \mathbb{R}_+$ is a per-pixel Larmor field encoding local detuning, $\varepsilon \sim \mathcal{N}(0, \sigma_\varepsilon^2)$ is sensor noise with standard deviation $\sigma_\varepsilon$, and $\mathcal{F}$ is a differentiable forward operator. The dipolar Green tensor coupling sample spins to the NV $z$-axis is

$$G_{ia}(\mathbf{R}) = \frac{\mu_0}{4\pi}\,\frac{3 R_i R_a - \lVert\mathbf{R}\rVert^2\,\delta_{ia}}{\lVert\mathbf{R}\rVert^5}, \qquad i, a \in \{x, y, z\}, \tag{1}$$

where $\mathbf{R}(\mathbf{r}, \mathbf{r}_{\mathrm{src}}) = (\mathbf{r} - \mathbf{r}_{\mathrm{src}}, z_0)$, $\mu_0$ is the vacuum permeability, and $\delta_{ia}$ is the Kronecker delta. The spectral response of a dipole resonant at $\omega_L$ with linewidth $\gamma$ is the Lorentzian $L(\omega; \omega_L) = \gamma^2 / ((\omega - \omega_L)^2 + \gamma^2)$\[[73](https://arxiv.org/html/2605.13988#bib.bib8),[18](https://arxiv.org/html/2605.13988#bib.bib4),[11](https://arxiv.org/html/2605.13988#bib.bib6),[55](https://arxiv.org/html/2605.13988#bib.bib7)\].
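As a minimal sketch of Eq. (1) and the Lorentzian line shape, the following evaluates both for a single sensor/source pair. The function names, the SI value used for $\mu_0/4\pi$, and the example geometry (a 20 nm standoff) are assumptions chosen for illustration, not values taken from the paper.

```python
import math

MU0_OVER_4PI = 1e-7  # approximate SI value of mu_0 / (4*pi), in T*m/A


def green_tensor(r, r_src, z0):
    """Dipolar Green tensor G_ia(R) of Eq. (1) for R = (r - r_src, z0)."""
    R = (r[0] - r_src[0], r[1] - r_src[1], z0)
    norm2 = sum(c * c for c in R)
    norm5 = norm2 ** 2.5
    return [
        [MU0_OVER_4PI * (3.0 * R[i] * R[a] - (norm2 if i == a else 0.0)) / norm5
         for a in range(3)]
        for i in range(3)
    ]


def lorentzian(omega, omega_l, gamma):
    """Lorentzian response L(omega; omega_L) with linewidth gamma."""
    return gamma ** 2 / ((omega - omega_l) ** 2 + gamma ** 2)


# Hypothetical geometry: sensor pixel 30 nm / -10 nm away laterally, 20 nm standoff.
G = green_tensor((30e-9, -10e-9), (0.0, 0.0), z0=20e-9)
trace = sum(G[i][i] for i in range(3))  # the dipolar tensor is traceless
on_resonance = lorentzian(5.0, 5.0, 2.0)
half_width = lorentzian(7.0, 5.0, 2.0)  # one linewidth away from resonance
```

The tensor is symmetric and traceless, and the Lorentzian equals 1 on resonance and 1/2 at a detuning of one linewidth, which gives two quick sanity checks for any implementation of the forward model.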

We compare two FFT-based forward operators that differ in how transverse-field noise is aggregated across the channels of $G$ and across distinct sources:

$$\begin{aligned}
\mathcal{F}_1(\rho, \omega_L)(\omega, \mathbf{r}) &= \bigl|(G_{\mathrm{nv}} \ast \rho)(\mathbf{r})\bigr|^2 \, L\bigl(\omega; \omega_L(\mathbf{r})\bigr), \\
\mathcal{F}_2(\rho, \omega_L)(\omega, \mathbf{r}) &= \biggl[\sum_{a \in \{x, y, z\}} \bigl(|G_{az}|^2 \ast \rho\bigr)(\mathbf{r})\biggr] \, L\bigl(\omega; \omega_L(\mathbf{r})\bigr),
\end{aligned} \tag{2}$$

where $\ast$ denotes 2D spatial convolution. The scalar/coherent operator $\mathcal{F}_1$ squares a single NV-axis projection $G_{\mathrm{nv}} = \sum_a n_a G_{az}$ ($\mathbf{n} = (0, 0, 1)$ here) after spatial superposition, while the tensor/incoherent operator $\mathcal{F}_2$ sums per-channel power kernels and treats distinct sources as independent thermal fluctuators. Their pointwise difference is a kernel-product cross-term coupling pairs of distinct sources, which is the algebraic origin of the inverse-landscape difference exploited in Section [5](https://arxiv.org/html/2605.13988#S5). We treat $\mathcal{F}_2$ as the physically appropriate inversion operator. Because $\mathcal{F}_2$ factorizes the Lorentzian response outside the kernel convolution, it is exact for locally constant $\omega_L$ and remains controlled when $\chi \sim \Delta\omega_L^{(\mathrm{kernel})} / \gamma \ll 1$, which is adequate at the scale studied here.
The scope of application and the limitations of $\mathcal{F}_2$ are discussed in Appendix [A.8](https://arxiv.org/html/2605.13988#A1.SS8); we additionally introduce a slower direct simulator $\mathcal{F}_3$ that evaluates the source-side superposition without FFT factorization as the most physically accurate operator (Appendix [A.4](https://arxiv.org/html/2605.13988#A1.SS4)). To avoid the inverse-crime risk of reusing the inversion operator as the data-generation operator\[[36](https://arxiv.org/html/2605.13988#bib.bib3)\], the main-text benchmark generates spectra with $\mathcal{F}_3$ and inverts under $\mathcal{F}_1$ or $\mathcal{F}_2$; matched-operator results that generate and invert with the same $\mathcal{F}_1$ or $\mathcal{F}_2$ are reported in the appendix as a complementary view. Full pointwise derivations of $\mathcal{F}_1$, $\mathcal{F}_2$, $\mathcal{F}_3$, and the regime of physical applicability of each form are given in Appendix [A](https://arxiv.org/html/2605.13988#A1).
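The difference between the two operators can be illustrated on a toy grid. The sketch below is not the paper's FFT implementation: it evaluates the convolutions by direct summation, drops the $\mu_0/4\pi$ prefactor, and uses hypothetical values for the grid size, standoff, linewidth, and frequency grid. It places two unit sources on the grid and compares the response to both sources with the sum of the single-source responses.

```python
def kernel_channels(dx, dy, z0):
    """Channels (G_xz, G_yz, G_zz) of Eq. (1), with the mu_0/(4*pi) prefactor dropped."""
    n2 = dx * dx + dy * dy + z0 * z0
    n5 = n2 ** 2.5
    return (3.0 * dx * z0 / n5, 3.0 * dy * z0 / n5, (3.0 * z0 * z0 - n2) / n5)


def lorentzian(omega, omega_l, gamma):
    return gamma ** 2 / ((omega - omega_l) ** 2 + gamma ** 2)


def forward(rho, omega_l, omegas, z0, gamma, tensor=True):
    """Spectrum S[w][i][j] under F2 (tensor=True) or F1 (tensor=False)."""
    n = len(rho)
    sources = [(p, q, rho[p][q]) for p in range(n) for q in range(n) if rho[p][q] != 0.0]
    spatial = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            coherent = 0.0  # (G_nv * rho)(r) with n = (0, 0, 1), i.e. G_nv = G_zz
            power = 0.0     # sum over channels a of (|G_az|^2 * rho)(r)
            for p, q, amp in sources:
                gxz, gyz, gzz = kernel_channels(i - p, j - q, z0)
                coherent += gzz * amp
                power += (gxz ** 2 + gyz ** 2 + gzz ** 2) * amp
            spatial[i][j] = power if tensor else coherent ** 2
    return [[[spatial[i][j] * lorentzian(w, omega_l[i][j], gamma) for j in range(n)]
             for i in range(n)] for w in omegas]


def one_hot(n, p, q):
    grid = [[0.0] * n for _ in range(n)]
    grid[p][q] = 1.0
    return grid


# Hypothetical toy setup (grid units): two unit sources on a 7x7 grid.
n, z0, gamma = 7, 1.5, 0.1
omegas = [0.9, 1.0, 1.1]
omega_l = [[1.0] * n for _ in range(n)]
rho_a, rho_b = one_hot(n, 2, 2), one_hot(n, 4, 4)
rho_ab = [[rho_a[i][j] + rho_b[i][j] for j in range(n)] for i in range(n)]


def superposition_gap(tensor):
    """S(rho_a + rho_b) - S(rho_a) - S(rho_b) at the midpoint pixel, on resonance."""
    both = forward(rho_ab, omega_l, omegas, z0, gamma, tensor)[1][3][3]
    only_a = forward(rho_a, omega_l, omegas, z0, gamma, tensor)[1][3][3]
    only_b = forward(rho_b, omega_l, omegas, z0, gamma, tensor)[1][3][3]
    return both - only_a - only_b


f2_gap = superposition_gap(tensor=True)   # ~0: powers of distinct sources add
f1_gap = superposition_gap(tensor=False)  # nonzero: coherent cross term between sources
```

In this toy setting the $\mathcal{F}_2$ response to two sources is the sum of the single-source responses, while $\mathcal{F}_1$ picks up an extra term proportional to the product of the two sources' kernels at the same pixel.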

### 3.2 Inverse Problem

Following standard practice in NV sensing\[[41](https://arxiv.org/html/2605.13988#bib.bib27),[51](https://arxiv.org/html/2605.13988#bib.bib23),[11](https://arxiv.org/html/2605.13988#bib.bib6)\], the spectrum is summed across $W$ and max-normalized before being compared on a logarithmic scale. Writing $N(\mathbf{r}; S) = \sum_{\omega \in W} S(\omega, \mathbf{r})$ and $\widehat{N}(\mathbf{r}; S) = N(\mathbf{r}; S) / \max_{\mathbf{r}'} N(\mathbf{r}'; S)$, the data-fidelity term $\mathcal{D}(\mathcal{F}(\rho, \omega_L), S_{\mathrm{obs}})$ is the pixelwise log-MSE between $\widehat{N}(\,\cdot\,; \mathcal{F}(\rho, \omega_L))$ and $\widehat{N}(\,\cdot\,; S_{\mathrm{obs}})$ (full form in Appendix [B](https://arxiv.org/html/2605.13988#A2)). The canonical regularized inverse problem is

$$\min_{\rho \geq 0,\; \omega_L} \; \mathcal{D}\bigl(\mathcal{F}(\rho, \omega_L), S_{\mathrm{obs}}\bigr) + \lambda_1 \lVert\rho\rVert_1 + \lambda_{\mathrm{TV}}\,\mathrm{TV}(\rho), \tag{3}$$

where $\lVert\rho\rVert_1 = \sum_{\mathbf{r} \in \Omega} |\rho(\mathbf{r})|$ is the $\ell_1$ norm of the discretized density, $\mathrm{TV}(\rho) = \sum_{\mathbf{r} \in \Omega} \bigl(|\rho(\mathbf{r} + \mathbf{e}_x) - \rho(\mathbf{r})| + |\rho(\mathbf{r} + \mathbf{e}_y) - \rho(\mathbf{r})|\bigr)$ is the anisotropic total variation, $\mathbf{e}_x, \mathbf{e}_y$ are unit grid steps, and $\lambda_1, \lambda_{\mathrm{TV}} \geq 0$ are scalar weights. Eq. ([3](https://arxiv.org/html/2605.13988#S3.E3)) is the objective shared by every method in our benchmark; classical baselines (Tikhonov, ADMM, GaussianSplat) and the proposed neural field method differ in (i) how $\rho$ is parameterized and (ii) which method-specific physics losses they add (Section [4](https://arxiv.org/html/2605.13988#S4)).
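The two regularizers in Eq. (3) are simple to compute on a discretized density. The sketch below is illustrative only; in particular, dropping finite differences that would step off the grid is a boundary convention assumed here, since the text does not specify one.

```python
def l1_norm(rho):
    """l1 norm of the discretized density."""
    return sum(abs(v) for row in rho for v in row)


def anisotropic_tv(rho):
    """Anisotropic total variation: sum of absolute forward differences along both axes.

    Differences that would step outside the grid are dropped (assumed convention).
    """
    rows, cols = len(rho), len(rho[0])
    tv = 0.0
    for i in range(rows):
        for j in range(cols):
            if i + 1 < rows:
                tv += abs(rho[i + 1][j] - rho[i][j])
            if j + 1 < cols:
                tv += abs(rho[i][j + 1] - rho[i][j])
    return tv


# A single unit source in the middle of a 3x3 grid.
rho = [[0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0]]
sparsity = l1_norm(rho)           # 1.0
variation = anisotropic_tv(rho)   # 4.0: one unit jump across each of the four edges
```

For an isolated point source both penalties scale with its amplitude, so a sparse scene is cheap under $\ell_1$ but relatively costly under TV, which is one reason the two weights are tuned separately.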

Because $\widehat{N}$ is max-normalized and $\mathcal{F}_2$ is linear in $\rho$, i.e. $\mathcal{F}_2(c\rho, \omega_L) = c\,\mathcal{F}_2(\rho, \omega_L)$, the rescaling $\rho \mapsto c\rho$ leaves $\widehat{N}$ invariant for every $c > 0$, so the pre-normalization scale of $\rho$ is unidentifiable from $\widehat{N}$ alone\[[2](https://arxiv.org/html/2605.13988#bib.bib37),[44](https://arxiv.org/html/2605.13988#bib.bib43)\]. This is a property of the problem; the resolution adopted by the methods compared here is described in Section [4](https://arxiv.org/html/2605.13988#S4) and derived in Appendix [B](https://arxiv.org/html/2605.13988#A2).
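The scale ambiguity is easy to verify numerically. The sketch below builds the noise map and its max-normalized version for a toy spectrum and for the same spectrum multiplied by a constant. The log-MSE shown is an assumed form (mean squared difference of logarithms with a small `eps` for stability); the exact expression used in the paper is given in its Appendix B.

```python
import math


def noise_map(S):
    """N(r; S): sum the spectrum S[w][i][j] over the frequency grid W."""
    rows, cols = len(S[0]), len(S[0][0])
    return [[sum(S_w[i][j] for S_w in S) for j in range(cols)] for i in range(rows)]


def max_normalize(N):
    """N_hat(r; S): divide the noise map by its spatial maximum."""
    peak = max(max(row) for row in N)
    return [[v / peak for v in row] for row in N]


def log_mse(S_pred, S_obs, eps=1e-8):
    """Assumed form of the pixelwise log-MSE between max-normalized noise maps."""
    a = max_normalize(noise_map(S_pred))
    b = max_normalize(noise_map(S_obs))
    diffs = [(math.log(x + eps) - math.log(y + eps)) ** 2
             for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


# Toy spectrum: two frequencies on a 2x2 sensor grid.
S = [[[1.0, 2.0], [0.5, 4.0]],
     [[0.5, 1.0], [0.5, 2.0]]]
# Scaling the spectrum by c mimics F2(c * rho) = c * F2(rho).
S_scaled = [[[3.7 * v for v in row] for row in S_w] for S_w in S]

n_hat = max_normalize(noise_map(S))
n_hat_scaled = max_normalize(noise_map(S_scaled))
fidelity_gap = log_mse(S_scaled, S)  # ~0: the data term cannot see the scale
```

Since the fidelity term is blind to the overall factor, any absolute density scale has to be fixed by a separate step.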

### 3.3 Ill-posedness

Even with $\mathcal{F}_2$ in place, four structural properties make Eq. ([3](https://arxiv.org/html/2605.13988#S3.E3)) severely ill-conditioned. We name and locate them here; the formal Lemma [1](https://arxiv.org/html/2605.13988#Thmlemma1) (frequency decay), the per-pixel Jacobian-column squared norm, the max-normalization gradient, and a heuristic estimate of the effective point-spread width are in Appendix [C](https://arxiv.org/html/2605.13988#A3).

(P1) Exponential frequency suppression. Each tensor channel $|G_{az}|^2$ in $\mathcal{F}_2$ has a 2D Fourier transform with exponential envelope $e^{-k z_0}$ up to algebraic factors in $k$, where $k = \lVert\mathbf{k}\rVert$ is the magnitude of the spatial-frequency vector $\mathbf{k}$\[[76](https://arxiv.org/html/2605.13988#bib.bib24)\], so high-spatial-frequency density features carry vanishingly little forward signal (Lemma [1](https://arxiv.org/html/2605.13988#Thmlemma1)).

(P2) Finite-window center bias. The convolutional footprint of a source pixel is more fully observable at the window center than at the boundary, so the per-pixel forward sensitivity $\lVert \partial\mathcal{F} / \partial\rho(\mathbf{r}_0) \rVert$ is larger near the center, producing a centered descent direction even from a uniform initialization.

(P3) Max-normalization peak coupling. The derivative of the max-normalized noise map contains a nonlocal term concentrated at the current peak pixel, sharpening any incipient peak and amplifying the (P2) center bias once a peak appears.

(P4) Resolution-limited merging and joint $(\rho, \omega_L)$ ambiguity. Source pairs separated by less than $\mathcal{O}(z_0)$ cannot be disambiguated, and $\omega_L$ is identifiable only on the support of $\rho$.
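The nonlocal term named in (P3) can be made explicit with a short calculation. Assume the maximum of $N$ is attained at a single pixel $\mathbf{r}_\star$ (an assumption introduced here so that the derivative is well defined; the formal statement is in the paper's Appendix C). Differentiating $\widehat{N}(\mathbf{r}) = N(\mathbf{r}) / N(\mathbf{r}_\star)$ with respect to the unnormalized map gives

$$\frac{\partial \widehat{N}(\mathbf{r})}{\partial N(\mathbf{r}')} \;=\; \frac{\delta_{\mathbf{r}\mathbf{r}'}}{N(\mathbf{r}_\star)} \;-\; \frac{N(\mathbf{r})}{N(\mathbf{r}_\star)^2}\,\delta_{\mathbf{r}'\mathbf{r}_\star}.$$

The first term is local. The second term sends a contribution from every pixel $\mathbf{r}$ into the single peak pixel $\mathbf{r}_\star$, so the gradient at the peak aggregates residuals from the whole window.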

These pathologies, together with $\mathcal{F}_2$'s spectrally coupled, finite-window structure, are why the parameterization of $\rho$ matters more than ordinary regularizer tuning in Eq. ([3](https://arxiv.org/html/2605.13988#S3.E3)). Section [4](https://arxiv.org/html/2605.13988#S4) addresses (P1)–(P3) through representation geometry and the curriculum of the neural-field optimizer, and (P4) through density-masked Larmor handling.
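The finite-window center bias (P2) can be checked with a small numerical experiment. The sketch below is a simplification made for illustration: it keeps only the spatial power kernel $\sum_a |G_{az}|^2$ (unit prefactor, no Lorentzian factor, no max-normalization) and measures how strongly a unit source at a given pixel is seen inside a finite sensor window. The window size and standoff are hypothetical.

```python
import math


def power_kernel(dx, dy, z0):
    """sum_a |G_az|^2 at lateral offset (dx, dy), with unit prefactor."""
    n2 = dx * dx + dy * dy + z0 * z0
    return (9.0 * z0 * z0 * (dx * dx + dy * dy) + (3.0 * z0 * z0 - n2) ** 2) / n2 ** 5


def column_norm(p, q, n, z0):
    """L2 norm, over an n-by-n sensor window, of the response to a unit source at (p, q)."""
    return math.sqrt(sum(power_kernel(i - p, j - q, z0) ** 2
                         for i in range(n) for j in range(n)))


n, z0 = 9, 3.0  # hypothetical window size and standoff, in grid units
center = column_norm(n // 2, n // 2, n, z0)
edge = column_norm(0, n // 2, n, z0)
corner = column_norm(0, 0, n, z0)
```

Because the kernel is nonnegative and decays with lateral distance, a source at the center keeps more of its footprint inside the window than one at an edge or corner, which is the sensitivity ordering that (P2) describes.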

## 4 Method

NeTMY solves the inverse problem of Eq. ([3](https://arxiv.org/html/2605.13988#S3.E3)) *per measurement*: rather than learning an amortized inverse map from a paired dataset, it represents the unknown density $\rho$ and Larmor field $\omega_L$ as the output of a coordinate MLP and optimizes the MLP parameters against a single observed spectrum through a differentiable forward operator $\mathcal{F} \in \{\mathcal{F}_1, \mathcal{F}_2\}$. We then introduce design choices that, in combination, address the structural pathologies (P1)–(P4) of §[3.3](https://arxiv.org/html/2605.13988#S3.SS3) through the optimization geometry of the parameterization rather than through additional regularizer tuning. A method overview is given in Figure [2](https://arxiv.org/html/2605.13988#S4.F2); full architecture, loss, schedule, and Jacobian-filtering derivations are in Appendix [D](https://arxiv.org/html/2605.13988#A4).
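The Jacobian filtering mentioned above can be sketched to first order. Assume plain gradient descent with step size $\eta$ (the optimizer actually used may differ) and write $J_\theta = \partial \rho_\theta / \partial \theta$ for the Jacobian of the gridded density with respect to the network parameters; both symbols are introduced here for illustration only. Then

$$\text{free density:}\quad \Delta\rho = -\eta\,\nabla_\rho \mathcal{L}, \qquad\qquad \text{neural field:}\quad \Delta\theta = -\eta\,J_\theta^{\top}\nabla_\rho \mathcal{L} \;\;\Longrightarrow\;\; \Delta\rho_\theta \approx J_\theta\,\Delta\theta = -\eta\,J_\theta J_\theta^{\top}\,\nabla_\rho \mathcal{L}.$$

The positive semidefinite matrix $J_\theta J_\theta^{\top}$ mixes and reweights pixels, so the density change induced by a parameter step is a filtered version of the raw density-space gradient rather than the gradient itself.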

![Refer to caption](https://arxiv.org/html/2605.13988v1/figures/method_overview_alt.png)

Figure 2: NeTMY pipeline. Coordinates pass through annealed Fourier features and a coordinate MLP $f_\theta$ to a density $\rho$ and Larmor field $\omega_L$; a differentiable forward operator $\mathcal{F}$ predicts a spectrum compared with the observation under loss $\mathcal{L}$. The same MLP is trained per instance across two resolutions in sequence; gradients flow back through the operator, with no paired training data.

### 4.1 Coordinate Neural Field

We parameterize the unknowns as an MLP of grid coordinates,

$$f_\theta : (x, y) \;\mapsto\; \bigl(\rho_\theta(x, y),\; \omega_{L,\theta}(x, y)\bigr), \qquad (x, y) \in \Omega, \tag{4}$$

with the network input augmented by annealed Fourier positional features $\gamma_\beta(x, y)$ whose high-frequency bands are gated on smoothly through a training-progress scalar $\beta$\[[72](https://arxiv.org/html/2605.13988#bib.bib20),[58](https://arxiv.org/html/2605.13988#bib.bib68),[70](https://arxiv.org/html/2605.13988#bib.bib72)\]. Annealing addresses (P1) by fitting low-$\lVert\mathbf{k}\rVert$ structure of $\rho$ before high-$\lVert\mathbf{k}\rVert$ detail. The MLP is a fully connected backbone with a skip connection re-injecting the encoded input partway through.
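One way to realize annealed Fourier features is sketched below. The cosine window, the octave-spaced frequencies, and the number of bands are assumptions borrowed from common coarse-to-fine positional-encoding schemes; the paper's own encoding and schedule are specified in its Appendix D and may differ.

```python
import math


def band_weight(beta, band):
    """Cosine window: band `band` switches on smoothly as beta goes from band to band + 1."""
    t = min(max(beta - band, 0.0), 1.0)
    return 0.5 * (1.0 - math.cos(math.pi * t))


def annealed_encoding(x, y, beta, num_bands=4):
    """Raw coordinates followed by windowed sin/cos features at octave-spaced frequencies."""
    feats = [x, y]
    for band in range(num_bands):
        w = band_weight(beta, band)
        freq = (2.0 ** band) * math.pi
        for c in (x, y):
            feats.append(w * math.sin(freq * c))
            feats.append(w * math.cos(freq * c))
    return feats


coarse = annealed_encoding(0.3, 0.7, beta=0.0)  # every frequency band is still off
fine = annealed_encoding(0.3, 0.7, beta=4.0)    # all four bands fully on
```

With `beta` increased from 0 to `num_bands` over training, the network first sees only the raw coordinates and then progressively higher frequencies, which matches the low-frequency-first behavior described above.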

The two output heads use bounded transforms tailored to the physical priors\. The density head is a gated softplus,

$$\rho_\theta(x, y) \;=\; \mathrm{softplus}\bigl(h_\rho(x, y)\bigr)\,\sigma\bigl(g(x, y)\bigr), \tag{5}$$

where $h_\rho$ is the raw density logit, $g$ is a separate gate logit, and $\sigma$ is the elementwise sigmoid. The softplus enforces $\rho_\theta \geq 0$, while the gate factor $\sigma(g) \in (0, 1)$ lets the network drive pixel regions to near zero without saturating, which is the operational counterpart of the (P3) self-reinforcing peak observation. The Larmor head sigmoid-maps a logit into the device-determined band $[\omega_{L,\min}, \omega_{L,\max}]$ and is masked to the support of $\rho$ so that gradient flows into $\omega_L$ only where the data constrains it (P4).
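The two output transforms can be written in a few lines. In the sketch below, the density head follows Eq. (5) directly, while the affine rescaling of the sigmoid into the Larmor band is an assumed reading of "sigmoid-maps a logit into the band"; the function names and example values are illustrative.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def density_head(h_rho, g):
    """Gated softplus of Eq. (5): nonnegative density scaled by a gate in (0, 1)."""
    return softplus(h_rho) * sigmoid(g)


def larmor_head(logit, omega_min, omega_max):
    """Map a logit into the band [omega_min, omega_max] (assumed affine-sigmoid form)."""
    return omega_min + (omega_max - omega_min) * sigmoid(logit)


gated_off = density_head(3.0, -20.0)    # large density logit, but the gate closes the pixel
neutral = density_head(0.0, 0.0)        # softplus(0) * sigmoid(0) = ln(2) / 2
mid_band = larmor_head(0.0, 1.0, 3.0)   # a zero logit lands at the band center
```

The separate gate lets a pixel be switched off through `g` alone, without having to push the density logit `h_rho` far into the flat region of the softplus.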

### 4.2 Multiscale Curriculum

Optimization proceeds in two stages on the same MLP parameters, with the coordinate grid resolution increasing between stages. Stage 1 fits coarse spatial structure under the coarsened forward operator at the base learning rate; Stage 2 refines high-frequency detail at the higher resolution at a reduced learning rate. By the bandlimit argument of Lemma [1](https://arxiv.org/html/2605.13988#Thmlemma1), low-$\lVert\mathbf{k}\rVert$ modes carry the bulk of the forward signal and are well-conditioned, so fitting them first delivers a coarse minimum closer to the true support before the high-frequency curriculum activates. Per-stage resolutions, epoch counts, and learning rates are reported in Appendix [D.3](https://arxiv.org/html/2605.13988#A4.SS3) (Table [6](https://arxiv.org/html/2605.13988#A4.T6)).
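To get a feel for how strongly the bandlimit favors coarse structure, the sketch below evaluates only the exponential envelope $e^{-k z_0}$ of (P1) for a spatial feature of a given wavelength, ignoring the algebraic prefactors in $k$. The function name and the example wavelengths are illustrative.

```python
import math


def dipolar_attenuation(wavelength, z0):
    """Envelope exp(-k * z0) for spatial frequency k = 2*pi / wavelength."""
    k = 2.0 * math.pi / wavelength
    return math.exp(-k * z0)


z0 = 1.0  # standoff, in arbitrary length units
coarse_feature = dipolar_attenuation(wavelength=10.0 * z0, z0=z0)  # about 0.53
fine_feature = dipolar_attenuation(wavelength=1.0 * z0, z0=z0)     # about 0.0019
```

Under this envelope a density feature with wavelength equal to the standoff is suppressed by roughly three orders of magnitude more than one ten times coarser, which is why fitting the coarse stage first is the better-conditioned subproblem.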

### 4.3 Per-Measurement Objective

NeTMY minimizes a method-specific extension of Eq. ([3](https://arxiv.org/html/2605.13988#S3.E3)) by gradient descent on $\theta$ alone, with no oracle access to $\rho_\star$ or $\omega_{L,\star}$. The objective combines the canonical fidelity $\mathcal{D}$ with two method-specific physics losses motivated by the ill-posedness analysis,

$$
\mathcal{L}(\theta) \;=\; \mathcal{D}\bigl(\mathcal{F}(\rho_{\theta},\omega_{L,\theta}),S_{\mathrm{obs}}\bigr) \;+\; \lambda_{1}\lVert\rho_{\theta}\rVert_{1} \;+\; \lambda_{\mathrm{TV}}\,\mathrm{TV}(\rho_{\theta}) \;+\; \lambda_{N}\,\mathcal{R}_{\mathrm{nm}} \;+\; \lambda_{\rho}\,\mathcal{R}_{\mathrm{ds}}, \tag{6}
$$

where $\mathcal{R}_{\mathrm{nm}}$ is a mean-normalized noise-map loss (a scale-stable companion to $\mathcal{D}$ that anchors gradients on the support; full form in Appendix [D.4](https://arxiv.org/html/2605.13988#A4.SS4)) and $\mathcal{R}_{\mathrm{ds}}$ is a direct-density loss matching $\rho_{\theta}^{2}$ to the mean-normalized observed noise map (a soft, oracle-free support proxy whose theoretical scope is discussed in Appendix [D.4](https://arxiv.org/html/2605.13988#A4.SS4)). The five loss-term weights are fixed across all reported runs (Appendix [D.4](https://arxiv.org/html/2605.13988#A4.SS4), Table [7](https://arxiv.org/html/2605.13988#A4.T7)); the relative weight of $\mathcal{D}$ vs. $\mathcal{R}_{\mathrm{nm}}$ is rebalanced between Stage 1 and Stage 2 to reflect the different roles of log-MSE support discovery (coarse stage) and mean-normalized refinement (fine stage), as detailed in Appendix [D.4](https://arxiv.org/html/2605.13988#A4.SS4).

Identifiability of $\omega_{L}$ on the support of $\rho$ (P4) is enforced architecturally: the Larmor head is multiplied by the hard support mask $\mathbf{1}\{\rho_{\theta}(\mathbf{r}) > \tau\,\max_{\mathbf{r}'}\rho_{\theta}(\mathbf{r}')\}$, with the mask treated as a stop-gradient constant during back-propagation, so the nondifferentiable indicator is not differentiated and the Larmor logit receives gradients only on predicted support (Appendix [D.1](https://arxiv.org/html/2605.13988#A4.SS1), Eq. ([26](https://arxiv.org/html/2605.13988#A4.E26))). The loss weights $\lambda_{1},\lambda_{\mathrm{TV}} \geq 0$ are the standard $\ell_{1}$ and TV weights of Eq. ([3](https://arxiv.org/html/2605.13988#S3.E3)), and $\lambda_{N},\lambda_{\rho} \geq 0$ are the NeTMY-specific weights for $\mathcal{R}_{\mathrm{nm}}$ and $\mathcal{R}_{\mathrm{ds}}$, respectively.
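The hard support mask can be sketched in a few lines. This is a minimal illustration, assuming the density and the raw Larmor head are given as flat lists over grid points; the function names and the threshold value are hypothetical. In an autodiff framework the mask would additionally be wrapped in a stop-gradient (for example `jax.lax.stop_gradient` in JAX or `.detach()` in PyTorch) so that it acts as a constant during back-propagation.

```python
def support_mask(rho, tau):
    """Indicator 1{rho(r) > tau * max_r' rho(r')} evaluated on every grid point."""
    peak = max(rho)
    return [1.0 if value > tau * peak else 0.0 for value in rho]


def masked_larmor(omega_raw, rho, tau):
    """Gate the raw Larmor head by the predicted support.

    The mask is computed from rho but used as a plain constant, mirroring the
    stop-gradient treatment described in the text.
    """
    mask = support_mask(rho, tau)
    return [m * w for m, w in zip(mask, omega_raw)]


rho = [0.0, 0.1, 0.5, 1.0]          # toy density values
omega_raw = [3.0, 3.0, 2.0, 1.5]    # toy raw Larmor outputs
mask = support_mask(rho, tau=0.3)
omega = masked_larmor(omega_raw, rho, tau=0.3)
```

Only the two grid points whose density exceeds 30% of the peak keep a nonzero Larmor value, so the Larmor head is constrained only where the predicted density places sources.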

Energy-anchored scale correction. The pre-normalization scale of $\rho$ is unidentifiable from $\widehat{N}$ (Section [3.2](https://arxiv.org/html/2605.13988#S3.SS2); Proposition [1](https://arxiv.org/html/2605.13988#Thmproposition1)). After the per-measurement optimization terminates, NeTMY rescales the predicted density by the energy ratio, $\widehat{\rho} \leftarrow \alpha\widehat{\rho}$ with $\alpha = E_{\mathrm{obs}}/E_{\mathrm{pred}}$ for $\mathcal{F}_{2}$, where $E_{\mathrm{obs}} = \sum_{\omega,\mathbf{r}} S_{\mathrm{obs}}$ and $E_{\mathrm{pred}} = \sum_{\omega,\mathbf{r}} \mathcal{F}(\widehat{\rho},\omega_{L,\theta})$. We use the operator-matched convention in the main-body tables. This correction mainly affects density MSE; the primary localization-oriented metrics are insensitive to the absolute density scale. The physically matched closed-form correction for the quadratic operator $\mathcal{F}_{1}$ is $\sqrt{E_{\mathrm{obs}}/E_{\mathrm{pred}}}$ and is derived alongside the linear-$\mathcal{F}_{2}$ form in Appendix [B](https://arxiv.org/html/2605.13988#A2) (Eq. ([21](https://arxiv.org/html/2605.13988#A2.E21))). Applying this as a one-shot post-correction rather than as a soft penalty inside Eq. ([6](https://arxiv.org/html/2605.13988#S4.E6)) avoids reweighting the data-fidelity gradient and gives identical post-processing across baselines (Appendix [D.5](https://arxiv.org/html/2605.13988#A4.SS5)).
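A worked example makes the two closed-form corrections concrete. The toy operators below are assumptions, not the paper's $\mathcal{F}_1$ and $\mathcal{F}_2$: one is linear in the density and one is quadratic, which is the only property the scale correction relies on. The predicted density has the right shape but is too small by a factor of 2.

```python
import math

KERNEL = [0.5, 1.0, 0.25]  # hypothetical per-frequency weights of a toy forward model


def forward_linear(rho):
    """Toy operator that is linear in the density (stand-in for the linear case)."""
    return [k * r for k in KERNEL for r in rho]


def forward_quadratic(rho):
    """Toy operator that is quadratic in the density (stand-in for the quadratic case)."""
    return [k * r * r for k in KERNEL for r in rho]


def scale_factor(s_obs, s_pred, quadratic=False):
    """alpha = E_obs / E_pred for a linear operator, sqrt of that ratio for a quadratic one."""
    ratio = sum(s_obs) / sum(s_pred)
    return math.sqrt(ratio) if quadratic else ratio


rho_true = [0.0, 2.0, 4.0]
rho_hat = [0.0, 1.0, 2.0]  # correct shape, wrong overall scale

alpha_lin = scale_factor(forward_linear(rho_true), forward_linear(rho_hat))
alpha_quad = scale_factor(forward_quadratic(rho_true), forward_quadratic(rho_hat), quadratic=True)
rho_rescaled = [alpha_lin * r for r in rho_hat]
```

Both conventions recover the missing factor of 2 for their own operator; applying the linear ratio to a quadratic operator would instead give 4, which is why the correction has to match the operator.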

### 4.4 Optimization Geometry: a Filtering View of the Update

Free-density solvers (Tikhonov, ADMM) minimize Eq. ([3](https://arxiv.org/html/2605.13988#S3.E3)) by stepping $\rho$ directly, $\rho_{t+1} = \rho_{t} - \eta\,\nabla_{\rho}\mathcal{L}(\rho_{t})$, and therefore execute the raw density-space gradient that inherits the (P2) center bias and the (P3) max-normalization peak coupling. NeTMY instead steps the parameters $\theta$, which by the chain rule gives

$$
\Delta\rho \;\approx\; J_{\theta}\,\Delta\theta \;=\; -\eta\,J_{\theta}J_{\theta}^{\top}\,\nabla_{\rho}\mathcal{L} \;=\; -\eta\,G_{\theta}\,\nabla_{\rho}\mathcal{L}, \qquad J_{\theta} \;=\; \partial f_{\theta}/\partial\theta, \tag{7}
$$

so the realized image-space update is the raw density gradient *filtered* by the positive semidefinite kernel $G_{\theta} = J_{\theta}J_{\theta}^{\top} \in \mathbb{R}^{|\Omega|\times|\Omega|}$ induced by the parameterization (Appendix [D.6](https://arxiv.org/html/2605.13988#A4.SS6)). For a coordinate MLP with annealed Fourier features, $G_{\theta}$ is a smoothing, structurally coupled operator across pixels: a sharp center spike in $\nabla_{\rho}\mathcal{L}$ is redistributed by $G_{\theta}$ rather than executed as a singular update, which is precisely the geometry under which the iter-0 centered gradient documented in (P2) does not need to be amplified by (P3). This view connects the architectural choices of Sections [4.1](https://arxiv.org/html/2605.13988#S4.SS1)–[4.2](https://arxiv.org/html/2605.13988#S4.SS2) to the empirical center-collapse / energy-barrier mechanism diagnosis in Section [5](https://arxiv.org/html/2605.13988#S5): NeTMY does not eliminate the pathological gradient but bends its realization in image space.
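The filtering effect of $G_{\theta} = J_{\theta}J_{\theta}^{\top}$ can be checked numerically on a small example. The sketch below assumes a field that is linear in its parameters with a few smooth cosine modes as basis (a hypothetical stand-in for the coordinate MLP, chosen so that the Jacobian is constant and the chain-rule identity is exact). It compares a free-density step on a center spike with the density change realized by a parameter-space step.

```python
import math


def jacobian(n_pix, n_modes):
    """J[i][k] = d rho_i / d theta_k for the linear toy field rho = J theta."""
    return [[math.cos(math.pi * k * (i + 0.5) / n_pix) for k in range(n_modes)]
            for i in range(n_pix)]


def realized_update(J, grad_rho, lr):
    """Density change caused by one parameter-space gradient step."""
    n_pix, n_par = len(J), len(J[0])
    # Chain rule: grad_theta = J^T grad_rho, then delta_theta = -lr * grad_theta.
    d_theta = [-lr * sum(J[i][k] * grad_rho[i] for i in range(n_pix)) for k in range(n_par)]
    # delta_rho = J delta_theta = -lr * (J J^T) grad_rho (exact for a linear field).
    return [sum(J[i][k] * d_theta[k] for k in range(n_par)) for i in range(n_pix)]


def center_share(update, center):
    """Fraction of the total update magnitude that lands on the center pixel."""
    total = sum(abs(u) for u in update)
    return abs(update[center]) / total


n_pix, center, lr = 17, 8, 0.1
spike = [0.0] * n_pix
spike[center] = 1.0                      # sharp center spike in the raw gradient

raw = [-lr * g for g in spike]           # free-density step
filtered = realized_update(jacobian(n_pix, 4), spike, lr)

raw_share = center_share(raw, center)            # whole update sits on one pixel
filtered_share = center_share(filtered, center)  # update is spread over many pixels
descent = sum(u * g for u, g in zip(filtered, spike))  # <= 0 because J J^T is PSD
```

The filtered step still decreases the loss to first order (the inner product with the gradient is negative), but only a small fraction of its magnitude lands on the center pixel, which is the redistribution the text describes.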

A complete training algorithm, the explicit functional forms of $\mathcal{R}_{\mathrm{nm}}$ and $\mathcal{R}_{\mathrm{ds}}$, the architecture and PE-annealing schedule, and the formal version of Eq. ([7](https://arxiv.org/html/2605.13988#S4.E7)) are given in Appendix [D](https://arxiv.org/html/2605.13988#A4).

## 5 Experiments

We evaluate NeTMY against three families of inverse solvers under the formulation of Eq. ([3](https://arxiv.org/html/2605.13988#S3.E3)) and the ill-posedness pathologies of Section [3.3](https://arxiv.org/html/2605.13988#S3.SS3), organized around four questions: (i) does the choice of inversion operator $\mathcal{F} \in \{\mathcal{F}_{1},\mathcal{F}_{2}\}$ change which methods recover sparse spin sources reliably (Section [5.2](https://arxiv.org/html/2605.13988#S5.SS2)); (ii) what is the optimization-geometry mechanism behind the ranking (Section [5.3](https://arxiv.org/html/2605.13988#S5.SS3)); (iii) which components of the NeTMY design carry the gain (Section [5.4](https://arxiv.org/html/2605.13988#S5.SS4)); and (iv) does the operator-fidelity gap survive on a real NV-relaxometry dataset where the ground-truth spin source is independently constrained (Section [5.5](https://arxiv.org/html/2605.13988#S5.SS5)). Implementation details, full per-method per-sample results, hyperparameter sweeps, the supervised-baseline distribution-shift study, and runtime statistics are in Appendix [E](https://arxiv.org/html/2605.13988#A5).

### 5.1 Setup

Datasets. We use two regimes in the main text (full construction in Appendix [E.1](https://arxiv.org/html/2605.13988#A5.SS1)). (i) A *cross-fidelity benchmark* of 512 synthetic samples in which spectra are generated by the source-side direct simulator $\mathcal{F}_{3}$ (Eq. ([11](https://arxiv.org/html/2605.13988#A1.E11))) and methods invert under either $\mathcal{F}_{1}$ or $\mathcal{F}_{2}$, avoiding the inverse-crime risk of reusing the inversion operator as the data-generation operator \[[36](https://arxiv.org/html/2605.13988#bib.bib3)\]. The samples cover eight scene classes (few/medium/many sources $\times$ close/medium/far minimum separations relative to the standoff $z_{0}$). (ii) A *real-data cross-check* on $\alpha$-RuCl$_3$ (8 NVs, Kumar et al. \[[41](https://arxiv.org/html/2605.13988#bib.bib27)\]) with an independent SRIM depth prior of $14 \pm 5$ nm. Two complementary matched-operator benchmarks (data generation and inversion by the same $\mathcal{F}_{1}$ or $\mathcal{F}_{2}$, 512 samples each, with broader baseline coverage and density MSE only) are reported alongside in Table [3](https://arxiv.org/html/2605.13988#S5.T3) and in Appendix [E.4](https://arxiv.org/html/2605.13988#A5.SS4).

Baselines. Four reconstruction families that share Eq. ([3](https://arxiv.org/html/2605.13988#S3.E3)) but differ in how $\rho$ is parameterized: Tikhonov (free-density gradient descent with $\ell_{2}$/TV) \[[32](https://arxiv.org/html/2605.13988#bib.bib12), [67](https://arxiv.org/html/2605.13988#bib.bib21)\]; ADMM (variable-splitting with augmented Lagrangian and box constraints) \[[56](https://arxiv.org/html/2605.13988#bib.bib14)\]; GaussianSplat (explicit Gaussian-splat primitives with prune/split/clone/merge) \[[38](https://arxiv.org/html/2605.13988#bib.bib64)\]; and NeTMY (this work). L-BFGS \[[45](https://arxiv.org/html/2605.13988#bib.bib82)\] is included only in the mechanism analysis. Untrained-prior (DeepDecoder \[[28](https://arxiv.org/html/2605.13988#bib.bib18)\]) and supervised baselines (U-Net \[[65](https://arxiv.org/html/2605.13988#bib.bib15)\], GAN \[[22](https://arxiv.org/html/2605.13988#bib.bib16)\], HybridNeTMY) are reported in Appendix [E.7](https://arxiv.org/html/2605.13988#A5.SS7).

Metrics. Three primary metrics that emphasize sparse-localization and distributional fidelity over pixelwise mean-square error: GMSD (gradient-magnitude similarity deviation) \[[80](https://arxiv.org/html/2605.13988#bib.bib79)\]; Hungarian F1 (one-to-one matched localization at fixed radius) \[[40](https://arxiv.org/html/2605.13988#bib.bib80)\]; and Sliced Wasserstein distance (SWD) over 128 random projections \[[7](https://arxiv.org/html/2605.13988#bib.bib81)\]. Pixelwise density MSE is a secondary metric included for comparability with prior NV-relaxometry work \[[51](https://arxiv.org/html/2605.13988#bib.bib23), [41](https://arxiv.org/html/2605.13988#bib.bib27)\]; on sparse scenes, it is dominated by background and undersells localization quality. Definitions and matching tolerances are in Appendix [E.3](https://arxiv.org/html/2605.13988#A5.SS3).
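A one-to-one matched F1 score can be sketched as follows. This is an illustration under stated assumptions, not the paper's evaluation code: sources are given as coordinate tuples, a prediction counts as a true positive if it is matched to a distinct ground-truth source within a fixed radius, and the matching is a maximum bipartite matching found with augmenting paths (a simple stand-in for a Hungarian assignment). The exact matching rule and tolerance used in the paper are in its Appendix E.3.

```python
import math


def matched_f1(pred, true, radius):
    """F1 score with one-to-one matching of predicted to true sources within `radius`."""
    # Candidate true sources for each prediction.
    adj = [[j for j, t in enumerate(true) if math.dist(p, t) <= radius] for p in pred]
    match_of_true = [-1] * len(true)  # index of the prediction matched to each true source

    def try_assign(i, seen):
        """Try to match prediction i, re-routing earlier matches if necessary."""
        for j in adj[i]:
            if j in seen:
                continue
            seen.add(j)
            if match_of_true[j] == -1 or try_assign(match_of_true[j], seen):
                match_of_true[j] = i
                return True
        return False

    tp = sum(1 for i in range(len(pred)) if try_assign(i, set()))
    fp = len(pred) - tp
    fn = len(true) - tp
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0  # empty vs. empty scored as perfect (a convention)


true_sources = [(0.0, 0.0), (10.0, 0.0)]
pred_sources = [(0.5, 0.0), (0.4, 0.0), (10.2, 0.0)]  # two predictions compete for one source
score = matched_f1(pred_sources, true_sources, radius=1.0)
```

Because the matching is one-to-one, the duplicate prediction near the first source counts as a false positive, giving 2 true positives, 1 false positive, and an F1 of 0.8.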

### 5.2 Operator Fidelity Reshapes the Reconstruction Benchmark

Tables [1](https://arxiv.org/html/2605.13988#S5.T1) and [3](https://arxiv.org/html/2605.13988#S5.T3) report the cross-fidelity and matched-operator benchmarks; we highlight three observations.

Table 1: Cross-fidelity benchmark on $\mathcal{F}_{3}$-generated samples; methods invert under $\mathcal{F}_{1}$ (scalar) or $\mathcal{F}_{2}$ (tensor). Best per metric in bold. We report the mean and 95% CI across 3 seeds.

| Family | Method | $\mathcal{F}_{1}$: GMSD↓ | $\mathcal{F}_{1}$: F1↑ | $\mathcal{F}_{1}$: SWD↓ | $\mathcal{F}_{1}$: MSE↓ | $\mathcal{F}_{2}$: GMSD↓ | $\mathcal{F}_{2}$: F1↑ | $\mathcal{F}_{2}$: SWD↓ | $\mathcal{F}_{2}$: MSE↓ |
|---|---|---|---|---|---|---|---|---|---|
| Classical regularized solvers | Tikhonov | 0.178±0.041 | 0.465±0.436 | 0.119±0.043 | 0.0119±0.0155 | 0.202±0.043 | 0.762±0.318 | 0.100±0.041 | 0.0082±0.0118 |
| Classical regularized solvers | ADMM | 0.235±0.088 | 0.267±0.389 | 0.052±0.028 | 0.0183±0.0177 | 0.197±0.061 | 0.084±0.208 | 0.134±0.048 | 3.8364±4.9726 |
| Parameterized solvers | GaussianSplat | **0.144±0.082** | **0.816±0.213** | **0.044±0.037** | 0.0076±0.0122 | 0.140±0.070 | 0.740±0.241 | 0.061±0.039 | 0.0117±0.0171 |
| Parameterized solvers | NeTMY (Ours) | 0.189±0.116 | 0.733±0.334 | 0.051±0.048 | **0.0066±0.0080** | **0.125±0.058** | **0.882±0.102** | **0.036±0.023** | **0.0078±0.0082** |

Table 2: Matched-operator benchmarks. The matched operator regime is operator-dependent.

| Method | $\mathcal{F}_{1}/\mathcal{F}_{1}$: MSE↓ | $\mathcal{F}_{1}/\mathcal{F}_{1}$: NoiseMSE↓ | $\mathcal{F}_{2}/\mathcal{F}_{2}$: MSE↓ | $\mathcal{F}_{2}/\mathcal{F}_{2}$: NoiseMSE↓ |
|---|---|---|---|---|
| Tikhonov | 0.015 | 2.25 | 0.012 | 0.81 |
| ADMM | 0.006 | 0.32 | 0.256 | 22.9 |
| GaussianSplat | 0.010 | 0.35 | 0.007 | 1.09 |
| HybridNeTMY | 0.025 | 6.47 | 1.011 | 0.23 |
| DeepDecoder | 1.613 | 1.38 | 0.256 | 24.3 |
| UntrainedGAN | 0.026 | 43.0 | 0.026 | 24.2 |
| NeTMY (Ours) | 0.009 | 1.15 | 0.004 | 0.21 |

Table 3: Cumulative ablation. Each row removes one component cumulatively.

| Configuration | MSE↓ | SSIM↑ |
|---|---|---|
| Full model | 0.0039 | 0.913 |
| − TV | 0.0042 | 0.893 |
| − annealed PE | 0.0086 | 0.897 |
| − PE | 0.0091 | 0.895 |
| − multiscale | 0.0103 | 0.890 |
| − $\ell_{1}$ | 0.0104 | 0.880 |
| − gate | 0.0118 | 0.876 |
| − $\mathcal{R}_{\mathrm{ds}}$ | 0.0116 | 0.890 |

(i) The physics-corrected operator changes the performance of the methods. Moving from $\mathcal{F}_{1}$ to $\mathcal{F}_{2}$ inversion lifts Hungarian F1 and lowers SWD for two of the four methods; the largest gain is NeTMY (Hungarian F1: $0.733 \to 0.882$; SWD: $0.051 \to 0.036$). ADMM's density MSE *significantly worsens* under $\mathcal{F}_{2}$, because the more faithful operator sharpens the centered collapse of Section [5.3](https://arxiv.org/html/2605.13988#S5.SS3).

(ii) Under $\mathcal{F}_{2}$, NeTMY tops every primary metric. NeTMY beats GaussianSplat by 19% on F1 and 41% on SWD (Table [1](https://arxiv.org/html/2605.13988#S5.T1)); the third-decimal density-MSE gap reflects MSE being background-dominated on sparse scenes, while the localization-aware metrics measure what the design targets.

(iii) The matched-operator regime is operator-dependent. On the matched-operator benchmark (Table [3](https://arxiv.org/html/2605.13988#S5.T3)), ADMM is the lowest-MSE method under $\mathcal{F}_{1}/\mathcal{F}_{1}$ (no centering pathology) but degrades by more than an order of magnitude under $\mathcal{F}_{2}/\mathcal{F}_{2}$ (density MSE $0.006 \to 0.256$), while NeTMY attains the lowest value on both $\mathcal{F}_{2}/\mathcal{F}_{2}$ metrics; the full $\mathcal{F}_{2}/\mathcal{F}_{2}$ leaderboard with secondary metrics and runtime is in Appendix [E.4](https://arxiv.org/html/2605.13988#A5.SS4).
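The relative figures quoted in observations (ii) and (iii) can be reproduced from the tabulated means. The worked arithmetic below assumes the percentages are relative differences with GaussianSplat as the reference, and that the ADMM degradation is the ratio of its matched-operator density MSE under the two operators.

```latex
\begin{aligned}
\text{F1 gain over GaussianSplat:}\quad & \frac{0.882}{0.740} - 1 \approx 0.19,\\
\text{SWD reduction vs. GaussianSplat:}\quad & 1 - \frac{0.036}{0.061} \approx 0.41,\\
\text{ADMM matched-operator MSE ratio:}\quad & \frac{0.256}{0.006} \approx 43.
\end{aligned}
```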

![Refer to caption](https://arxiv.org/html/2605.13988v1/x1.png)

Figure 3: Qualitative reconstructions on two $\mathcal{F}_{3}$-generated samples inverted under $\mathcal{F}_{2}$. Free-density solvers center-collapse; HybridNeTMY and NeTMY preserve off-center structure.

Figure [3](https://arxiv.org/html/2605.13988#S5.F3) illustrates the pixel-level pattern: free-density solvers execute the centered iter-0 gradient of (P2)–(P3) and end at a centrally collapsed minimum, while NeTMY preserves off-center structure and reconstructs finer details. The matched-operator results in Table [3](https://arxiv.org/html/2605.13988#S5.T3) reuse the inversion operator as the data-generation operator and are therefore subject to the inverse-crime caveat \[[36](https://arxiv.org/html/2605.13988#bib.bib3)\]. More qualitative examples are in Appendix [E.11](https://arxiv.org/html/2605.13988#A5.SS11). We use Table [1](https://arxiv.org/html/2605.13988#S5.T1) as primary evidence and the real-data check (Section [5.5](https://arxiv.org/html/2605.13988#S5.SS5)) as a simulator-free test.

### 5.3 Optimization Geometry: Why Free-Density Solvers Center-Collapse at an Early Stage

Lemma [2](https://arxiv.org/html/2605.13988#Thmlemma2) predicts that the iter-0 centered gradient produced by (P2)–(P3) is executed verbatim by free-density solvers and filtered by the parameterization Jacobian for NeTMY. We measure three quantities on representative many-source samples (Figure [4](https://arxiv.org/html/2605.13988#S5.F4)).

![Refer to caption](https://arxiv.org/html/2605.13988v1/x2.png)

Figure 4: Optimization-geometry diagnostics. (a) Epoch-200 reconstructions. (b) Loss along $\rho(t) = (1-t)\rho_{\mathrm{collapse}} + t\,\rho_{\star}$. (c) Center-mass ratio at epoch 200.

Iter-0 gradient. On a uniform initialization, the $\mathcal{F}_{2}$ data-fidelity gradient has a center-to-outer-ring magnitude ratio of $18.29\times$, with the peak at the grid center and not at any source: the (P2) signature predicted by Eq. ([23](https://arxiv.org/html/2605.13988#A3.E23)), executed verbatim by free-density solvers.

Energy barrier. The path from a centered-collapse iterate to the ground truth (Figure [4](https://arxiv.org/html/2605.13988#S5.F4)b) crosses an $h \approx 1.12$ barrier under $\mathcal{F}_{2}$ at $t = 0.20$ but is monotonically decreasing under $\mathcal{F}_{1}$: a free-density solver that reaches the centered minimum under $\mathcal{F}_{2}$ can hardly escape it by gradient descent.
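The barrier diagnostic amounts to evaluating the loss along the straight line $\rho(t) = (1-t)\rho_{\mathrm{collapse}} + t\,\rho_{\star}$. The sketch below assumes that the barrier height is the maximum loss along the path minus the loss at $t = 0$ (the paper's exact definition of $h$ may differ) and uses two hypothetical scalar losses, a double well and a convex bowl, in place of the actual objectives.

```python
def barrier_along_path(loss, rho_start, rho_end, n_steps=101):
    """Return (height, t_peak): the largest rise of the loss above its t = 0 value
    along the straight path from rho_start to rho_end, and where it occurs."""
    base = loss(rho_start)
    best, best_t = base, 0.0
    for s in range(n_steps):
        t = s / (n_steps - 1)
        rho_t = [(1.0 - t) * a + t * b for a, b in zip(rho_start, rho_end)]
        value = loss(rho_t)
        if value > best:
            best, best_t = value, t
    return best - base, best_t


def double_well(rho):
    """Toy nonconvex loss with minima at rho[0] = -1 and rho[0] = +1."""
    return (rho[0] ** 2 - 1.0) ** 2


def convex_bowl(rho):
    """Toy convex loss with a single minimum at rho[0] = +1."""
    return (rho[0] - 1.0) ** 2


h_well, t_well = barrier_along_path(double_well, [-1.0], [1.0])
h_bowl, t_bowl = barrier_along_path(convex_bowl, [-1.0], [1.0])
```

The double well shows a barrier of height 1 at the midpoint of the path, while the convex loss decreases monotonically and reports no barrier, mirroring the qualitative contrast described above.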

Fixed budget. At a 200-iteration budget (Figure [4](https://arxiv.org/html/2605.13988#S5.F4)c), center-mass ratios order as NeTMY ($0.00$) $<$ GaussianSplat ($0.064$) $<$ L-BFGS ($0.081$) $<$ ADMM ($0.153$) $<$ Tikhonov ($0.223$). For the parameterized solvers (NeTMY, GaussianSplat), the ranking tracks the effective smoothness of $G_{\theta} = J_{\theta}J_{\theta}^{\top}$. L-BFGS is a free-density solver and uses a quasi-Newton inverse-Hessian approximation, but still empirically suffers from the centered-collapse basin. The realized NeTMY iter-0 update has no singular center spike (App. [E.5](https://arxiv.org/html/2605.13988#A5.SS5), Fig. [10](https://arxiv.org/html/2605.13988#A5.F10)); a 32-run ADMM sweep confirms the operator effect (mean center-mass ratio $0.55 \to 0.78$ from $\mathcal{F}_{1}$ to $\mathcal{F}_{2}$; App. [E.5](https://arxiv.org/html/2605.13988#A5.SS5)).
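The center-mass ratio is defined in the paper's appendix and is not restated here. As a rough illustration only, one plausible definition is the fraction of the total density mass that falls inside a small window around the grid center; the window size and the function name below are assumptions.

```python
def center_mass_ratio(rho, half_width=1):
    """Fraction of total |rho| inside a (2*half_width+1)^2 window at the grid center.

    Hypothetical definition for illustration; the paper's exact form may differ.
    """
    n_rows, n_cols = len(rho), len(rho[0])
    cr, cc = n_rows // 2, n_cols // 2
    total = sum(abs(v) for row in rho for v in row)
    if total == 0.0:
        return 0.0
    rows = range(max(0, cr - half_width), min(n_rows, cr + half_width + 1))
    cols = range(max(0, cc - half_width), min(n_cols, cc + half_width + 1))
    central = sum(abs(rho[r][c]) for r in rows for c in cols)
    return central / total


def grid_with_source(n, row, col):
    """n x n grid that is zero except for a unit source at (row, col)."""
    grid = [[0.0] * n for _ in range(n)]
    grid[row][col] = 1.0
    return grid


collapsed = center_mass_ratio(grid_with_source(5, 2, 2))    # all mass at the center
off_center = center_mass_ratio(grid_with_source(5, 0, 4))   # source in a corner
uniform = center_mass_ratio([[1.0] * 5 for _ in range(5)])  # flat density
```

Under this definition a centrally collapsed reconstruction scores 1, a reconstruction with only off-center sources scores 0, and a flat density scores the area fraction of the window (9/25 here).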

### 5.4 Component Ablation

We ablate seven design choices on the full dataset (Table [3](https://arxiv.org/html/2605.13988#S5.T3)). Cumulatively removing {annealing schedule, PE, multiscale, $\ell_{1}$, gate, $\mathcal{R}_{\mathrm{ds}}$} degrades density MSE by $2$–$3\times$ relative to the full model. The ordering tracks the ill-posedness mechanisms that motivated each component: annealed PE and PE address (P1), multiscale and gate address (P1) and (P3), and the direct-density loss provides the missing-amplitude proxy. TV has a small aggregate effect ($\times 1.1$) but qualitatively controls cross-shaped artifacts on dense scenes. Per-class results and hyperparameter sweeps (LR, hidden width, PE octaves) are in Appendix [E.6](https://arxiv.org/html/2605.13988#A5.SS6).
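The degradation factors quoted above can be recomputed from the MSE column of the ablation table. The snippet below copies those values and divides each cumulative configuration by the full-model MSE; the variable and function names are illustrative.

```python
# MSE values copied from the cumulative ablation table (each row removes one more component).
ABLATION_MSE = [
    ("Full model", 0.0039),
    ("-TV", 0.0042),
    ("-annealed PE", 0.0086),
    ("-PE", 0.0091),
    ("-multiscale", 0.0103),
    ("-l1", 0.0104),
    ("-gate", 0.0118),
    ("-R_ds", 0.0116),
]


def degradation_factors(rows):
    """MSE of each ablated configuration divided by the full-model MSE."""
    base = rows[0][1]
    return {name: mse / base for name, mse in rows[1:]}


factors = degradation_factors(ABLATION_MSE)
for name, factor in factors.items():
    print(f"{name:>14s}: x{factor:.2f}")
```

Removing TV alone gives a factor of about 1.1, and every later row falls between roughly 2.2 and 3.0, consistent with the stated range.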

### 5.5 Real-Data Cross-Check on $\alpha$-RuCl$_3$

We test the operator-fidelity gap on the $\alpha$-RuCl$_3$ dataset of Kumar et al. \[[41](https://arxiv.org/html/2605.13988#bib.bib27)\] (8 NVs, SRIM depth prior $14 \pm 5$ nm), along two independent axes under each operator. We frame the result as a multi-axis consistency check, not a validation; the full four-step protocol (the two reported axes, an operator-neutral spectral calibration, and an additional power-law slope axis which is ambiguous on its own) is in Appendix [E.8](https://arxiv.org/html/2605.13988#A5.SS8).

![Refer to caption](https://arxiv.org/html/2605.13988v1/figures/sec5/realdata_compact.png)

Figure 5: Loss landscape on $\alpha$-RuCl$_3$: $\mathcal{F}_{2}$ (left) is well-conditioned, $\mathcal{F}_{1}$ (right) is a degenerate valley.

(i) Depth-amplitude. Solving $\Gamma_{\mathrm{Ru}}^{\mathrm{pred}}(d) = \Gamma_{\mathrm{Ru}}^{\mathrm{meas}}$ for the implied NV depth $d^{\star}$ at fixed crystallographic density and moment puts $1/8$ NVs in $[9,20]$ nm under $\mathcal{F}_{2}$ vs $0/8$ under $\mathcal{F}_{1}$, with point amplitude ratio $\Gamma^{\mathcal{F}_{1}}/\Gamma^{\mathcal{F}_{2}} \approx 210\times$ at $d = 14.5$ nm; the $\mathcal{F}_{1}$ inversion has no real solution for 7 of the 8 NVs.

(ii) Hessian conditioning. On the Gaussian-density ansatz $\rho(\mathbf{r};A,\sigma_{g}) = A\exp(-\lVert\mathbf{r}\rVert^{2}/2\sigma_{g}^{2})$, where $\sigma_{g}$ denotes the Gaussian width, over the 8 NVs (Figure [5](https://arxiv.org/html/2605.13988#S5.F5)), the log-MSE condition numbers are $\kappa_{\mathcal{F}_{2}} = 931$ and $\kappa_{\mathcal{F}_{1}} = 301{,}139$ (ratio $\sim 323\times$): the $\mathcal{F}_{2}$ landscape is a parabolic bowl while $\mathcal{F}_{1}$ is a degenerate valley along $A^{2}\sigma_{g}^{2} \approx \mathrm{const}$, a direct algebraic consequence of $\Gamma \propto A^{2}$ in $\mathcal{F}_{1}$.
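The contrast between a bowl and a degenerate valley can be illustrated with a two-parameter toy problem. The losses below are assumptions and are not the paper's $\mathcal{F}_1$ or $\mathcal{F}_2$ objectives: the "valley" loss depends on the amplitude $A$ and width $\sigma_g$ almost only through the product $A^{2}\sigma_{g}^{2}$, with a weak extra term to keep the Hessian invertible. The Hessian is estimated by central finite differences and its condition number is computed in closed form for the symmetric 2×2 case.

```python
import math


def hessian_2x2(f, x, y, h=1e-3):
    """Central finite-difference Hessian entries (fxx, fxy, fyy) of f at (x, y)."""
    fxx = (f(x + h, y) - 2.0 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2.0 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h) - f(x - h, y + h) + f(x - h, y - h)) / (4.0 * h**2)
    return fxx, fxy, fyy


def condition_number(fxx, fxy, fyy):
    """Ratio of the two eigenvalues of a symmetric positive definite 2x2 matrix."""
    mean = 0.5 * (fxx + fyy)
    radius = math.sqrt((0.5 * (fxx - fyy)) ** 2 + fxy**2)
    return (mean + radius) / (mean - radius)


def bowl(a, s):
    """Toy well-conditioned loss: both parameters are pinned independently."""
    return (a - 1.0) ** 2 + (s - 1.0) ** 2


def valley(a, s):
    """Toy degenerate loss: nearly flat along a^2 * s^2 = const."""
    return (a * a * s * s - 1.0) ** 2 + 1e-3 * (a - 1.0) ** 2


kappa_bowl = condition_number(*hessian_2x2(bowl, 1.0, 1.0))
kappa_valley = condition_number(*hessian_2x2(valley, 1.0, 1.0))
```

At the shared minimum $(A, \sigma_g) = (1, 1)$ the bowl has a condition number close to 1, while the valley's is in the tens of thousands, because one Hessian eigenvalue nearly vanishes along the direction that keeps $A^{2}\sigma_{g}^{2}$ fixed.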

Both reported axes (amplitude, conditioning) disfavor $\mathcal{F}_{1}$. The multi-axis evidence is consistent with §[5.2](https://arxiv.org/html/2605.13988#S5.SS2)–[5.3](https://arxiv.org/html/2605.13988#S5.SS3). A complementary, non-NeTMY-specific consistency check that reproduces the dipolar $1/T_{1} \propto \sigma_{b}/D^{4}$ scaling under the forward operator is in App. [A.9](https://arxiv.org/html/2605.13988#A1.SS9). A dense-to-sparse distribution-shift comparison and runtime are reported in App. [E](https://arxiv.org/html/2605.13988#A5).

## 6 Conclusion

We framed NV-noise sensing inversion as a physics-faithful, ill-posed inverse problem in which forward-operator choice reshapes both the measurement distribution and the optimization geometry. Moving from a scalar/coherent approximation $\mathcal{F}_{1}$ to a tensor power-summed dipolar operator $\mathcal{F}_{2}$ exposes a center-collapse pathology in free-density solvers, traced to a centered iter-0 gradient and a max-normalization peak coupling. We proposed NeTMY, an amortization-free coordinate neural-field solver whose parameterization implements a positive semidefinite filter on the raw density-space gradient. Combined with annealed positional encoding, a multiscale curriculum, density gating, and energy-anchored scale correction, it attains the best localization and distributional metrics on the cross-fidelity benchmark; an ablation isolates each component, and a multi-axis check on $\alpha$-RuCl$_3$ \[[41](https://arxiv.org/html/2605.13988#bib.bib27)\] disfavors $\mathcal{F}_{1}$ along amplitude and conditioning axes. More broadly, the operator-aware ranking, the gradient-filtering view, and the cross-fidelity protocol position NV relaxometry as a testbed for physics-faithful neural inverse problems.

Limitations and Broader Impact. NeTMY is per-measurement and trades wall-clock cost (roughly $100\times$ slower than classical baselines, App. [E.9](https://arxiv.org/html/2605.13988#A5.SS9)) for label-free generalization. Because the method targets precise reconstruction when labeled data are unavailable, this slowdown is acceptable for research-grade reconstruction; high-throughput deployment would benefit from amortization or warm-starting. The real-data check is a multi-axis consistency test on 8 NVs under a Gaussian ansatz rather than a full validation, and we keep this scope explicit. The analysis is scoped to the fluctuation-dominated dipolar regime; coherent-coupling regimes and other quantum-sensing modalities are left to future work. NV noise sensing has numerous applications in materials science and biology, where better sparse reconstructions accelerate the study of spin-fluctuation phenomena and nanoscale magnetic textures, and we do not anticipate direct dual-use harms. The principal negative pathway is misplaced confidence in reconstructions outside the operator's modeled regime; we recommend reporting operator-fidelity caveats and consistency checks alongside downstream claims.

## References

- \[1\]N\. Aharon, A\. Rotem, L\. P\. McGuinness, F\. Jelezko, A\. Retzker, and Z\. Ringel\(2019\)NV center based nano\-nmr enhanced by deep learning\.Scientific reports9\(1\),pp\. 17802\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p1.2)\.
- \[2\]A\. Ahmed, B\. Recht, and J\. Romberg\(2013\)Blind deconvolution using convex programming\.IEEE Transactions on Information Theory60\(3\),pp\. 1711–1732\.Cited by:[§B\.2](https://arxiv.org/html/2605.13988#A2.SS2.p1.8),[§2](https://arxiv.org/html/2605.13988#S2.p2.1),[§3\.2](https://arxiv.org/html/2605.13988#S3.SS2.p2.6)\.
- \[3\]I\. Alkhouri, E\. Bell, A\. Ghosh, S\. Liang, R\. Wang, and S\. Ravishankar\(2025\)Understanding untrained deep models for inverse problems: algorithms and theory\.arXiv preprint arXiv:2502\.18612\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p4.1)\.
- \[4\]V\. Antun, F\. Renna, C\. Poon, B\. Adcock, and A\. C\. Hansen\(2020\)On instabilities of deep learning in image reconstruction and the potential costs of ai\.Proceedings of the National Academy of Sciences117\(48\),pp\. 30088–30095\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p3.1)\.
- \[5\]S\. Arridge, P\. Maass, O\. Öktem, and C\. Schönlieb\(2019\)Solving inverse problems using data\-driven models\.Acta numerica28,pp\. 1–174\.Cited by:[§1](https://arxiv.org/html/2605.13988#S1.p1.1),[§2](https://arxiv.org/html/2605.13988#S2.p2.1)\.
- \[6\]J\. L\. Ávila\-Jiménez, A\. Bülter, L\. Horsthemke, F\. J\. Rodriguez\-Lozano, M\. Ortiz\-Lopez, and P\. Glösekötter\(2025\)Enhancing all\-optical nv center magnetometry with machine learning: model selection for efficient and deployable quantum sensors\.IEEE Sensors Journal25\(24\),pp\. 44473–44481\.External Links:[Document](https://dx.doi.org/10.1109/JSEN.2025.3628661)Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p1.2)\.
- \[7\]N\. Bonneel, J\. Rabin, G\. Peyré, and H\. Pfister\(2015\)Sliced and radon wasserstein barycenters of measures\.Journal of Mathematical Imaging and Vision51\(1\),pp\. 22–45\.Cited by:[§E\.3](https://arxiv.org/html/2605.13988#A5.SS3.p3.4),[§5\.1](https://arxiv.org/html/2605.13988#S5.SS1.p3.1)\.
- \[8\]A\. Bora, A\. Jalal, E\. Price, and A\. G\. Dimakis\(2017\)Compressed sensing using generative models\.InInternational conference on machine learning,pp\. 537–546\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p2.1)\.
- \[9\]C\. Bunks, F\. M\. Saleck, S\. Zaleski, and G\. Chavent\(1995\)Multiscale seismic waveform inversion\.Geophysics60\(5\),pp\. 1457–1473\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p2.1)\.
- \[10\]E\. J\. Candès, J\. Romberg, and T\. Tao\(2006\)Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information\.IEEE Transactions on information theory52\(2\),pp\. 489–509\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p2.1)\.
- \[11\]F\. Casola, T\. Van Der Sar, and A\. Yacoby\(2018\)Probing condensed matter physics with magnetometry based on nitrogen\-vacancy centres in diamond\.Nature Reviews Materials3\(1\),pp\. 17088\.Cited by:[§A\.1](https://arxiv.org/html/2605.13988#A1.SS1.p1.19),[§A\.6](https://arxiv.org/html/2605.13988#A1.SS6.p1.4),[§B\.1](https://arxiv.org/html/2605.13988#A2.SS1.p1.6),[§1](https://arxiv.org/html/2605.13988#S1.p2.1),[§2](https://arxiv.org/html/2605.13988#S2.p1.2),[§3\.1](https://arxiv.org/html/2605.13988#S3.SS1.p1.17),[§3\.2](https://arxiv.org/html/2605.13988#S3.SS2.p1.6)\.
- \[12\]A\. Chambolle and T\. Pock\(2011\)A first\-order primal\-dual algorithm for convex problems with applications to imaging\.Journal of mathematical imaging and vision40\(1\),pp\. 120–145\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p2.1)\.
- \[13\]K\. Cheng, Z\. Kazi, J\. Rovny, B\. Zhang, L\. S\. Nassar, J\. D\. Thompson, and N\. P\. De Leon\(2025\)Massively multiplexed nanoscale magnetometry with diamond quantum sensors\.Physical Review X15\(3\),pp\. 031014\.Cited by:[§1](https://arxiv.org/html/2605.13988#S1.p2.1)\.
- \[14\]Y\. Chi, Y\. M\. Lu, and Y\. Chen\(2019\)Nonconvex optimization meets low\-rank matrix factorization: an overview\.IEEE Transactions on Signal Processing67\(20\),pp\. 5239–5269\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p2.1)\.
- \[15\]S\. Cuomo, V\. S\. Di Cola, F\. Giampaolo, G\. Rozza, M\. Raissi, and F\. Piccialli\(2022\)Scientific machine learning through physics–informed neural networks: where we are and what’s next\.Journal of Scientific Computing92\(3\),pp\. 88\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p4.1)\.
- \[16\]C\. L\. Degen, F\. Reinhard, and P\. Cappellaro\(2017\)Quantum sensing\.Reviews of modern physics89\(3\),pp\. 035002\.Cited by:[§1](https://arxiv.org/html/2605.13988#S1.p2.1),[§2](https://arxiv.org/html/2605.13988#S2.p1.2)\.
- \[17\]M\. Dehghani, Y\. Tay, A\. A\. Gritsenko, Z\. Zhao, N\. Houlsby, F\. Diaz, D\. Metzler, and O\. Vinyals\(2021\)The benchmark lottery\.arXiv preprint arXiv:2107\.07002\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p3.1)\.
- \[18\]M\. W\. Doherty, N\. B\. Manson, P\. Delaney, F\. Jelezko, J\. Wrachtrup, and L\. C\. Hollenberg\(2013\)The nitrogen\-vacancy colour centre in diamond\.Physics Reports528\(1\),pp\. 1–45\.Cited by:[§A\.1](https://arxiv.org/html/2605.13988#A1.SS1.p1.19),[§1](https://arxiv.org/html/2605.13988#S1.p2.1),[§2](https://arxiv.org/html/2605.13988#S2.p1.2),[§3\.1](https://arxiv.org/html/2605.13988#S3.SS1.p1.17)\.
- \[19\]D\. L\. Donoho\(2006\)Compressed sensing\.IEEE Transactions on information theory52\(4\),pp\. 1289–1306\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p2.1)\.
- \[20\]P\. Fuchs and K\. Shmueli\(2023\)Incomplete spectrum qsm using support information\.Frontiers in Neuroscience17,pp\. 1130524\.Cited by:[§1](https://arxiv.org/html/2605.13988#S1.p2.1)\.
- \[21\]D\. Gilton, G\. Ongie, and R\. Willett\(2021\)Deep equilibrium architectures for inverse problems in imaging\.IEEE Transactions on Computational Imaging7,pp\. 1123–1133\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p3.1)\.
- \[22\]I\. J\. Goodfellow, J\. Pouget\-Abadie, M\. Mirza, B\. Xu, D\. Warde\-Farley, S\. Ozair, A\. Courville, and Y\. Bengio\(2014\)Generative adversarial nets\.Advances in neural information processing systems27\.Cited by:[§1](https://arxiv.org/html/2605.13988#S1.p3.1),[§5\.1](https://arxiv.org/html/2605.13988#S5.SS1.p2.2)\.
- \[23\]R\. Grech, T\. Cassar, J\. Muscat, K\. P\. Camilleri, S\. G\. Fabri, M\. Zervakis, P\. Xanthopoulos, V\. Sakkalis, and B\. Vanrumste\(2008\)Review on solving the inverse problem in eeg source analysis\.Journal of neuroengineering and rehabilitation5\(1\),pp\. 25\.Cited by:[§C\.4](https://arxiv.org/html/2605.13988#A3.SS4.p1.6),[§1](https://arxiv.org/html/2605.13988#S1.p2.1)\.
- \[24\]P\. Guan, N\. Iqbal, M\. A\. Davenport, and M\. Masood\(2024\)Solving inverse problems with model mismatch using untrained neural networks within model\-based architectures\.arXiv preprint arXiv:2403\.04847\.Cited by:[§1](https://arxiv.org/html/2605.13988#S1.p4.1)\.
- \[25\]H\. Gupta, K\. H\. Jin, H\. Q\. Nguyen, M\. T\. McCann, and M\. Unser\(2018\)CNN\-based projected gradient descent for consistent ct image reconstruction\.IEEE transactions on medical imaging37\(6\),pp\. 1440–1453\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p3.1)\.
- \[26\]A\. Habring and M\. Holler\(2024\)Neural\-network\-based regularization methods for inverse problems in imaging\.GAMM\-Mitteilungen47\(4\),pp\. e202470004\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p4.1)\.
- \[27\]K\. Hammernik, T\. Klatzer, E\. Kobler, M\. P\. Recht, D\. K\. Sodickson, T\. Pock, and F\. Knoll\(2018\)Learning a variational network for reconstruction of accelerated mri data\.Magnetic resonance in medicine79\(6\),pp\. 3055–3071\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p3.1)\.
- \[28\]R\. Heckel and P\. Hand\(2018\)Deep decoder: concise image representations from untrained non\-convolutional networks\.arXiv preprint arXiv:1810\.03982\.Cited by:[§E\.2](https://arxiv.org/html/2605.13988#A5.SS2.p5.6),[§1](https://arxiv.org/html/2605.13988#S1.p3.1),[§2](https://arxiv.org/html/2605.13988#S2.p4.1),[§5\.1](https://arxiv.org/html/2605.13988#S5.SS1.p2.2)\.
- \[29\]R\. Heckel and M\. Soltanolkotabi\(2020\)Compressive sensing with un\-trained neural networks: gradient descent finds a smooth approximation\.InInternational conference on machine learning,pp\. 4149–4158\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p4.1)\.
- \[30\]P\. Henderson, R\. Islam, P\. Bachman, J\. Pineau, D\. Precup, and D\. Meger\(2018\)Deep reinforcement learning that matters\.InProceedings of the AAAI conference on artificial intelligence,Vol\.32\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p3.1)\.
- \[31\]A\. E\. Hoerl and R\. W\. Kennard\(1970\)Ridge regression: applications to nonorthogonal problems\.Technometrics12\(1\),pp\. 69–82\.Cited by:[§1](https://arxiv.org/html/2605.13988#S1.p3.1)\.
- \[32\]A\. E\. Hoerl and R\. W\. Kennard\(1970\)Ridge regression: biased estimation for nonorthogonal problems\.Technometrics12\(1\),pp\. 55–67\.Cited by:[§1](https://arxiv.org/html/2605.13988#S1.p3.1),[§5\.1](https://arxiv.org/html/2605.13988#S5.SS1.p2.2)\.
- \[33\]J\. Homrighausen, L\. Horsthemke, J\. Pogorzelski, S\. Trinschek, P\. Glösekötter, and M\. Gregor\(2023\)Edge\-machine\-learning\-assisted robust magnetometer based on randomly oriented nv\-ensembles in diamond\.Sensors23\(3\),pp\. 1119\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p1.2)\.
- \[34\]A\. J\. G\. Inda, S\. Y\. Huang, N\. İmamoğlu, R\. Qin, T\. Yang, T\. Chen, Z\. Yuan, and W\. Yu\(2022\)Physics informed neural networks \(pinn\) for low snr magnetic resonance electrical properties tomography \(mrept\)\.Diagnostics12\(11\),pp\. 2627\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p4.1)\.
- \[35\]G\. Jagatap and C\. Hegde\(2019\)Algorithmic guarantees for inverse imaging with untrained network priors\.Advances in neural information processing systems32\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p2.1)\.
- \[36\]J\. Kaipio and E\. Somersalo\(2007\)Statistical inverse problems: discretization, model reduction and inverse crimes\.Journal of computational and applied mathematics198\(2\),pp\. 493–504\.Cited by:[§A\.4](https://arxiv.org/html/2605.13988#A1.SS4.p2.10),[§E\.1](https://arxiv.org/html/2605.13988#A5.SS1.p2.2),[§1](https://arxiv.org/html/2605.13988#S1.p1.1),[§2](https://arxiv.org/html/2605.13988#S2.p2.1),[§3\.1](https://arxiv.org/html/2605.13988#S3.SS1.p2.20),[§5\.1](https://arxiv.org/html/2605.13988#S5.SS1.p1.12),[§5\.2](https://arxiv.org/html/2605.13988#S5.SS2.p5.1)\.
- \[37\]G\. E\. Karniadakis, I\. G\. Kevrekidis, L\. Lu, P\. Perdikaris, S\. Wang, and L\. Yang\(2021\)Physics\-informed machine learning\.Nature Reviews Physics3\(6\),pp\. 422–440\.Cited by:[§1](https://arxiv.org/html/2605.13988#S1.p1.1),[§2](https://arxiv.org/html/2605.13988#S2.p4.1)\.
- \[38\]B\. Kerbl, G\. Kopanas, T\. Leimkühler, G\. Drettakis,et al\.\(2023\)3d gaussian splatting for real\-time radiance field rendering\.ACM Trans\. Graph\.42\(4\),pp\. 139–1\.Cited by:[§E\.2](https://arxiv.org/html/2605.13988#A5.SS2.p4.8),[§2](https://arxiv.org/html/2605.13988#S2.p4.1),[§5\.1](https://arxiv.org/html/2605.13988#S5.SS1.p2.2)\.
- \[39\]D\. P\. Kingma and J\. Ba\(2014\)Adam: a method for stochastic optimization\.arXiv preprint arXiv:1412\.6980\.Cited by:[§E\.2](https://arxiv.org/html/2605.13988#A5.SS2.p2.8)\.
- \[40\]H\. W\. Kuhn\(1955\)The hungarian method for the assignment problem\.Naval research logistics quarterly2\(1\-2\),pp\. 83–97\.Cited by:[§E\.3](https://arxiv.org/html/2605.13988#A5.SS3.p2.9),[§5\.1](https://arxiv.org/html/2605.13988#S5.SS1.p3.1)\.
- \[41\]J\. Kumar, D\. Yudilevich, A\. Smooha, I\. Zohar, A\. K\. Pariari, R\. Stöhr, A\. Denisenko, M\. Hücker, and A\. Finkler\(2024\)Room temperature relaxometry of single nitrogen vacancy centers in proximity to $\alpha$\-rucl3 nanoflakes\.Nano Letters24\(16\),pp\. 4793–4800\.Cited by:[§A\.2](https://arxiv.org/html/2605.13988#A1.SS2.p1.17),[§A\.6](https://arxiv.org/html/2605.13988#A1.SS6.p1.4),[§B\.1](https://arxiv.org/html/2605.13988#A2.SS1.p1.6),[§E\.1](https://arxiv.org/html/2605.13988#A5.SS1.p3.10),[§E\.8](https://arxiv.org/html/2605.13988#A5.SS8.p3.9),[§3\.2](https://arxiv.org/html/2605.13988#S3.SS2.p1.6),[§5\.1](https://arxiv.org/html/2605.13988#S5.SS1.p1.12),[§5\.1](https://arxiv.org/html/2605.13988#S5.SS1.p3.1),[§5\.5](https://arxiv.org/html/2605.13988#S5.SS5.p1.4),[§6](https://arxiv.org/html/2605.13988#S6.p1.6)\.
- \[42\]K\. Lee, Y\. Li, M\. Junge, and Y\. Bresler\(2016\)Blind recovery of sparse signals from subsampled convolution\.IEEE Transactions on Information Theory63\(2\),pp\. 802–821\.Cited by:[§B\.2](https://arxiv.org/html/2605.13988#A2.SS2.p1.8),[§2](https://arxiv.org/html/2605.13988#S2.p2.1)\.
- \[43\]T\. Liao, R\. Taori, I\. D\. Raji, and L\. Schmidt\(2021\)Are we learning yet? a meta review of evaluation failures across machine learning\.InThirty\-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track \(Round 2\),Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p3.1)\.
- \[44\]S\. Ling and T\. Strohmer\(2015\)Self\-calibration and biconvex compressive sensing\.Inverse Problems31\(11\),pp\. 115002\.Cited by:[§B\.2](https://arxiv.org/html/2605.13988#A2.SS2.p1.8),[§2](https://arxiv.org/html/2605.13988#S2.p2.1),[§3\.2](https://arxiv.org/html/2605.13988#S3.SS2.p2.6)\.
- \[45\]D\. C\. Liu and J\. Nocedal\(1989\)On the limited memory bfgs method for large scale optimization\.Mathematical programming45\(1\),pp\. 503–528\.Cited by:[§E\.2](https://arxiv.org/html/2605.13988#A5.SS2.p9.3),[§5\.1](https://arxiv.org/html/2605.13988#S5.SS1.p2.2)\.
- \[46\]K\. Liu, J\. Tian, B\. Duan, H\. Zhang, K\. Li, G\. Zhang, F\. Jelezko, R\. S\. Said, J\. Cai, and L\. Xiao\(2026\)High\-resolution wide\-field magnetic imaging with sparse sampling using nitrogen\-vacancy centers\.arXiv preprint arXiv:2602\.00679\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p1.2)\.
- \[47\]H\. Mamin, M\. Kim, M\. Sherwood, C\. T\. Rettner, K\. Ohno, D\. Awschalom, and D\. Rugar\(2013\)Nanoscale nuclear magnetic resonance with a nitrogen\-vacancy spin sensor\.Science339\(6119\),pp\. 557–560\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p1.2)\.
- \[48\]J\. N\. Martel, D\. B\. Lindell, C\. Z\. Lin, E\. R\. Chan, M\. Monteiro, and G\. Wetzstein\(2021\)Acorn: adaptive coordinate networks for neural scene representation\.arXiv preprint arXiv:2105\.02788\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p4.1)\.
- \[49\]G\. Mataev, P\. Milanfar, and M\. Elad\(2019\)DeepRED: deep image prior powered by red\.InProceedings of the IEEE/CVF international conference on computer vision workshops,pp\. 0–0\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p3.1)\.
- \[50\]M\. T\. McCann, K\. H\. Jin, and M\. Unser\(2017\)Convolutional neural networks for inverse problems in imaging: a review\.IEEE Signal Processing Magazine34\(6\),pp\. 85–95\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p3.1)\.
- \[51\]S\. Midha, M\. Parashar, A\. Bathla, D\. A\. Broadway, J\. Tetienne, and K\. Saha\(2024\)Optimized current\-density reconstruction from wide\-field quantum diamond magnetic field maps\.Physical Review Applied22\(1\),pp\. 014015\.Cited by:[§A\.6](https://arxiv.org/html/2605.13988#A1.SS6.p1.4),[§B\.1](https://arxiv.org/html/2605.13988#A2.SS1.p1.6),[§1](https://arxiv.org/html/2605.13988#S1.p2.1),[§1](https://arxiv.org/html/2605.13988#S1.p4.1),[§2](https://arxiv.org/html/2605.13988#S2.p1.2),[§3\.2](https://arxiv.org/html/2605.13988#S3.SS2.p1.6),[§5\.1](https://arxiv.org/html/2605.13988#S5.SS1.p3.1)\.
- \[52\]B\. Mildenhall, P\. P\. Srinivasan, M\. Tancik, J\. T\. Barron, R\. Ramamoorthi, and R\. Ng\(2021\)Nerf: representing scenes as neural radiance fields for view synthesis\.Communications of the ACM65\(1\),pp\. 99–106\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p4.1)\.
- \[53\]A\. Molaei, A\. Aminimehr, A\. Tavakoli, A\. Kazerouni, B\. Azad, R\. Azad, and D\. Merhof\(2023\)Implicit neural representation in medical imaging: a comparative survey\.InProceedings of the IEEE/CVF International Conference on Computer Vision,pp\. 2381–2391\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p4.1)\.
- \[54\]V\. Monga, Y\. Li, and Y\. C\. Eldar\(2021\)Algorithm unrolling: interpretable, efficient deep learning for signal and image processing\.IEEE Signal Processing Magazine38\(2\),pp\. 18–44\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p3.1)\.
- \[55\]A\. Mzyk, A\. Sigaeva, and R\. Schirhagl\(2022\)Relaxometry with nitrogen vacancy \(nv\) centers in diamond\.Accounts of chemical research55\(24\),pp\. 3572–3580\.Cited by:[§A\.6](https://arxiv.org/html/2605.13988#A1.SS6.p1.4),[§1](https://arxiv.org/html/2605.13988#S1.p2.1),[§2](https://arxiv.org/html/2605.13988#S2.p1.2),[§2](https://arxiv.org/html/2605.13988#S2.p3.1),[§3\.1](https://arxiv.org/html/2605.13988#S3.SS1.p1.17)\.
- \[56\]S\. Boyd, N\. Parikh, E\. Chu, B\. Peleato, and J\. Eckstein\(2011\)Distributed optimization and statistical learning via the alternating direction method of multipliers\.Foundations and Trends® in Machine learning3\(1\),pp\. 1–122\.Cited by:[§1](https://arxiv.org/html/2605.13988#S1.p3.1),[§2](https://arxiv.org/html/2605.13988#S2.p2.1),[§5\.1](https://arxiv.org/html/2605.13988#S5.SS1.p2.2)\.
- \[57\]G\. Ongie, A\. Jalal, C\. A\. Metzler, R\. G\. Baraniuk, A\. G\. Dimakis, and R\. Willett\(2020\)Deep learning techniques for inverse problems in imaging\.IEEE Journal on Selected Areas in Information Theory1\(1\),pp\. 39–56\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p3.1)\.
- \[58\]K\. Park, U\. Sinha, J\. T\. Barron, S\. Bouaziz, D\. B\. Goldman, S\. M\. Seitz, and R\. Martin\-Brualla\(2021\)Nerfies: deformable neural radiance fields\.InProceedings of the IEEE/CVF international conference on computer vision,pp\. 5865–5874\.Cited by:[§D\.2](https://arxiv.org/html/2605.13988#A4.SS2.p1.11),[§2](https://arxiv.org/html/2605.13988#S2.p4.1),[§4\.1](https://arxiv.org/html/2605.13988#S4.SS1.p1.5)\.
- \[59\]A\. Radford, L\. Metz, and S\. Chintala\(2015\)Unsupervised representation learning with deep convolutional generative adversarial networks\.arXiv preprint arXiv:1511\.06434\.Cited by:[§1](https://arxiv.org/html/2605.13988#S1.p3.1)\.
- \[60\]M\. Raissi, P\. Perdikaris, and G\. E\. Karniadakis\(2019\)Physics\-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations\.Journal of Computational physics378,pp\. 686–707\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p4.1)\.
- \[61\]I\. D\. Raji, E\. M\. Bender, A\. Paullada, E\. Denton, and A\. Hanna\(2021\)AI and the everything in the whole wide world benchmark\.arXiv preprint arXiv:2111\.15366\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p3.1)\.
- \[62\]B\. Recht, R\. Roelofs, L\. Schmidt, and V\. Shankar\(2019\)Do imagenet classifiers generalize to imagenet?\.InInternational conference on machine learning,pp\. 5389–5400\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p3.1)\.
- \[63\]A\. W\. Reed, H\. Kim, R\. Anirudh, K\. A\. Mohan, K\. Champley, J\. Kang, and S\. Jayasuriya\(2021\)Dynamic ct reconstruction from limited views with implicit neural representations and parametric motion fields\.InProceedings of the IEEE/CVF International Conference on Computer Vision,pp\. 2258–2268\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p4.1)\.
- \[64\]Y\. Romano, M\. Elad, and P\. Milanfar\(2017\)The little engine that could: regularization by denoising \(red\)\.SIAM journal on imaging sciences10\(4\),pp\. 1804–1844\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p3.1)\.
- \[65\]O\. Ronneberger, P\. Fischer, and T\. Brox\(2015\)U\-net: convolutional networks for biomedical image segmentation\.InInternational Conference on Medical image computing and computer\-assisted intervention,pp\. 234–241\.Cited by:[§1](https://arxiv.org/html/2605.13988#S1.p3.1),[§2](https://arxiv.org/html/2605.13988#S2.p3.1),[§5\.1](https://arxiv.org/html/2605.13988#S5.SS1.p2.2)\.
- \[66\]J\. Rovny, S\. Gopalakrishnan, A\. C\. B\. Jayich, P\. Maletinsky, E\. Demler, and N\. P\. de Leon\(2024\)Nanoscale diamond quantum sensors for many\-body physics\.Nature Reviews Physics6\(12\),pp\. 753–768\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p1.2)\.
- \[67\]L\. I\. Rudin, S\. Osher, and E\. Fatemi\(1992\)Nonlinear total variation based noise removal algorithms\.Physica D: nonlinear phenomena60\(1\-4\),pp\. 259–268\.Cited by:[§1](https://arxiv.org/html/2605.13988#S1.p4.1),[§2](https://arxiv.org/html/2605.13988#S2.p2.1),[§5\.1](https://arxiv.org/html/2605.13988#S5.SS1.p2.2)\.
- \[68\]L\. Shen, J\. Pauly, and L\. Xing\(2022\)NeRP: implicit neural representation learning with prior embedding for sparsely sampled image reconstruction\.IEEE transactions on neural networks and learning systems35\(1\),pp\. 770–782\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p4.1)\.
- \[69\]K\. Shi, X\. Zhou, and S\. Gu\(2024\)Improved implicit neural representation with fourier reparameterized training\.InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,pp\. 25985–25994\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p4.1)\.
- \[70\]V\. Sitzmann, J\. Martel, A\. Bergman, D\. Lindell, and G\. Wetzstein\(2020\)Implicit neural representations with periodic activation functions\.Advances in neural information processing systems33,pp\. 7462–7473\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p4.1),[§4\.1](https://arxiv.org/html/2605.13988#S4.SS1.p1.5)\.
- \[71\]A\. Smooha, J\. Kumar, D\. Yudilevich, J\. W\. Rosenberg, V\. Bayer, R\. Stöhr, A\. Denisenko, T\. Bendikov, A\. Kossoy, I\. Pinkas,et al\.\(2026\)Sensing single\-molecule magnets with nitrogen\-vacancy centers\.Nano Letters\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p1.2)\.
- \[72\]M\. Tancik, P\. Srinivasan, B\. Mildenhall, S\. Fridovich\-Keil, N\. Raghavan, U\. Singhal, R\. Ramamoorthi, J\. Barron, and R\. Ng\(2020\)Fourier features let networks learn high frequency functions in low dimensional domains\.Advances in neural information processing systems33,pp\. 7537–7547\.Cited by:[§D\.2](https://arxiv.org/html/2605.13988#A4.SS2.p1.10),[§D\.2](https://arxiv.org/html/2605.13988#A4.SS2.p1.11),[§1](https://arxiv.org/html/2605.13988#S1.p4.1),[§2](https://arxiv.org/html/2605.13988#S2.p4.1),[§4\.1](https://arxiv.org/html/2605.13988#S4.SS1.p1.5)\.
- \[73\]J\. Tetienne, T\. Hingant, L\. Rondin, A\. Cavaillès, L\. Mayer, G\. Dantelle, T\. Gacoin, J\. Wrachtrup, J\. Roch, and V\. Jacques\(2013\)Spin relaxometry of single nitrogen\-vacancy defects in diamond nanocrystals for magnetic noise sensing\.arXiv preprint arXiv:1304\.1197\.Cited by:[7\(b\)](https://arxiv.org/html/2605.13988#A1.F7.sf2),[7\(b\)](https://arxiv.org/html/2605.13988#A1.F7.sf2.16.8),[§A\.1](https://arxiv.org/html/2605.13988#A1.SS1.p1.19),[§A\.2](https://arxiv.org/html/2605.13988#A1.SS2.p1.5),[§A\.6](https://arxiv.org/html/2605.13988#A1.SS6.p1.4),[§A\.9](https://arxiv.org/html/2605.13988#A1.SS9.p2.2),[§1](https://arxiv.org/html/2605.13988#S1.p2.1),[§2](https://arxiv.org/html/2605.13988#S2.p1.2),[§3\.1](https://arxiv.org/html/2605.13988#S3.SS1.p1.17)\.
- \[74\]M\. Tsukamoto, S\. Ito, K\. Ogawa, Y\. Ashida, K\. Sasaki, and K\. Kobayashi\(2022\)Accurate magnetic field imaging using nanodiamond quantum sensors enhanced by machine learning\.Scientific reports12\(1\),pp\. 13942\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p1.2)\.
- \[75\]D\. Ulyanov, A\. Vedaldi, and V\. Lempitsky\(2018\)Deep image prior\.InProceedings of the IEEE conference on computer vision and pattern recognition,pp\. 9446–9454\.Cited by:[§1](https://arxiv.org/html/2605.13988#S1.p3.1),[§2](https://arxiv.org/html/2605.13988#S2.p4.1)\.
- \[76\]T\. Van der Sar, F\. Casola, R\. Walsworth, and A\. Yacoby\(2015\)Nanometre\-scale probing of spin waves using single electron spins\.Nature communications6\(1\),pp\. 7886\.Cited by:[§A\.2](https://arxiv.org/html/2605.13988#A1.SS2.p1.5),[§A\.6](https://arxiv.org/html/2605.13988#A1.SS6.p1.4),[§2](https://arxiv.org/html/2605.13988#S2.p1.2),[§3\.3](https://arxiv.org/html/2605.13988#S3.SS3.p2.6)\.
- \[77\]S\. V\. Venkatakrishnan, C\. A\. Bouman, and B\. Wohlberg\(2013\)Plug\-and\-play priors for model based reconstruction\.In2013 IEEE global conference on signal and information processing,pp\. 945–948\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p3.1)\.
- \[78\]J\. Virieux and S\. Operto\(2010\)An overview of full\-waveform inversion in exploration geophysics\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p2.1)\.
- \[79\]Z\. Wang, A\. C\. Bovik, H\. R\. Sheikh, and E\. P\. Simoncelli\(2004\)Image quality assessment: from error visibility to structural similarity\.IEEE transactions on image processing13\(4\),pp\. 600–612\.Cited by:[§E\.3](https://arxiv.org/html/2605.13988#A5.SS3.p5.2)\.
- \[80\]W\. Xue, L\. Zhang, X\. Mou, and A\. C\. Bovik\(2013\)Gradient magnitude similarity deviation: a highly efficient perceptual image quality index\.IEEE transactions on image processing23\(2\),pp\. 684–695\.Cited by:[§E\.3](https://arxiv.org/html/2605.13988#A5.SS3.p1.8),[§5\.1](https://arxiv.org/html/2605.13988#S5.SS1.p3.1)\.
- \[81\]Y\. Yang, J\. Sun, H\. Li, and Z\. Xu\(2018\)ADMM\-csnet: a deep learning approach for image compressive sensing\.IEEE transactions on pattern analysis and machine intelligence42\(3\),pp\. 521–538\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p3.1)\.
- \[82\]K\. Zhang, W\. Zuo, S\. Gu, and L\. Zhang\(2017\)Learning deep cnn denoiser prior for image restoration\.InProceedings of the IEEE conference on computer vision and pattern recognition,pp\. 3929–3938\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p3.1)\.
- \[83\]Y\. Zhang, H\. Kuo, and J\. Wright\(2018\)Structured local minima in sparse blind deconvolution\.Advances in neural information processing systems31\.Cited by:[§C\.4](https://arxiv.org/html/2605.13988#A3.SS4.p1.7),[§1](https://arxiv.org/html/2605.13988#S1.p2.1)\.
- \[84\]B\. Zhao, A\. Levis, L\. Connor, P\. P\. Srinivasan, and K\. L\. Bouman\(2024\)Single view refractive index tomography with neural fields\.InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,pp\. 25358–25367\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p4.1)\.
- \[85\]B\. Zhao, A\. Levis, L\. Connor, P\. P\. Srinivasan, and K\. L\. Bouman\(2025\)Revealing the 3d cosmic web through gravitationally constrained neural fields\.arXiv preprint arXiv:2504\.15262\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p4.1)\.
- \[86\]T\. Zhong, Y\. Hu, D\. Zheng, A\. Sood, and C\. Allen\-Blanchette\(2026\)Neural field thermal tomography: a differentiable physics framework for non\-destructive evaluation\.arXiv preprint arXiv:2603\.11045\.Cited by:[§2](https://arxiv.org/html/2605.13988#S2.p4.1)\.

## Appendix A Forward Operator Derivation

This appendix expands the measurement model of Section [3.1](https://arxiv.org/html/2605.13988#S3.SS1) from first principles, makes the difference between the scalar $\mathcal{F}_1$ and tensor $\mathcal{F}_2$ operators explicit at the dipole level, and identifies the regimes in which each form is physically appropriate.

### A.1 Sample model and dipolar Green tensor

We model the sample as a (possibly continuous) distribution of independent magnetic fluctuators with nonnegative density $\rho:\Omega\to\mathbb{R}_{\geq 0}$ over a 2D plane $\Omega\subset\mathbb{R}^2$. A widefield NV array sits at standoff height $z_0$, so the relative position from a sample point $\mathbf{r}_{\mathrm{src}}\in\Omega\times\{0\}$ to an NV pixel $\mathbf{r}\in\Omega\times\{z_0\}$ is $\mathbf{R}(\mathbf{r},\mathbf{r}_{\mathrm{src}})=(\mathbf{r}-\mathbf{r}_{\mathrm{src}},z_0)$. The dipolar Green tensor

$$
G_{ia}(\mathbf{R}) \;=\; \frac{\mu_0}{4\pi}\,\frac{3R_iR_a-\lVert\mathbf{R}\rVert^2\,\delta_{ia}}{\lVert\mathbf{R}\rVert^5},\qquad i,a\in\{x,y,z\}
\tag{8}
$$

maps a unit dipole moment along axis $a$ at $\mathbf{r}_{\mathrm{src}}$ to a magnetic-field component along axis $i$ at $\mathbf{r}$. For an NV with quantization axis $\mathbf{n}$, the projected response is $G_{\mathrm{nv},a}=\sum_i n_i G_{ia}$; for a $z$-aligned NV (the convention used throughout this paper) we have $G_{\mathrm{nv},a}=G_{za}=G_{az}$ by symmetry of $G$ in its two indices. Thermal magnetic fluctuations of a dipole at $\mathbf{r}_{\mathrm{src}}$ along axis $a$ produce a spectral density $\rho(\mathbf{r}_{\mathrm{src}})\,L(\omega;\omega_L(\mathbf{r}_{\mathrm{src}}))$ at the source, with $L$ the Lorentzian of Section [3.1](https://arxiv.org/html/2605.13988#S3.SS1) [[73](https://arxiv.org/html/2605.13988#bib.bib8), [18](https://arxiv.org/html/2605.13988#bib.bib4), [11](https://arxiv.org/html/2605.13988#bib.bib6)].
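As a minimal sketch of Eq. (8), the snippet below evaluates the Green tensor for a single displacement and projects it onto a $z$-aligned NV axis. The function names, the SI value used for $\mu_0/4\pi$, and the example geometry are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

MU0_OVER_4PI = 1e-7  # mu_0 / (4 pi) in T*m/A (SI), to the precision needed here

def dipolar_green_tensor(R):
    """3x3 dipolar Green tensor G_ia(R) of Eq. (8) for a displacement R in metres."""
    R = np.asarray(R, dtype=np.float64)
    r2 = float(R @ R)                                   # ||R||^2
    return MU0_OVER_4PI * (3.0 * np.outer(R, R) - r2 * np.eye(3)) / r2**2.5

def nv_projection(G, n):
    """Projected response G_{nv,a} = sum_i n_i G_ia for NV quantization axis n."""
    return np.asarray(n, dtype=np.float64) @ G

# Hypothetical geometry: source offset 30 nm in x, NV pixel at a 10 nm standoff.
G = dipolar_green_tensor([30e-9, 0.0, 10e-9])
G_nv = nv_projection(G, [0.0, 0.0, 1.0])               # z-aligned NV picks out row z
```

Because the tensor is symmetric, the row `G[2, :]` and the column `G[:, 2]` coincide, which is the identity $G_{za}=G_{az}$ used in the text.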

We retain three structural assumptions that are standard in the NV-relaxometry literature: (i) sample dipoles are independent thermal fluctuators (no inter-source coherence), (ii) the NV array reads out the noise spectrum at the readout pixel $\mathbf{r}$ rather than at the source pixel $\mathbf{r}_{\mathrm{src}}$ (a controlled approximation when $\omega_L$ varies slowly over the kernel support), and (iii) the source-spin fluctuation components $a\in\{x,y,z\}$ are independent thermal channels with equal variance, i.e., isotropic in the lab frame, so each channel contributes $\left|G_{az}\right|^2$ to the NV-axis noise field at pixel $\mathbf{r}$. The physical regime in which these assumptions hold is discussed in Appendix [A.6](https://arxiv.org/html/2605.13988#A1.SS6).

### A.2 Pointwise derivation of $\mathcal{F}_2$ (incoherent thermal sum)

For independent thermal fluctuators, discretize the source plane into cells of area $\Delta$ centered at $\{\mathbf{r}_i\}$ and write $\rho_i:=\rho(\mathbf{r}_i)\,\Delta$ for the dimensionless source weight in cell $i$. The noise spectrum measured at NV position $\mathbf{r}$ is the sum of squared field amplitudes from all sources, summed over independent fluctuation channels [[73](https://arxiv.org/html/2605.13988#bib.bib8), [76](https://arxiv.org/html/2605.13988#bib.bib24)]:

$$
\begin{aligned}
\mathcal{F}_2(\rho,\omega_L)(\omega,\mathbf{r})
&\;=\; \sum_{a\in\{x,y,z\}}\sum_i \left|G_{az}(\mathbf{R}_i)\right|^2\,\rho_i\,L\bigl(\omega;\omega_L(\mathbf{r}_i)\bigr) \\
&\;\approx\; \biggl[\sum_{a\in\{x,y,z\}}\sum_i \left|G_{az}(\mathbf{R}_i)\right|^2\,\rho_i\biggr]\,L\bigl(\omega;\omega_L(\mathbf{r})\bigr),
\end{aligned}
\tag{9}
$$

where $\mathbf{R}_i=\mathbf{R}(\mathbf{r},\mathbf{r}_i)$. The first line is the source-side exact form ($\mathcal{F}_3$ in our internal notation, used as the simulator for ground-truth data generation in Section [5](https://arxiv.org/html/2605.13988#S5)). The second line factors the Lorentzian out of the spatial sum; this is exact when $\omega_L$ is locally constant over the kernel support, and is equivalent (up to grid-area factors) to the form $\bigl[\sum_a(\left|G_{az}\right|^2\ast\rho)(\mathbf{r})\bigr]\,L(\omega;\omega_L(\mathbf{r}))$ used by the FFT-based reconstruction stack. The contraction over $a$ is incoherent: distinct channels and distinct sources contribute independently, with no kernel-product cross-terms. Density enters Eq. ([9](https://arxiv.org/html/2605.13988#A1.E9)) *linearly* in $\rho_i$, which is the property exploited by the scale-correction proof of Appendix [B](https://arxiv.org/html/2605.13988#A2).
When the operator is varied on the real-world dataset, however, the operator-implied internal-rate slopes are $b_{\mathrm{pred}}^{\mathcal{F}_2}=3.77$ and $b_{\mathrm{pred}}^{\mathcal{F}_1}=1.89$, obtained using $\alpha=1.06\pm0.22$ from Kumar et al. [[41](https://arxiv.org/html/2605.13988#bib.bib27)]; the measured slope $b_{\mathrm{exp}}=1.43\pm0.21$ is closer to the $\mathcal{F}_1$ prediction. Slope alone does not penalize amplitude error, so this comparison is ambiguous in isolation.
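The factorized second line of Eq. (9) can be sketched as a zero-padded FFT convolution of the power kernel $\sum_a\left|G_{az}\right|^2$ with the cell weights, multiplied by the readout-pixel Lorentzian. This is an illustrative sketch only: it assumes a square grid shared by the source and NV planes, sets $\mu_0/4\pi=1$, and uses hypothetical function names and parameter values; the paper's reconstruction stack may differ in every one of these choices.

```python
import numpy as np

def power_kernel(dx, dy, z0):
    """P(R) = sum_a |G_az(R)|^2 for R = (dx, dy, z0), with mu_0/(4 pi) set to 1."""
    r2 = dx**2 + dy**2 + z0**2
    g_xz, g_yz, g_zz = 3.0 * dx * z0, 3.0 * dy * z0, 3.0 * z0**2 - r2
    return (g_xz**2 + g_yz**2 + g_zz**2) / r2**5

def lorentzian(omega, omega_L, gamma=0.5):
    return gamma**2 / ((omega - omega_L)**2 + gamma**2)

def spatial_sum_fft(rho, h, z0):
    """sum_a (|G_az|^2 * rho)(r) on an n x n grid via zero-padded FFT convolution."""
    n = rho.shape[0]
    offsets = np.arange(-(n - 1), n) * h             # all NV-minus-source offsets
    kernel = power_kernel(offsets[:, None], offsets[None, :], z0)
    s = 3 * n - 2                                    # full linear-convolution size
    spec = np.fft.rfft2(kernel, s=(s, s)) * np.fft.rfft2(rho, s=(s, s))
    full = np.fft.irfft2(spec, s=(s, s))
    return full[n - 1:2 * n - 1, n - 1:2 * n - 1]    # crop to the NV grid

def spatial_sum_direct(rho, h, z0):
    """Same quantity by explicit summation over source cells (reference)."""
    n = rho.shape[0]
    x = np.arange(n) * h
    out = np.empty_like(rho)
    for p in range(n):
        for q in range(n):
            out[p, q] = np.sum(power_kernel(x[p] - x[:, None], x[q] - x[None, :], z0) * rho)
    return out

def f2_factorized(rho, omega_L, omega, h, z0, gamma=0.5):
    """Second line of Eq. (9): spatial sum times the readout-pixel Lorentzian."""
    return spatial_sum_fft(rho, h, z0) * lorentzian(omega, omega_L, gamma)

rng = np.random.default_rng(0)
rho = rng.random((6, 6))                     # hypothetical nonnegative cell weights
omega_L = np.full((6, 6), 2.0)               # hypothetical uniform Larmor map
spectrum = f2_factorized(rho, omega_L, omega=2.1, h=1.0, z0=1.0)
```

Doubling `rho` doubles `spectrum`, which is the linearity in $\rho_i$ noted above; the explicit double loop is kept only as a reference against which the FFT path can be compared.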

### A.3 Pointwise derivation of $\mathcal{F}_1$ (scalar coherent square)

The simplified operator $\mathcal{F}_1$ retains only a single NV-axis projection and squares the convolved field as a final, post-superposition operation. Using the same discrete cell weights $\rho_i=\rho(\mathbf{r}_i)\Delta$, expanding the modulus square gives

$$
\begin{aligned}
\mathcal{F}_1(\rho,\omega_L)(\omega,\mathbf{r})
&\;=\; \biggl|\sum_i G_{\mathrm{nv}}\bigl(\mathbf{R}(\mathbf{r},\mathbf{r}_i)\bigr)\,\rho_i\biggr|^2\,L(\omega;\omega_L(\mathbf{r})) \\
&\;=\; \biggl[\underbrace{\sum_i \rho_i^2\,G_{\mathrm{nv}}(\mathbf{R}_i)^2}_{\text{diagonal}} \;+\; \underbrace{\sum_{i\neq j}\rho_i\rho_j\,G_{\mathrm{nv}}(\mathbf{R}_i)\,G_{\mathrm{nv}}(\mathbf{R}_j)}_{\text{cross term}\,\mathcal{C}(\mathbf{r})}\biggr]\,L(\omega;\omega_L(\mathbf{r})),
\end{aligned}
\tag{10}
$$

where $\mathbf{R}_i=\mathbf{R}(\mathbf{r},\mathbf{r}_i)$. The diagonal term carries the squared cell weight $\rho_i^2$ instead of the linear cell weight $\rho_i$, and the cross term is likewise quadratic in $\rho$, so $\mathcal{F}_1(c\rho,\omega_L)=c^2\,\mathcal{F}_1(\rho,\omega_L)$. The cross-term $\mathcal{C}(\mathbf{r})$ couples pairs of distinct sources through the product of their NV-axis kernels, which has no counterpart in Eq. ([9](https://arxiv.org/html/2605.13988#A1.E9)). When $\rho$ is sparse and well-separated relative to the support of $G_{\mathrm{nv}}$, $\mathcal{C}$ is small pointwise and $\mathcal{F}_1$ behaves as a (nonlinear-in-$\rho$) close cousin of $\mathcal{F}_2$; when sources are close, $\mathcal{C}$ dominates and the two operators differ in both magnitude and spatial structure.
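The diagonal/cross split of Eq. (10) can be illustrated at a single NV pixel with two point sources. The sketch assumes $G_{\mathrm{nv}}=G_{zz}$ for the $z$-aligned NV, sets $\mu_0/4\pi=1$, and uses hypothetical positions in units of the standoff; none of these values come from the paper.

```python
import numpy as np

def g_zz(dx, dy, z0):
    """Scalar NV-axis kernel, assumed here to be G_zz with mu_0/(4 pi) set to 1."""
    r2 = dx**2 + dy**2 + z0**2
    return (3.0 * z0**2 - r2) / r2**2.5

def f1_bracket(rho, src_xy, nv_xy, z0):
    """Bracket of Eq. (10) at one NV pixel: returns (total, diagonal, cross)."""
    d = nv_xy - src_xy
    amp = rho * g_zz(d[:, 0], d[:, 1], z0)           # rho_i * G_nv(R_i)
    pair = np.outer(amp, amp)                        # rho_i rho_j G_i G_j
    diagonal = np.trace(pair)
    cross = pair.sum() - diagonal                    # i != j terms, C(r)
    return np.sum(amp)**2, diagonal, cross

nv = np.array([0.0, 0.0])
rho = np.array([1.0, 1.0])
close = np.array([[0.0, 0.0], [0.2, 0.0]])          # separation << standoff
far = np.array([[0.0, 0.0], [5.0, 0.0]])            # separation >> standoff

tot_c, diag_c, cross_c = f1_bracket(rho, close, nv, z0=1.0)
tot_f, diag_f, cross_f = f1_bracket(rho, far, nv, z0=1.0)
tot_scaled, _, _ = f1_bracket(3.0 * rho, close, nv, z0=1.0)
```

In this toy geometry the cross term is comparable to the diagonal term for the close pair and far smaller for the distant pair, and scaling `rho` by 3 scales the bracket by 9, consistent with the $c^2$ homogeneity stated above.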

### A.4 Direct simulator $\mathcal{F}_3$ and the operator-fidelity check

The two operators $\mathcal{F}_1$ and $\mathcal{F}_2$ are FFT-factorized: each evaluates a 2D convolution between a per-channel kernel and the source density, then multiplies by a Lorentzian factor at the readout pixel. The factorization is exact for $\mathcal{F}_2$ when the Larmor field $\omega_L$ is locally constant over the kernel support (Eq. ([9](https://arxiv.org/html/2605.13988#A1.E9))), and is only an approximation when $\omega_L$ varies across the support. The direct simulator $\mathcal{F}_3$ removes this approximation by evaluating the source-side superposition pixel-by-pixel without FFT factorization,

$$
\begin{aligned}
\mathcal{F}_3(\rho,\omega_L)(\omega,\mathbf{r})
&\;=\; \sum_{\mathbf{r}_{\mathrm{src}}\in\Omega}\rho(\mathbf{r}_{\mathrm{src}})\,P\bigl(\mathbf{R}(\mathbf{r},\mathbf{r}_{\mathrm{src}})\bigr)\,L\bigl(\omega;\omega_L(\mathbf{r}_{\mathrm{src}})\bigr), \\
P(\mathbf{R}) &\;=\; \sum_{a\in\{x,y,z\}}\left|G_{az}(\mathbf{R})\right|^2,
\end{aligned}
\tag{11}
$$

where the Lorentzian is evaluated at each *source* pixel $\mathbf{r}_{\mathrm{src}}$ rather than factored out at the readout pixel, restoring the per-source spectral response of the underlying physics. $\mathcal{F}_3$ is the source-side form (the first line of Eq. ([9](https://arxiv.org/html/2605.13988#A1.E9))) without the locally-constant-$\omega_L$ factorization, and is implemented in float64 with a direct source-loop evaluator.
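A direct source-loop evaluator in the spirit of Eq. (11) can be sketched as follows and compared against the readout-pixel factorization. This is not the authors' simulator: the function names, the three-source scene, and the choice $\mu_0/4\pi=1$ are assumptions made for illustration.

```python
import numpy as np

def power_kernel(d_xy, z0):
    """P(R) = sum_a |G_az(R)|^2 for R = (d_xy, z0), with mu_0/(4 pi) set to 1."""
    s2 = np.sum(d_xy**2, axis=-1)                    # in-plane distance squared
    r2 = s2 + z0**2
    return (9.0 * z0**2 * s2 + (3.0 * z0**2 - r2)**2) / r2**5

def lorentzian(omega, omega_L, gamma=0.5):
    return gamma**2 / ((omega - omega_L)**2 + gamma**2)

def f3_direct(rho, src_xy, omega_L_src, nv_xy, z0, omega):
    """Eq. (11): explicit float64 source loop, Lorentzian at each source pixel."""
    total = np.float64(0.0)
    for rho_s, r_s, w_s in zip(rho, src_xy, omega_L_src):
        total += rho_s * power_kernel(nv_xy - r_s, z0) * lorentzian(omega, w_s)
    return total

def f2_readout(rho, src_xy, omega_L_nv, nv_xy, z0, omega):
    """Factorized form: spatial sum times the Lorentzian at the readout pixel."""
    return np.sum(rho * power_kernel(nv_xy - src_xy, z0)) * lorentzian(omega, omega_L_nv)

src = np.array([[-0.5, 0.0], [0.0, 0.0], [0.8, 0.3]])   # hypothetical sources
rho = np.array([1.0, 0.5, 2.0])
nv = np.array([0.0, 0.0])

uniform = np.full(3, 2.0)                               # omega_L constant in kernel
varying = np.array([1.8, 2.0, 2.4])                     # hypothetical in-kernel variation

gap_uniform = f3_direct(rho, src, uniform, nv, 1.0, 2.1) - f2_readout(rho, src, 2.0, nv, 1.0, 2.1)
gap_varying = f3_direct(rho, src, varying, nv, 1.0, 2.1) - f2_readout(rho, src, 2.0, nv, 1.0, 2.1)
```

With a uniform Larmor map the two evaluators agree to floating-point precision; once $\omega_L$ varies across the sources, the factorized form no longer matches the source-side sum.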

Where $\mathcal{F}_3$ is used. The primary main-text benchmark in this paper is the cross-fidelity benchmark on a $512$-sample dataset (Table [1](https://arxiv.org/html/2605.13988#S5.T1), Section [5.2](https://arxiv.org/html/2605.13988#S5.SS2), Appendix [E.1](https://arxiv.org/html/2605.13988#A5.SS1)): spectra are produced by the source-side direct simulator $\mathcal{F}_3$ and inverted by $\mathcal{F}_1$ or $\mathcal{F}_2$, so the inversion operator is never reused as the data-generation operator and the inverse-crime risk [[36](https://arxiv.org/html/2605.13988#bib.bib3)] is avoided by construction. The matched-operator benchmarks ($\mathcal{F}_1/\mathcal{F}_1$ on $512$ samples and $\mathcal{F}_2/\mathcal{F}_2$ on $512$ samples; Table [3](https://arxiv.org/html/2605.13988#S5.T3), Appendix [E.4](https://arxiv.org/html/2605.13988#A5.SS4)) are reported alongside as a complementary view that includes amortized and untrained-prior baselines for which the $\mathcal{F}_3$ form is not implemented. These matched benchmarks are inverse-crime settings by construction, and we do not use them as the central evidence for the operator-fidelity claim.

### A.5 Operator difference

Subtracting the discrete $\mathcal{F}_2$ approximation in Eq. ([9](https://arxiv.org/html/2605.13988#A1.E9)) from Eq. ([10](https://arxiv.org/html/2605.13988#A1.E10)) at fixed $\omega$ and $\mathbf{r}$,

$$
\bigl(\mathcal{F}_1-\mathcal{F}_2\bigr)(\rho,\omega_L)(\omega,\mathbf{r}) \;=\; \Bigl[\mathcal{C}(\mathbf{r}) \;+\; \sum_i\bigl(\rho_i^2-\rho_i\bigr)G_{\mathrm{nv}}(\mathbf{R}_i)^2 \;-\; \sum_{a\neq z}\sum_i \rho_i\,\left|G_{az}(\mathbf{R}_i)\right|^2\Bigr]\,L(\omega;\omega_L(\mathbf{r})),
\tag{12}
$$

i.e., the gap is the cross-term $\mathcal{C}$, plus the signed diagonal discrepancy $\rho_i^2-\rho_i$, minus the transverse-channel power $a\in\{x,y\}$ that is present in $\mathcal{F}_2$ and omitted by $\mathcal{F}_1$. Here the $a=z$ channel of $\mathcal{F}_2$ has been paired with the diagonal term of $\mathcal{F}_1$, which uses $G_{\mathrm{nv}}=G_{zz}$ for the $z$-aligned NV. These contributions are spatially structured across $\mathbf{r}$ and generally nonzero in multi-source scenes. Eq. ([12](https://arxiv.org/html/2605.13988#A1.E12)) is the algebraic origin of the inverse-landscape and ranking changes documented in Section [5](https://arxiv.org/html/2605.13988#S5).
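The decomposition in Eq. (12) can be checked numerically on a small random scene. The sketch below assumes $G_{\mathrm{nv}}=G_{zz}$, sets $\mu_0/4\pi=1$, and omits the common Lorentzian factor; the source positions and weights are hypothetical.

```python
import numpy as np

def green_az(d_xy, z0):
    """Columns G_az(R_i), a in (x, y, z), for R_i = (d_xy_i, z0); mu_0/(4 pi) = 1."""
    R = np.column_stack([d_xy, np.full(len(d_xy), z0)])
    r2 = np.sum(R**2, axis=1, keepdims=True)
    e_z = np.array([0.0, 0.0, 1.0])
    return (3.0 * R * R[:, 2:3] - r2 * e_z) / r2**2.5      # shape (n_sources, 3)

rng = np.random.default_rng(1)
src_xy = rng.uniform(-2.0, 2.0, size=(5, 2))    # hypothetical source positions
rho = rng.uniform(0.1, 2.0, size=5)             # hypothetical cell weights rho_i
nv_xy = np.array([0.3, -0.4])

G = green_az(nv_xy - src_xy, z0=1.0)
g_nv = G[:, 2]                                   # assumption: G_nv = G_zz

f1 = np.sum(rho * g_nv)**2                       # bracket of Eq. (10)
f2 = np.sum(rho[:, None] * G**2)                 # bracket of Eq. (9), second line

amp = rho * g_nv
cross = np.sum(amp)**2 - np.sum(amp**2)          # C(r)
diag_gap = np.sum((rho**2 - rho) * g_nv**2)      # signed diagonal discrepancy
transverse = np.sum(rho[:, None] * G[:, :2]**2)  # a in {x, y} power kept by F2

rhs = cross + diag_gap - transverse              # bracket of Eq. (12)
```

Here `f1 - f2` and `rhs` agree to floating-point precision, and the three named pieces can be inspected separately to see which one dominates in a given scene.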

### A.6 Physical regime of applicability

Both forms are legitimate approximations in different physical regimes. $\mathcal{F}_1$ is appropriate for coherent-emitter superposition (e.g., a small number of phase-locked classical dipoles driven by the same coherent source), where the field amplitudes literally add before being squared by the detector. $\mathcal{F}_2$ is appropriate for incoherent thermal ensembles, where distinct fluctuators have independent random phases and only second moments of the field add [[73](https://arxiv.org/html/2605.13988#bib.bib8), [76](https://arxiv.org/html/2605.13988#bib.bib24), [11](https://arxiv.org/html/2605.13988#bib.bib6)]. NV relaxometry of nuclear, electronic, or magnonic spin baths is in the second regime [[55](https://arxiv.org/html/2605.13988#bib.bib7), [41](https://arxiv.org/html/2605.13988#bib.bib27), [51](https://arxiv.org/html/2605.13988#bib.bib23)], so $\mathcal{F}_2$ is the physically appropriate choice. We retain $\mathcal{F}_1$ as a benchmark only because it is computationally cheaper, has historically appeared in some open-source simulators [[51](https://arxiv.org/html/2605.13988#bib.bib23)], and produces a meaningfully different inverse landscape that exposes the operator-fidelity sensitivity exploited in Section [5](https://arxiv.org/html/2605.13988#S5).
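The distinction between the two regimes can be illustrated with a small Monte Carlo experiment: independent zero-mean fluctuators give a mean-square field equal to the sum of squared kernel values, whereas emitters sharing one amplitude give the square of the summed kernel values. The kernel values, the Gaussian statistics, and the sample count below are illustrative assumptions rather than quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
g = np.array([2.0, 1.5, -0.3, 0.8])             # hypothetical kernel values G(R_i)
n_samples = 200_000

# Incoherent: each fluctuator has its own zero-mean, unit-variance moment.
m_indep = rng.standard_normal((n_samples, g.size))
power_incoherent = np.mean((m_indep @ g)**2)     # estimates sum_i g_i^2

# Coherent: all emitters share one amplitude, so fields add before squaring.
m_shared = rng.standard_normal(n_samples)
power_coherent = np.mean((m_shared * g.sum())**2)  # estimates (sum_i g_i)^2

expected_incoherent = np.sum(g**2)
expected_coherent = np.sum(g)**2
```

The incoherent estimate tracks the $\mathcal{F}_2$-style sum of squares, while the shared-amplitude estimate tracks the $\mathcal{F}_1$-style squared sum, including the cross terms between emitters.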

### A.7 Lorentzian and Larmor mapping

The spectral response in Eq. ([9](https://arxiv.org/html/2605.13988#A1.E9)) is the Lorentzian

$$L(\omega;\omega_{L}) \;=\; \frac{\gamma^{2}}{(\omega-\omega_{L})^{2}+\gamma^{2}}, \tag{13}$$

with linewidth $\gamma$ fixed by the experimental setup (we use $\gamma=0.5$ GHz throughout, matching the simulator). The Larmor map $\omega_{L}:\Omega\to\mathbb{R}_{+}$ encodes the per-pixel quasi-static detuning of the local resonance and is treated as a second unknown field in the inverse problem. In the FFT-based factorization used by all reconstruction methods, $L(\omega;\omega_{L}(\mathbf{r}))$ is evaluated at the readout pixel $\mathbf{r}$ rather than at the source pixel $\mathbf{r}_{\mathrm{src}}$, which is exact when $\omega_{L}$ is constant over the kernel support and a controlled approximation otherwise. The direct simulator $\mathcal{F}_{3}$ used to generate ground-truth data evaluates $L$ at $\mathbf{r}_{\mathrm{src}}$ and is therefore closer to the underlying physics; in the main-text cross-fidelity benchmark we use $\mathcal{F}_{3}$ for data generation and either $\mathcal{F}_{1}$ or $\mathcal{F}_{2}$ for reconstruction (Appendix [A.4](https://arxiv.org/html/2605.13988#A1.SS4)).
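As a quick illustration of Eq. (13), the following minimal sketch evaluates the Lorentzian at resonance and one linewidth away. The function name `lorentzian` and the choice of GHz for all arguments are assumptions made for this example, not part of the paper's code.

```python
def lorentzian(omega, omega_l, gamma=0.5):
    """Unit-peak Lorentzian of Eq. (13); all frequencies are taken in GHz here."""
    return gamma**2 / ((omega - omega_l) ** 2 + gamma**2)


# Value at resonance and one linewidth (gamma) away from it.
peak = lorentzian(2.87, 2.87)
half = lorentzian(2.87 + 0.5, 2.87)
```

With this normalization the response equals 1 at $\omega=\omega_{L}$ and drops to 1/2 at a detuning of $\gamma$, so $\gamma$ acts as the half-width at half-maximum.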

### A.8 Robustness to spatially varying Larmor frequencies

We further validated the robustness of the FFT-factorized solver $\mathcal{F}_{2}$ against spatial variation of the Larmor frequency within one dipolar kernel footprint. In the real-data setting, the Larmor frequency is primarily set by the externally applied magnetic field and is therefore nearly uniform across the reconstruction field of view, with only weak local perturbations. The relevant control parameter is the dimensionless ratio $\chi\sim\Delta\omega_{L}^{(\mathrm{kernel})}/\gamma$, where $\Delta\omega_{L}^{(\mathrm{kernel})}$ denotes the in-kernel variation of $\omega_{L}$ and $\gamma$ is the effective linewidth. In a controlled stress test, the relative forward error of $\mathcal{F}_{2}$ with respect to the direct source-side operator $\mathcal{F}_{3}$ remained small for $\chi$ well below unity (about $2.6\times 10^{-2}$ at $\chi\approx 0.3$), but increased substantially as $\chi$ approached unity (about $1.6\times 10^{-1}$ at $\chi\approx 1.08$). These results confirm that $\mathcal{F}_{2}$ is reliable in the broad-linewidth, slowly varying regime relevant to our experiments.

### A.9 Further physical variation of the forward operator

Figure [6](https://arxiv.org/html/2605.13988#A1.F6) shows the noise spectrum generated by different components of the Green kernel. The shape and scale of the noise spectrum stay consistent with the physical formula.

![Refer to caption](https://arxiv.org/html/2605.13988v1/figures/sec5/PV-Stage4.png)

Figure 6: Verification of the Green kernel in the physical solver.

According to \[[73](https://arxiv.org/html/2605.13988#bib.bib8)\], the experiment measures the longitudinal relaxation time $T_{1}$ of an NV center in a nanodiamond in the presence of a dipolar spin bath (with bath surface density $\sigma_{b}$).

The cited work gives the formulas

$$\frac{1}{T_{1}}=\frac{1}{T_{1}^{\text{bulk}}}+\left(\frac{48\mu_{0}^{2}\gamma_{e}^{4}\hbar^{2}C_{S}}{\pi D^{4}}\right)\left(\frac{\sigma_{b}R(\sigma_{b})}{\omega_{0}^{2}+R(\sigma_{b})^{2}}\right) \tag{14}$$

and

$$B_{\perp}^{2}=\left(\frac{4\mu_{0}\gamma_{e}\hbar}{\pi}\right)^{2}\pi C_{S}\frac{\sigma_{b}}{D^{4}}, \tag{15}$$

where $C_{S}=\frac{1}{2S+1}\sum_{m=-S}^{S}m^{2}=\frac{S(S+1)}{3}$ and $R(\sigma_{b})=\frac{1}{\tau_{c}}$. The cited work states the following scaling relations explicitly:

$$B_{\perp}^{2}\propto\frac{\sigma_{b}}{D^{4}}, \tag{16}$$

$$\frac{1}{T_{1}}\propto\frac{\sigma_{b}}{D^{4}}. \tag{17}$$

To verify the physical correctness of the forward operator, we reproduce this experiment. As shown in Figure [7](https://arxiv.org/html/2605.13988#A1.F7), the noise power is inversely proportional to the fourth power of the nanodiamond diameter $D$.

![Refer to caption](https://arxiv.org/html/2605.13988v1/figures/sec5/PV-Stage3-1.png)

(a) Normalized line cut at row 32 comparing the $\rho$ and $1/T_{1}$ spatial profiles, demonstrating the qualitative agreement between the two distributions.

![Refer to caption](https://arxiv.org/html/2605.13988v1/figures/sec5/PV-Stage2.png)

(b) Forward-operator reproduction of the Tetienne et al. dipole bath model \[[73](https://arxiv.org/html/2605.13988#bib.bib8)\]. Left: Simulated noise power (dots) follows the theoretical $D^{-4}$ scaling, and the Lorentzian spectrum $S(\omega)$ shows stronger noise for smaller $D$ near $\omega_{\text{NV}}=2.87\,\text{GHz}$. Right: The reconstructed $1/T_{1}(x,y)$ map matches the input $\rho(x,y)$ distribution (Pearson $r=99.55\%$), confirming $1/T_{1}(x)\approx(|G|^{2}*\rho)(x)$.

Figure 7: Reproduction, using the forward operator, of the relationship between magnetic noise and $\frac{1}{T_{1}}$ and of the relationship between magnetic noise and diameter.

From Eq. ([16](https://arxiv.org/html/2605.13988#A1.E16)) and Eq. ([17](https://arxiv.org/html/2605.13988#A1.E17)), the magnetic noise power and the relaxation rate $1/T_{1}$ share the same $\sigma_{b}/D^{4}$ scaling, which means

$$\frac{1}{T_{1}(x)}\approx S=\left(|G|^{2}*\rho\right)(x). \tag{18}$$

Thus, for a given dipole bath, the spatial distributions of $\rho$ and $\frac{1}{T_{1}(x)}$ should be similar, a property that the forward operator reproduces.

## Appendix B Data Fidelity, Scale Ambiguity, and Energy-Ratio Correction

This appendix gives the full form of the data-fidelity term $\mathcal{D}$ used in Eq. ([3](https://arxiv.org/html/2605.13988#S3.E3)), records the formal proof that the pre-normalization scale of $\rho$ is unidentifiable from $\mathcal{D}$ alone, and shows how an energy-ratio anchor restores identifiability. The proof is operator-agnostic and applies to any reconstruction method that minimizes Eq. ([3](https://arxiv.org/html/2605.13988#S3.E3)); the specific scale-correction *post-processing step* adopted by NeTMY is described in Section [4](https://arxiv.org/html/2605.13988#S4) and Appendix [D.5](https://arxiv.org/html/2605.13988#A4.SS5).

### B.1 Pixelwise log-MSE form of $\mathcal{D}$

With $N(\mathbf{r};S)=\sum_{\omega\in W}S(\omega,\mathbf{r})$, $\widehat{N}(\mathbf{r};S)=N(\mathbf{r};S)/\max_{\mathbf{r}'}N(\mathbf{r}';S)$, and $\ell(\mathbf{r};S)=\log_{10}(\widehat{N}(\mathbf{r};S)+10^{-10})$, the data-fidelity discrepancy in Eq. ([3](https://arxiv.org/html/2605.13988#S3.E3)) is

$$\mathcal{D}\bigl(\mathcal{F}(\rho,\omega_{L}),S_{\mathrm{obs}}\bigr) \;=\; \frac{1}{|\Omega|}\sum_{\mathbf{r}\in\Omega}\Bigl(\ell\bigl(\mathbf{r};\mathcal{F}(\rho,\omega_{L})\bigr) \;-\; \ell\bigl(\mathbf{r};S_{\mathrm{obs}}\bigr)\Bigr)^{2}, \tag{19}$$

where $|\Omega|$ denotes the number of grid pixels in $\Omega$. The two preprocessing steps (frequency summation followed by spatial max-normalization) are standard in NV relaxometry to suppress per-sample amplitude variation and emphasize spatial-distribution mismatch \[[41](https://arxiv.org/html/2605.13988#bib.bib27), [51](https://arxiv.org/html/2605.13988#bib.bib23), [11](https://arxiv.org/html/2605.13988#bib.bib6)\]; the small additive constant $10^{-10}$ avoids numerical singularity on background pixels.
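The fidelity of Eq. (19) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: spectra are stored as nested lists `S[w][pixel]`, and the helper names `noise_map` and `log_mse_fidelity` are hypothetical, not taken from the paper's implementation.

```python
import math


def noise_map(spectrum):
    """Sum S[w][pixel] over the frequency window W, then max-normalize over pixels."""
    n_pixels = len(spectrum[0])
    summed = [sum(row[p] for row in spectrum) for p in range(n_pixels)]
    peak = max(summed)
    return [v / peak for v in summed]


def log_mse_fidelity(pred, obs, eps=1e-10):
    """Pixelwise log10-MSE between max-normalized noise maps, as in Eq. (19)."""
    n_pred, n_obs = noise_map(pred), noise_map(obs)
    diffs = [math.log10(a + eps) - math.log10(b + eps) for a, b in zip(n_pred, n_obs)]
    return sum(d * d for d in diffs) / len(diffs)


obs = [[1.0, 4.0, 2.0], [0.5, 2.0, 1.0]]          # toy spectrum: 2 frequencies, 3 pixels
scaled = [[3.0 * v for v in row] for row in obs]  # same shape, three times the amplitude
flat = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]         # different spatial shape
```

Because of the max-normalization, `log_mse_fidelity(scaled, obs)` matches `log_mse_fidelity(obs, obs)`, while the spatially different `flat` spectrum gives a positive value: the term measures shape mismatch and is blind to overall amplitude.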

### B.2 Setup: max-normalization induces a scale-flat ray

Let $S_{\mathrm{obs}}$ be the noise spectrum produced by an unknown ground-truth pair $(\rho_{\star},\omega_{L,\star})$ under the linear operator $\mathcal{F}_{2}$, and let $\widehat{N}(\,\cdot\,;S)$ denote its frequency-summed, max-normalized noise map (Section [3.2](https://arxiv.org/html/2605.13988#S3.SS2)). Linearity gives $\mathcal{F}_{2}(c\rho,\omega_{L})=c\,\mathcal{F}_{2}(\rho,\omega_{L})$ for any $c>0$, so

$$\widehat{N}(\,\cdot\,;\mathcal{F}_{2}(c\rho,\omega_{L})) \;=\; \widehat{N}(\,\cdot\,;\mathcal{F}_{2}(\rho,\omega_{L}))\qquad\forall\,c>0. \tag{20}$$

The fidelity term $\mathcal{D}$ in Eq. ([19](https://arxiv.org/html/2605.13988#A2.E19)) is therefore exactly invariant along the ray $\{c\rho:c>0\}$, and the data alone cannot identify the absolute density scale. This is a finite-dimensional instance of the bilinear-identifiability symptom analyzed in compressed-sensing and blind-deconvolution literature \[[2](https://arxiv.org/html/2605.13988#bib.bib37), [44](https://arxiv.org/html/2605.13988#bib.bib43), [42](https://arxiv.org/html/2605.13988#bib.bib42)\].

### B.3 Energy-ratio correction is exact under $\mathcal{F}_{2}$ in the noiseless limit

###### Proposition 1 (Scale correction under $\mathcal{F}_{2}$).

Let $\mathcal{F}$ be linear in $\rho$ and assume the noiseless observation $S_{\mathrm{obs}}=\mathcal{F}(\rho_{\star},\omega_{L,\star})$. Let $\widehat{\rho}$ be a candidate reconstruction with the same shape as $\rho_{\star}$ up to scale, $\widehat{\rho}=c^{-1}\rho_{\star}$ for some unknown $c>0$. Define

$$E_{\mathrm{obs}} \;=\; \sum_{\omega,\mathbf{r}}S_{\mathrm{obs}}(\omega,\mathbf{r}),\qquad E_{\mathrm{pred}} \;=\; \sum_{\omega,\mathbf{r}}\mathcal{F}(\widehat{\rho},\omega_{L,\star})(\omega,\mathbf{r}),$$

and $\alpha=E_{\mathrm{obs}}/E_{\mathrm{pred}}$. Then $\alpha\widehat{\rho}=\rho_{\star}$ exactly, and $\alpha$ depends only on the observation and the reconstruction.

###### Proof.

Linearity and $\widehat{\rho}=c^{-1}\rho_{\star}$ give $\mathcal{F}(\widehat{\rho},\omega_{L,\star})=c^{-1}\mathcal{F}(\rho_{\star},\omega_{L,\star})$ pointwise. Hence $E_{\mathrm{pred}}=c^{-1}E_{\mathrm{obs}}$, so $\alpha=E_{\mathrm{obs}}/E_{\mathrm{pred}}=c$ and $\alpha\widehat{\rho}=\rho_{\star}$. ∎

With additive sensor noise $S_{\mathrm{obs}}=\mathcal{F}(\rho_{\star},\omega_{L,\star})+\varepsilon$, write $E_{\varepsilon}=\sum_{\omega,\mathbf{r}}\varepsilon(\omega,\mathbf{r})$ and $E_{\star}=\sum_{\omega,\mathbf{r}}\mathcal{F}(\rho_{\star},\omega_{L,\star})$. Then $\alpha=c\,(1+E_{\varepsilon}/E_{\star})$, so no deterministic exactness is claimed; if $\mathbb{E}\,\varepsilon=0$, the correction is unbiased in expectation and its relative fluctuation is $O(\sigma_{\varepsilon}\sqrt{|W|\,|\Omega|}/E_{\star})$. A scale-correction regularizer that penalizes deviation of $\alpha$ from $1$ (or, equivalently, anchors $\rho$ to the observed energy after every step) restores identifiability without any oracle quantity. Two further caveats are worth recording independently of method. (i) The proof requires linearity of $\mathcal{F}$ in $\rho$, which holds for $\mathcal{F}_{2}$ but not $\mathcal{F}_{1}$ (handled below). (ii) The shape assumption $\widehat{\rho}=c^{-1}\rho_{\star}$ is exact only when the reconstruction recovers the true support; in practice we observe graceful degradation and a residual scale bias proportional to the shape error.
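Proposition 1 can be checked numerically on a toy problem. The sketch below assumes a hypothetical linear forward map (`forward_linear`, a zero-padded 1D convolution with a positive kernel) standing in for $\mathcal{F}_{2}$; it is not the paper's operator, and the noiseless setting of the proposition is assumed.

```python
def forward_linear(rho, kernel=(0.25, 0.5, 0.25)):
    """Hypothetical linear forward map: zero-padded 1D convolution with a positive kernel."""
    half = len(kernel) // 2
    out = []
    for i in range(len(rho)):
        acc = 0.0
        for j, w in enumerate(kernel):
            src = i + j - half
            if 0 <= src < len(rho):
                acc += w * rho[src]
        out.append(acc)
    return out


def energy_ratio(s_obs, s_pred):
    """alpha = E_obs / E_pred, where E sums every entry of the measurement."""
    return sum(s_obs) / sum(s_pred)


rho_true = [0.0, 2.0, 0.0, 0.0, 5.0, 0.0]
c = 4.0
rho_hat = [v / c for v in rho_true]            # correct shape, wrong scale
alpha = energy_ratio(forward_linear(rho_true), forward_linear(rho_hat))
rho_corrected = [alpha * v for v in rho_hat]   # recovers rho_true in the noiseless case
```

In this noiseless example `alpha` equals the unknown factor `c`, so `rho_corrected` coincides with `rho_true`; with noise added to the observation, the same ratio would only be correct in expectation, as discussed above.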

### B.4 Square-root correction under $\mathcal{F}_{1}$

For the quadratic operator $\mathcal{F}_{1}(c\rho,\omega_{L})=c^{2}\mathcal{F}_{1}(\rho,\omega_{L})$, the same noiseless calculation with $\widehat{\rho}=c^{-1}\rho_{\star}$ gives $E_{\mathrm{pred}}=c^{-2}E_{\mathrm{obs}}$, so the physically matched closed forms are

$$c_{1} \;=\; \sqrt{\,E_{\mathrm{obs}}/E_{\mathrm{pred}}\,},\qquad c_{2} \;=\; E_{\mathrm{obs}}/E_{\mathrm{pred}}, \tag{21}$$

i.e., the $\mathcal{F}_{1}$-form correction is the square root of the $\mathcal{F}_{2}$-form correction in the noiseless limit. The two forms agree only at the trivial fixed point $E_{\mathrm{obs}}=E_{\mathrm{pred}}$. We use the operator-matched convention in the main-body tables: $c_{1}$ for $\mathcal{F}_{1}$ and $c_{2}$ for $\mathcal{F}_{2}$; numerically, the asymmetry is small for the localization-oriented metrics (GMSD, Hungarian F1, Sliced-WD) reported in Section [5](https://arxiv.org/html/2605.13988#S5) because none of them depend on the absolute density scale, but it does affect density MSE.
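The difference between the two conventions in Eq. (21) shows up directly on a toy quadratic operator. In the sketch below, `forward_quadratic` is a hypothetical coherent model (amplitudes add, then are squared) used only to mimic the $c^{2}$ scaling of $\mathcal{F}_{1}$; the coupling matrix `gains` is invented for the example.

```python
import math


def forward_quadratic(rho, gains):
    """Hypothetical coherent model: amplitudes add across sources, then are squared."""
    return [sum(g * r for g, r in zip(row, rho)) ** 2 for row in gains]


def matched_scale(e_obs, e_pred, operator):
    """Eq. (21): square-root form c1 for the quadratic F1, plain ratio c2 for the linear F2."""
    ratio = e_obs / e_pred
    return math.sqrt(ratio) if operator == "F1" else ratio


gains = [[1.0, 0.5], [0.2, 1.0], [0.3, 0.3]]   # toy readout-by-source couplings
rho_true = [2.0, 6.0]
rho_hat = [v / 3.0 for v in rho_true]          # true shape, scale off by c = 3
e_obs = sum(forward_quadratic(rho_true, gains))
e_pred = sum(forward_quadratic(rho_hat, gains))
c1 = matched_scale(e_obs, e_pred, "F1")        # operator-matched correction
c2 = matched_scale(e_obs, e_pred, "F2")        # mismatched: overshoots to c**2
```

Here `c1` recovers the true factor 3, while applying the linear-form ratio `c2` to a quadratic operator would overcorrect by returning 9, which is why the convention has to match the operator.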

## Appendix C Ill-posedness Diagnostics

This appendix gives the formal counterparts of the four ill-posedness statements (P1)–(P4) in Section [3.3](https://arxiv.org/html/2605.13988#S3.SS3), and quantifies the resulting forward-Jacobian and gradient structure that motivates the method of Section [4](https://arxiv.org/html/2605.13988#S4).

### C.1 (P1) Exponential frequency suppression

###### Lemma 1 (Frequency decay of the tensor channel power).

For $z_{0}>0$ and $a\in\{x,y,z\}$, the 2D Fourier transform of the source-plane Green-tensor channel evaluated at standoff $z_{0}$ satisfies $|\widehat{G_{az}}(\mathbf{k};z_{0})|=O(k\,e^{-kz_{0}})$ for $k=\lVert\mathbf{k}\rVert$. Consequently, for fixed $z_{0}$ there exist constants $C_{a,z_{0}}$ and $m<\infty$ such that

$$\bigl|\widehat{\,\left|G_{az}\right|^{2}\,}(\mathbf{k};z_{0})\bigr| \;\leq\; C_{a,z_{0}}\,(1+k)^{m}\,e^{-kz_{0}}, \tag{22}$$

so the linear forward map $\rho\mapsto\sum_{a}|G_{az}|^{2}\ast\rho$ suppresses spatial mode $\mathbf{k}$ with exponential envelope $e^{-kz_{0}}$ up to algebraic factors in $k$.

###### Proof.

Let $\Phi(\mathbf{r},z)=(4\pi)^{-1}(\lVert\mathbf{r}\rVert^{2}+z^{2})^{-1/2}$ be the half-space magnetostatic scalar potential. Its 2D Fourier transform in $\mathbf{r}$ is, up to convention constants, $\widehat{\Phi}(\mathbf{k},z)=k^{-1}e^{-kz}$. Each component $G_{ia}$ is a second derivative of $\Phi$ in real space; Fourier differentiation contributes factors $ik_{i}$ or $-k$, hence $|\widehat{G_{az}}(\mathbf{k};z_{0})|\leq C_{a}\,k\,e^{-kz_{0}}$ for a constant $C_{a}$. Since $|G_{az}|^{2}$ is the pointwise square of a real-valued function, its 2D Fourier transform is the autoconvolution $\widehat{|G_{az}|^{2}}=(2\pi)^{-2}\,\widehat{G_{az}}\ast\widehat{\overline{G_{az}}}$. Therefore

$$\bigl|\widehat{|G_{az}|^{2}}(\mathbf{k})\bigr| \;\leq\; C\int_{\mathbb{R}^{2}}\lVert\mathbf{q}\rVert\,\lVert\mathbf{k}-\mathbf{q}\rVert\,e^{-z_{0}(\lVert\mathbf{q}\rVert+\lVert\mathbf{k}-\mathbf{q}\rVert)}\,d\mathbf{q} \;\leq\; C_{z_{0}}\,(1+k)^{m}\,e^{-kz_{0}},$$

where the second inequality uses the triangle bound $\lVert\mathbf{q}\rVert+\lVert\mathbf{k}-\mathbf{q}\rVert\geq k$ and bounds the residual integral by an algebraic factor in $k$ for fixed $z_{0}$. ∎

The implication for optimization is that the eigenvalues of the linearized forward map drop off geometrically across spatial scales, making the inverse map exponentially ill-conditioned at high $k$. We exploit this prediction in two concrete ways: (a) we use a coordinate neural field with annealed positional encoding so that low-$k$ modes are fit before high-$k$ modes (Section [4](https://arxiv.org/html/2605.13988#S4)), and (b) we predict, and measure, a degradation when the curriculum order is reversed.

### C.2 (P2) Finite-window center bias

On a bounded sensing window $\Omega$, the convolutional footprint of a single source pixel at $\mathbf{r}_{0}$ under $\mathcal{F}_{2}$ is the translated channel-summed power kernel $\sum_{a}\left|G_{az}(\,\cdot\,-\mathbf{r}_{0})\right|^{2}$, whose energy on $\Omega$ is maximized when $\mathbf{r}_{0}$ is at the window center and decreases monotonically as $\mathbf{r}_{0}$ approaches the boundary, because the translated footprint is increasingly truncated by $\partial\Omega$. The forward Jacobian column for source pixel $\mathbf{r}_{0}$ has squared norm

$$\left\lVert\frac{\partial\mathcal{F}_{2}}{\partial\rho(\mathbf{r}_{0})}\right\rVert^{2} \;=\; \int_{W}\!\int_{\Omega}\biggl(\sum_{a\in\{x,y,z\}}\left|G_{az}(\mathbf{r}-\mathbf{r}_{0})\right|^{2}\biggr)^{2}\,L(\omega;\omega_{L}(\mathbf{r}))^{2}\,d\mathbf{r}\,d\omega, \tag{23}$$

which expands into auto-channel terms $\sum_{a}\left|G_{az}\right|^{4}$ plus positive cross-channel terms $\sum_{a\neq a'}\left|G_{az}\right|^{2}\left|G_{a'z}\right|^{2}$. Since the integrand is a nonnegative localized footprint, truncation by $\partial\Omega$ preserves the center-vs-edge asymmetry. Even from a uniform initialization $\rho_{0}\equiv c$, the iter-$0$ gradient of the data-fidelity term therefore points more strongly toward central pixels than toward boundary pixels, before any structure has formed. We measure this directly in Section [5](https://arxiv.org/html/2605.13988#S5): the iter-$0$ physics-loss gradient under $\mathcal{F}_{2}$ has a center-to-outer ratio of $18.29\times$ on a representative sample, with the peak at the grid center, even though the ground-truth density is off-center. (P2) is therefore a bias of the operator and the window, not a bug of any one solver.
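The truncation argument behind Eq. (23) can be illustrated with a stand-in kernel. The sketch below assumes an isotropic footprint $(d^{2}+z_{0}^{2})^{-3}$ in place of the channel-summed Green-tensor power and drops the Lorentzian factor; both simplifications, and the name `footprint_energy`, are assumptions made for this illustration only, so it does not reproduce the measured $18.29\times$ ratio.

```python
def footprint_energy(r0, size=33, z0=3.0):
    """Sum of the squared footprint over a size x size window for a source at pixel r0.

    The kernel (d^2 + z0^2)^-3 is an illustrative stand-in for the
    channel-summed power kernel, not the paper's Green tensor.
    """
    x0, y0 = r0
    total = 0.0
    for x in range(size):
        for y in range(size):
            d2 = (x - x0) ** 2 + (y - y0) ** 2
            total += ((d2 + z0**2) ** -3) ** 2
    return total


center = footprint_energy((16, 16))   # source at the window center
edge = footprint_energy((16, 0))      # source on one boundary
corner = footprint_energy((0, 0))     # source at a corner
```

The window keeps almost the whole footprint for a central source, roughly half of it for an edge source, and roughly a quarter for a corner source, so the column norms satisfy `center > edge > corner` before any data enter the problem.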

### C.3 (P3) Max-normalization peak coupling

The data-fidelity term in Eq. ([19](https://arxiv.org/html/2605.13988#A2.E19)) compares max-normalized noise maps $\widehat{N}_{i}=N_{i}/M$, $M=\max_{j}N_{j}$. Differentiating $\widehat{N}$ with respect to the unnormalized $N$:

$$\frac{\partial\widehat{N}_{i}}{\partial N_{j}} \;=\; \frac{\delta_{ij}}{M} \;-\; \frac{N_{i}}{M^{2}}\,\frac{\partial M}{\partial N_{j}},\qquad\frac{\partial M}{\partial N_{j}} \;=\; \delta_{jk}\quad\text{where}\quad k=\arg\max_{j}N_{j}. \tag{24}$$

For a non-peak pixel ($j\neq k$) this reduces to $\delta_{ij}/M$ (purely local), but for the peak pixel ($j=k$) Eq. ([24](https://arxiv.org/html/2605.13988#A3.E24)) contains an extra term $-N_{i}/M^{2}$ that couples the gradient at $N_{k}$ to *every* pixel $i$ where $N_{i}$ is appreciable, with weight proportional to $N_{i}$. The chain-rule combination of Eq. ([24](https://arxiv.org/html/2605.13988#A3.E24)) with the (P2) center bias amplifies the peak whenever the current peak is at the window center, which is exactly the iter-$0$ situation produced by (P2) on a uniform initialization. The qualitative consequence is that free-density methods (Tikhonov, ADMM) following $\nabla_{\rho}\mathcal{D}$ literally execute this center-amplifying step at iter $0$, and once a centered peak forms it self-reinforces through Eq. ([24](https://arxiv.org/html/2605.13988#A3.E24)). We document the resulting energy barrier between the centered-collapse minimum and the ground-truth density in Section [5](https://arxiv.org/html/2605.13988#S5).
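The structure of Eq. (24) is easy to verify numerically. The following sketch builds the closed-form Jacobian of $\widehat{N}=N/\max(N)$ and provides a forward-difference check; the helper names and the three-pixel noise map are hypothetical, and a unique maximum is assumed so that the arg max is well defined.

```python
def maxnorm_jacobian(n):
    """Jacobian d N_hat_i / d N_j of N_hat = N / max(N), following Eq. (24)."""
    m = max(n)
    k = n.index(m)                       # peak pixel (assumed unique)
    size = len(n)
    jac = [[0.0] * size for _ in range(size)]
    for i in range(size):
        jac[i][i] += 1.0 / m             # local term delta_ij / M
        jac[i][k] -= n[i] / m**2         # peak-coupling term, nonzero only in column k
    return jac


def finite_difference_column(n, j, h=1e-6):
    """Forward-difference estimate of column j, used to check the closed form."""
    bumped = list(n)
    bumped[j] += h
    base = [v / max(n) for v in n]
    pert = [v / max(bumped) for v in bumped]
    return [(p - b) / h for p, b in zip(pert, base)]


n = [1.0, 4.0, 2.0]                      # toy noise map with its peak at index 1
jac = maxnorm_jacobian(n)
```

Column 1 (the peak) is the only dense column: perturbing the peak rescales every other pixel, whereas perturbing a non-peak pixel changes only its own normalized value.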

### C.4 (P4) Source merging and joint $(\rho,\omega_{L})$ ambiguity

The effective point-spread width $w_{\mathrm{psf}}$ of $\mathcal{F}_{2}$ is set by the standoff $z_{0}$ through the kernel decay of Lemma [1](https://arxiv.org/html/2605.13988#Thmlemma1): each tensor channel $|G_{az}|^{2}$ has spatial extent $\sim z_{0}$ on the source side, so two sources separated by $\Delta r<w_{\mathrm{psf}}\sim z_{0}$ produce nearly identical forward responses and cannot be disambiguated from the data alone. A simple lower-bound estimate \[[23](https://arxiv.org/html/2605.13988#bib.bib10)\] useful for our setup is

$$w_{\mathrm{psf}} \;\approx\; 2z_{0}\sqrt{\,\log 2\,/\,(kz_{0}+3)\,}\quad\text{at characteristic mode}\;k, \tag{25}$$

which we use as a working threshold to label "ill-conditioned" data regimes in Section [5](https://arxiv.org/html/2605.13988#S5). Beyond pairwise merging, in many-source scenes a continuous family of densities can fit the same forward measurement to within noise, producing axis-aligned cross-shaped artifacts when the optimizer settles on a low-cost, anisotropic explanation \[[83](https://arxiv.org/html/2605.13988#bib.bib11)\].

A separate non-uniqueness arises from the joint role of $\rho$ and $\omega_{L}$ in Eq. ([2](https://arxiv.org/html/2605.13988#S3.E2)): at any pixel $\mathbf{r}$ where the source contribution $\sum_{a}|G_{az}|^{2}\ast\rho(\mathbf{r})$ is small, the Lorentzian $L(\omega;\omega_{L}(\mathbf{r}))$ multiplies a near-zero coefficient and the fidelity term places no constraint on $\omega_{L}(\mathbf{r})$. The Larmor map is therefore identifiable only on the support of $\rho$. NeTMY enforces this scope at the architectural level (Appendix [D.1](https://arxiv.org/html/2605.13988#A4.SS1), Eq. ([26](https://arxiv.org/html/2605.13988#A4.E26))): the predicted Larmor field is multiplicatively masked by an indicator on the predicted support of $\rho_{\theta}$, so gradients flow into $\omega_{L}(\mathbf{r})$ only where $\rho_{\theta}(\mathbf{r})$ is appreciable. Without such gating, any local minimum acquires arbitrary Larmor values on the background, contaminating the per-pixel loss landscape and slowing convergence on the support.

### C.5 Summary: which pathology drives which experiment

Table [4](https://arxiv.org/html/2605.13988#A3.T4) maps each ill-posedness pathology to the empirical signature it predicts and to the experimental section that reports the measurement.

| Pathology | Predicted signature | Reported in | Mitigated by |
| --- | --- | --- | --- |
| (P1) Frequency suppression | Coarse-to-fine $>$ fine-to-coarse curriculum | Sec. [5](https://arxiv.org/html/2605.13988#S5), ablation | Sec. [4](https://arxiv.org/html/2605.13988#S4), annealed PE |
| (P2) Window center bias | Iter-$0$ centered gradient | Sec. [5](https://arxiv.org/html/2605.13988#S5), mechanism | Sec. [4](https://arxiv.org/html/2605.13988#S4), parameterization |
| (P3) Max-norm peak coupling | Energy barrier at center collapse | Sec. [5](https://arxiv.org/html/2605.13988#S5), mechanism | Sec. [4](https://arxiv.org/html/2605.13988#S4), gating + PE |
| (P4) PSF merging, Larmor ambiguity | Cross artifacts, background Larmor noise | Sec. [5](https://arxiv.org/html/2605.13988#S5), failure | Sec. [4](https://arxiv.org/html/2605.13988#S4), masked Larmor head |

Table 4: Mapping from ill-posedness pathology to empirical signature, the experimental section that reports it, and the design choice in Section [4](https://arxiv.org/html/2605.13988#S4) that mitigates it.

## Appendix D Method Details

This appendix gives the full functional and algorithmic details deferred from Section [4](https://arxiv.org/html/2605.13988#S4): the architecture of the coordinate MLP and density/Larmor heads, the frequency-annealing schedule, the per-stage training schedule, the explicit forms of all loss terms, the post-training scale correction, and the formal derivation of the Jacobian-induced filtering relation $\Delta\rho\approx-\eta\,G_{\theta}\nabla_{\rho}\mathcal{L}$.

### D.1 Architecture and Output Heads

The coordinate field $f_{\theta}$ in Eq. ([4](https://arxiv.org/html/2605.13988#S4.E4)) is realized by the network $f_{\theta}(\mathbf{x})=\mathrm{Heads}\bigl(\mathrm{MLP}_{\theta}(\gamma_{\beta}(\mathbf{x}))\bigr)$, where $\mathbf{x}=(x,y)\in\Omega$, $\gamma_{\beta}$ is the annealed Fourier encoder (Appendix [D.2](https://arxiv.org/html/2605.13988#A4.SS2)), $\mathrm{MLP}_{\theta}$ is a 6-layer fully connected backbone, and $\mathrm{Heads}$ is the two-output transform of Eq. ([5](https://arxiv.org/html/2605.13988#S4.E5)) and Eq. ([26](https://arxiv.org/html/2605.13988#A4.E26)). Coordinates are normalized to the square $[-1,1]^{2}$ before encoding so that the Fourier basis frequencies are device-agnostic. Table [5](https://arxiv.org/html/2605.13988#A4.T5) summarizes the architectural choices used in every reported run.

| Component | Setting |
| --- | --- |
| Coordinate input | $(x,y)\in[-1,1]^{2}$, normalized by half grid extent |
| Positional encoding | annealed Fourier features, $K=12$ octaves, base 2 |
| Encoded input dim | $2+2\cdot 2\cdot K=50$ |
| Backbone | 6 fully connected layers, hidden $320$, $\tanh$ activation |
| Skip connection | encoded input re-injected at layer 3 |
| Output channels | $3$: density logit $h_{\rho}$, density gate $g$, Larmor logit $h_{\omega}$ |
| Density head | $\rho_{\theta}=\mathrm{softplus}(h_{\rho})\,\sigma(g)$ |
| Larmor head | $\omega_{L,\theta}=\omega_{L,\min}+(\omega_{L,\max}-\omega_{L,\min})\,\sigma(h_{\omega})$, with mask in Eq. ([26](https://arxiv.org/html/2605.13988#A4.E26)) |
| Parameter count | $\approx 4.5\times 10^{5}$ |
| Optimizer | AdamW, weight decay $10^{-4}$, gradient clip $\lVert\nabla\theta\rVert\leq 1$ |
| Learning-rate schedule | cosine annealing per stage to $0.01\cdot\eta_{\mathrm{base}}$ |

Table 5: NeTMY architectural and optimization settings. All reported main-body and ablation runs use these defaults except where the ablation explicitly toggles a component.

The density-head transform of Eq. ([5](https://arxiv.org/html/2605.13988#S4.E5)) is a gated softplus: the softplus enforces nonnegativity, and the sigmoid gate $\sigma(g)\in(0,1)$ acts as a pixelwise on/off switch that the optimizer can drive close to zero without saturating the softplus. Empirically, removing the gate (i.e., $\rho_{\theta}=\mathrm{softplus}(h_{\rho})$) makes background suppression substantially harder and weakens sparse-localization metrics (Section [5](https://arxiv.org/html/2605.13988#S5), ablation).

The Larmor head is sigmoid-mapped into the device-determined band and then multiplied by a hard predicted-support mask,

$$\omega^{\mathrm{mask}}_{L,\theta}(\mathbf{r}) \;=\; m_{\theta}(\mathbf{r})\,\omega^{\mathrm{band}}_{L,\theta}(\mathbf{r}),\qquad m_{\theta}(\mathbf{r}) \;=\; \mathbf{1}\bigl\{\rho_{\theta}(\mathbf{r})>\tau\,\max_{\mathbf{r}'}\rho_{\theta}(\mathbf{r}')\bigr\},\qquad\tau=0.3, \tag{26}$$

with $m_{\theta}$ treated as a stop-gradient constant during back-prop. The nondifferentiable indicator is therefore not differentiated, and the Larmor logit receives gradients only where $m_{\theta}=1$. Pixels with $m_{\theta}=0$ are assigned the fixed value $0$, which lies outside $[\omega_{L,\min},\omega_{L,\max}]$ but is not interpreted as a physical Larmor frequency: since $\rho_{\theta}\simeq 0$ there, support-weighted $\ell_{2}$ losses and metrics on $\omega_{L}$ ignore those pixels. This is the operational counterpart of the (P4) identifiability fact (Appendix [C.4](https://arxiv.org/html/2605.13988#A3.SS4)) that the data constrains $\omega_{L}$ only on the support of $\rho$: without the mask, the gradient through $\omega_{L}$ on the background is a near-zero coefficient that nonetheless steers the optimizer toward arbitrary Larmor values, contaminating the per-pixel loss landscape.
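The two output heads and the mask of Eq. (26) can be sketched on plain Python lists. This is an illustrative forward pass only: the function name `output_heads`, the example logits, and the band limits `omega_min`/`omega_max` are hypothetical, and the stop-gradient on the mask is indicated by a comment because no autodiff framework is used here.

```python
import math


def softplus(x):
    return math.log1p(math.exp(x))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def output_heads(h_rho, gate, h_omega, omega_min, omega_max, tau=0.3):
    """Map per-pixel logits to (rho, masked omega_L, mask), following Eq. (26)."""
    rho = [softplus(h) * sigmoid(g) for h, g in zip(h_rho, gate)]      # gated softplus
    band = [omega_min + (omega_max - omega_min) * sigmoid(h) for h in h_omega]
    threshold = tau * max(rho)
    # Hard support mask; in an autodiff framework it would be detached (stop-gradient).
    mask = [1.0 if r > threshold else 0.0 for r in rho]
    omega = [m * w for m, w in zip(mask, band)]                        # 0 off-support
    return rho, omega, mask


rho, omega, mask = output_heads(
    h_rho=[3.0, 3.0, -4.0],
    gate=[4.0, -6.0, 0.0],
    h_omega=[0.0, 1.0, -1.0],
    omega_min=2.6, omega_max=3.1,   # hypothetical band limits in GHz
)
```

In this toy call the second pixel has a large density logit but a closed gate, so its density falls below the $\tau=0.3$ threshold and its Larmor value is masked to 0, which mirrors how the gate lets the optimizer switch a pixel off without pushing the softplus into saturation.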

### D.2 Annealed Fourier Features

Following the Nerfies frequency-annealing schedule \[[58](https://arxiv.org/html/2605.13988#bib.bib68), [72](https://arxiv.org/html/2605.13988#bib.bib20)\], the encoded input is

$$\begin{aligned}\gamma_{\beta}(\mathbf{x}) &\;=\; \Bigl(\,\mathbf{x},\;\bigl\{\,w_{k}(\beta)\,\sin(2^{k}\pi\,\mathbf{x}),\;w_{k}(\beta)\,\cos(2^{k}\pi\,\mathbf{x})\,\bigr\}_{k=0}^{K-1}\Bigr),\\ w_{k}(\beta) &\;=\; \frac{1-\cos\bigl(\pi\,\mathrm{clip}(\beta-k,0,1)\bigr)}{2},\end{aligned} \tag{27}$$

where $K=12$ and $\beta\in[0,K]$ is a per-stage progress scalar. We schedule $\beta$ linearly within each stage, $\beta(t)=K\cdot t/T$, with $t$ the within-stage epoch and $T$ the stage length. At $\beta=0$ only the linear coordinate input contributes; at $\beta=K$ all $K$ octaves are fully active and Eq. ([27](https://arxiv.org/html/2605.13988#A4.E27)) reduces to the standard Fourier feature encoding of \[[72](https://arxiv.org/html/2605.13988#bib.bib20)\]. The cosine envelope $w_{k}(\beta)$ activates each band smoothly, avoiding the gradient shock that abrupt high-frequency activation would induce.

The annealing schedule is the operational counterpart of (P1) (Lemma [1](https://arxiv.org/html/2605.13988#Thmlemma1)): low spatial frequencies of $\rho$ carry $\Theta(1)$ forward signal, while high frequencies are suppressed with envelope $e^{-kz_{0}}$ up to algebraic factors. Fitting low frequencies first delivers a coarse minimum that is closer to the true support before the high-frequency capacity of the MLP is unlocked. We measure this as a coarse-to-fine vs. fine-to-coarse curriculum ablation in Section [5](https://arxiv.org/html/2605.13988#S5).
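A minimal sketch of the annealed encoding in Eq. (27) follows. The function names are hypothetical, and the ordering of the sin/cos features within the output vector is an arbitrary choice made here; only the set of features and the weights $w_{k}(\beta)$ follow the equation.

```python
import math


def band_weight(k, beta):
    """Cosine-eased weight w_k(beta) for octave k, as in Eq. (27)."""
    t = min(max(beta - k, 0.0), 1.0)          # clip(beta - k, 0, 1)
    return (1.0 - math.cos(math.pi * t)) / 2.0


def annealed_encoding(x, y, beta, num_octaves=12):
    """Encode a point of [-1, 1]^2 as raw coordinates plus weighted sin/cos bands."""
    features = [x, y]
    for k in range(num_octaves):
        w = band_weight(k, beta)
        for coord in (x, y):
            angle = (2.0**k) * math.pi * coord
            features.append(w * math.sin(angle))
            features.append(w * math.cos(angle))
    return features


def beta_schedule(epoch, stage_length, num_octaves=12):
    """Linear within-stage ramp beta(t) = K * t / T."""
    return num_octaves * epoch / stage_length
```

For $K=12$ the encoding has $2+2\cdot 2\cdot 12=50$ entries. At $\beta=0$ all band weights vanish and only the raw coordinates pass through; halfway through a stage ($\beta=6$) octaves 0 to 5 are fully active and octaves 6 to 11 are still switched off.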

### D.3 Multiscale Training Schedule

Optimization runs in two stages on a single MLP (parameters carried across stages); only the coordinate grid resolution and the resolution at which $\mathcal{F}_{2}$ is evaluated change between stages. Table [6](https://arxiv.org/html/2605.13988#A4.T6) summarizes the schedule used in every reported NeTMY run.

| Stage | Resolution | Epochs (default) | Learning rate | Notes |
| --- | --- | --- | --- | --- |
| 1 | $32\times 32$ | $3000$ | $\eta_{\mathrm{base}}=10^{-3}$ | coarse low-frequency support recovery |
| 2 | $64\times 64$ | $7000$ | $0.5\cdot\eta_{\mathrm{base}}$ | fine high-frequency refinement |

Table 6: NeTMY two-stage multiscale schedule. Total $10{,}000$ epochs per measurement. Within each stage, learning rate decays under cosine annealing to $0.01\cdot\eta_{\mathrm{base}}$. Frequency annealing $\beta$ is reset to $0$ at the start of each stage so high-frequency PE bands are re-introduced from coarse to fine within each resolution.

The same MLP weights are carried across stages with no re-initialization, so the fitted parameters at the end of Stage 1 act as a structured initialization for Stage 2. Coordinates are recreated at the new resolution, and the differentiable forward operator $\mathcal{F}_{2}$ is evaluated at that resolution using a precomputed FFT kernel cache. Per-instance wall-clock on a single A6000 GPU ranges from $273$ s (median) to $\sim\!780$ s (untrimmed mean) over the main $512$-sample benchmark; runtime is reported in Appendix [E.9](https://arxiv.org/html/2605.13988#A5.SS9).
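The two-stage schedule of Table 6 can be written down as a small lookup plus one function. This is a sketch under stated assumptions: the cosine curve below is the common form that starts at the stage learning rate and ends at the floor $0.01\cdot\eta_{\mathrm{base}}$ (the text does not spell out the exact formula), and the names `STAGES`, `LR_FLOOR`, and `schedule` are hypothetical.

```python
import math

ETA_BASE = 1e-3          # base learning rate from Table 6
K = 12                   # number of PE octaves
LR_FLOOR = 0.01 * ETA_BASE

# One entry per stage, values taken from Table 6.
STAGES = [
    {"resolution": 32, "epochs": 3000, "lr": ETA_BASE},
    {"resolution": 64, "epochs": 7000, "lr": 0.5 * ETA_BASE},
]

def schedule(stage, t):
    """Return (learning rate, beta) at within-stage epoch t, 0 <= t <= epochs.

    Assumed cosine form: lr(t) = floor + (lr_s - floor) * (1 + cos(pi t / T)) / 2.
    beta restarts from 0 in every stage and grows linearly to K.
    """
    T = stage["epochs"]
    lr = LR_FLOOR + 0.5 * (stage["lr"] - LR_FLOOR) * (1.0 + math.cos(math.pi * t / T))
    beta = K * t / T
    return lr, beta
```

Because `beta` depends only on the within-stage epoch, the high-frequency bands are switched off again when Stage 2 begins, even though the MLP weights are carried over.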

### D.4 Loss Terms

The full per-measurement objective of Eq. ([6](https://arxiv.org/html/2605.13988#S4.E6)) is the sum of a canonical fidelity, two standard regularizers, and two NeTMY-specific physics losses. Below we give the explicit form and the role of each term; default loss weights are listed in Table [7](https://arxiv.org/html/2605.13988#A4.T7).

- Spectrum fidelity $\mathcal{D}$: the pixelwise log-MSE on max-normalized noise maps as defined in Eq. ([19](https://arxiv.org/html/2605.13988#A2.E19)). This is the sole data-fidelity term.
- Sparsity $\lVert\rho_{\theta}\rVert_{1}=\sum_{\mathbf{r}}\lvert\rho_{\theta}(\mathbf{r})\rvert$: encourages point-like sources.
- Anisotropic total variation $\mathrm{TV}(\rho_{\theta})$ as defined after Eq. ([3](https://arxiv.org/html/2605.13988#S3.E3)): smooths small-scale noise without penalizing sparse peaks.
- Mean-normalized noise-map loss $\mathcal{R}_{\mathrm{nm}}=\mathrm{MSE}\bigl(N_{\mathrm{pred}}/\overline{N}_{\mathrm{pred}},\,N_{\mathrm{obs}}/\overline{N}_{\mathrm{obs}}\bigr)$, where $\overline{N}$ denotes the spatial mean of the frequency-summed noise map. This is a scale-stable companion to $\mathcal{D}$ that anchors gradients on the support without the max-normalization peak coupling of (P3). In the $64\times 64$ stage we replace the log-MSE form of $\mathcal{D}$ with $\mathcal{R}_{\mathrm{nm}}$ as the primary fidelity term, since the log-domain comparison is most useful for early-stage support discovery.
- Direct-density loss $\mathcal{R}_{\mathrm{ds}}=\mathrm{MSE}\bigl(\rho_{\theta}^{2}/\overline{\rho_{\theta}^{2}},\,N_{\mathrm{obs}}/\overline{N}_{\mathrm{obs}}\bigr)$: a soft, oracle-free support regularizer. The interpretation "spectrum integral $\propto\rho^{2}$" is exact only for the coherent point-source operator $\mathcal{F}_{1}$; under the thermal operator $\mathcal{F}_{2}$ used in this paper, the spectrum is linear in $\rho$, so $\mathcal{R}_{\mathrm{ds}}$ is a heuristic that nudges large predicted density values toward bright observed noise pixels rather than an exact functional identity. We retain it because the ablation shows that it accelerates support discovery without harming $\mathcal{F}_{2}$ inversion. No ground-truth density labels enter this term.
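The two mean-normalized terms in the list above share one building block, which the following sketch makes explicit. It is illustrative only: the helper names and the small `eps` guard against a zero mean are assumptions, and the inputs are taken to be frequency-summed 2-D maps of equal shape.

```python
import numpy as np

def mean_normalized_mse(a, b, eps=1e-12):
    """MSE between two maps after dividing each by its own spatial mean."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.mean((a / (a.mean() + eps) - b / (b.mean() + eps)) ** 2))

def noise_map_loss(n_pred, n_obs):
    """R_nm: compares predicted and observed frequency-summed noise maps."""
    return mean_normalized_mse(n_pred, n_obs)

def direct_density_loss(rho, n_obs):
    """R_ds: compares rho**2 with the observed noise map (no density labels)."""
    rho = np.asarray(rho, dtype=float)
    return mean_normalized_mse(rho ** 2, n_obs)
```

Dividing by the spatial mean makes both terms insensitive to a global rescaling of either argument, which is the "scale-stable" property the text refers to: `noise_map_loss(3 * n, n)` is zero up to rounding.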

| Term | Weight | Role |
| --- | --- | --- |
| $\mathcal{D}$ | $2.0$ | spectrum fidelity, log-MSE |
| $\mathcal{R}_{\mathrm{nm}}$ | $0.5$ | noise-map fidelity, mean-norm |
| $\mathcal{R}_{\mathrm{ds}}$ | $0.1$ | direct-density proxy, mean-norm |
| $\lVert\rho_{\theta}\rVert_{1}$ | $10^{-2}+10^{-3}$ | sparsity (combined: support + L1) |
| $\mathrm{TV}(\rho_{\theta})$ | $10^{-3}$ | smoothness on small scales |

Table 7: Default loss weights for NeTMY. The two $\ell_{1}$ entries are listed separately because the codebase implements two L1 contributions with different weights (one as a "sparsity" prior across both stages, one as an explicit L1 regularizer that can be ablated independently).

### D.5 Energy-Anchored Scale Correction

After the per-measurement optimization terminates, the predicted density is rescaled by the energy-ratio anchor of Proposition [1](https://arxiv.org/html/2605.13988#Thmproposition1). For a given inversion operator $\mathcal{F}\in\{\mathcal{F}_{1},\mathcal{F}_{2}\}$, we forward-evaluate

$$\widehat{S}=\mathcal{F}(\widehat{\rho},\omega_{L,\theta}) \tag{28}$$

on the final $64\times 64$ grid and compute the pre-normalization energies

$$E_{\mathrm{pred}}=\sum_{\omega,\mathbf{r}}\widehat{S}(\omega,\mathbf{r}),\qquad E_{\mathrm{obs}}=\sum_{\omega,\mathbf{r}}S_{\mathrm{obs}}(\omega,\mathbf{r}). \tag{29}$$

We then apply the operator-matched scale correction

$$\widehat{\rho}_{\mathrm{scaled}}=\alpha_{\mathcal{F}}\,\widehat{\rho},\qquad \alpha_{\mathcal{F}}=\begin{cases}\sqrt{E_{\mathrm{obs}}/E_{\mathrm{pred}}}, & \mathcal{F}=\mathcal{F}_{1},\\ E_{\mathrm{obs}}/E_{\mathrm{pred}}, & \mathcal{F}=\mathcal{F}_{2}.\end{cases} \tag{30}$$

The square-root correction for $\mathcal{F}_{1}$ follows from the quadratic homogeneity $\mathcal{F}_{1}(c\rho,\omega_{L})=c^{2}\mathcal{F}_{1}(\rho,\omega_{L})$, whereas the linear correction for $\mathcal{F}_{2}$ follows from $\mathcal{F}_{2}(c\rho,\omega_{L})=c\,\mathcal{F}_{2}(\rho,\omega_{L})$; the derivation is given in Appendix [B.4](https://arxiv.org/html/2605.13988#A2.SS4). This correction mainly affects density-scale-dependent quantities such as density MSE. The primary localization-oriented metrics reported in Section [5](https://arxiv.org/html/2605.13988#S5), including GMSD, Hungarian F1, and Sliced-WD, are insensitive or only weakly sensitive to the absolute density scale. Density MSE is therefore reported as a secondary metric. We apply the correction as a one-shot post-processing step rather than as a soft penalty inside Eq. ([6](https://arxiv.org/html/2605.13988#S4.E6)). This avoids reweighting the $\mathcal{D}$-gradient by a scalar that depends on the current iterate, which would otherwise couple the support-discovery dynamics to the absolute amplitude estimate. Within each fixed operator setting, all methods are evaluated with the same operator-matched scale convention, so comparisons across parameterizations are not affected by method-specific scale-correction choices.
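The role of the two branches in Eq. (30) is easiest to see on toy operators with the same homogeneity degrees. In the sketch below, `toy_F1` and `toy_F2` are hypothetical stand-ins (a squared and a plain linear map); they are not the paper's dipolar operators, and `scale_factor` is an illustrative name.

```python
import math
import numpy as np

def scale_factor(S_pred, S_obs, operator):
    """Operator-matched scale alpha_F of Eq. (30) from summed spectra."""
    ratio = float(np.sum(S_obs)) / float(np.sum(S_pred))
    if operator == "F1":      # quadratic in rho: F1(c * rho) = c**2 * F1(rho)
        return math.sqrt(ratio)
    if operator == "F2":      # linear in rho: F2(c * rho) = c * F2(rho)
        return ratio
    raise ValueError(f"unknown operator: {operator!r}")

# Toy stand-ins chosen only for their homogeneity degree (assumption).
rng = np.random.default_rng(0)
A = rng.uniform(0.1, 1.0, size=(20, 16))

def toy_F1(rho):
    return (A @ rho) ** 2     # degree-2 homogeneous

def toy_F2(rho):
    return A @ rho            # degree-1 homogeneous

rho_hat = rng.uniform(0.0, 1.0, size=16)   # correct shape, wrong amplitude
rho_true = 3.0 * rho_hat                   # ground truth is 3x larger

alpha_1 = scale_factor(toy_F1(rho_hat), toy_F1(rho_true), "F1")
alpha_2 = scale_factor(toy_F2(rho_hat), toy_F2(rho_true), "F2")
```

In both cases the recovered factor equals the true amplitude ratio of 3 up to rounding; swapping the branches would return 9 for the quadratic operator and $\sqrt{3}$ for the linear one.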

### D.6 Filtering View of the Update

We give the formal version of the filtering identity Eq. ([7](https://arxiv.org/html/2605.13988#S4.E7)) that motivates the optimization-geometry interpretation in Section [4.4](https://arxiv.org/html/2605.13988#S4.SS4).

###### Lemma 2 (Jacobian-induced filtering of the density update).

Let $f_{\theta}:\Omega\to\mathbb{R}_{\geq 0}$ be a coordinate field parameterized by $\theta\in\mathbb{R}^{P}$, write $\rho_{\theta}=f_{\theta}$ as a vector in $\mathbb{R}^{|\Omega|}$, and let $\mathcal{L}(\rho)$ be a differentiable scalar loss on density. A vanilla first-order step on $\theta$ at learning rate $\eta>0$, $\theta_{t+1}=\theta_{t}-\eta\,\nabla_{\theta}\mathcal{L}(\rho_{\theta_{t}})$, induces

$$\Delta\rho \;=\; \rho_{\theta_{t+1}}-\rho_{\theta_{t}} \;=\; -\eta\,J_{\theta}J_{\theta}^{\top}\,\nabla_{\rho}\mathcal{L}(\rho_{\theta_{t}}) \;+\; O\!\bigl(\eta^{2}\,\lVert J_{\theta}\rVert_{\mathrm{op}}^{2}\,\lVert\nabla_{\rho}\mathcal{L}\rVert^{2}\,\lVert\partial_{\theta}^{2}f_{\theta}\rVert_{\mathrm{op}}\bigr), \tag{31}$$

where $J_{\theta}=\partial f_{\theta}/\partial\theta\in\mathbb{R}^{|\Omega|\times P}$ is the parameterization Jacobian and $\lVert\partial_{\theta}^{2}f_{\theta}\rVert_{\mathrm{op}}$ is the bilinear operator norm of the second-derivative tensor of $f_{\theta}$. The first-order term is the raw density gradient $\nabla_{\rho}\mathcal{L}$ filtered by the positive semidefinite kernel $G_{\theta}=J_{\theta}J_{\theta}^{\top}\in\mathbb{R}^{|\Omega|\times|\Omega|}$.

###### Proof.

Let $g=\nabla_{\rho}\mathcal{L}(\rho_{\theta_{t}})$. The chain rule gives $\nabla_{\theta}\mathcal{L}=J_{\theta}^{\top}g$, hence $\Delta\theta=-\eta\,J_{\theta}^{\top}g$ and $\lVert\Delta\theta\rVert\leq\eta\,\lVert J_{\theta}\rVert_{\mathrm{op}}\,\lVert g\rVert$. Taylor expansion of $f_{\theta}$ in $\theta$ yields $\rho_{\theta_{t}+\Delta\theta}=\rho_{\theta_{t}}+J_{\theta}\Delta\theta+\tfrac{1}{2}\,\partial_{\theta}^{2}f_{\theta}[\Delta\theta,\Delta\theta]+O(\lVert\Delta\theta\rVert^{3})$. Substituting $\Delta\theta$ into the linear term gives $J_{\theta}\Delta\theta=-\eta\,J_{\theta}J_{\theta}^{\top}g$, and bounding the quadratic term with the bound on $\lVert\Delta\theta\rVert^{2}$ gives Eq. ([31](https://arxiv.org/html/2605.13988#A4.E31)). ∎

NeTMY uses AdamW rather than vanilla gradient descent, so this lemma should be read as the leading-order geometry of an unpreconditioned step; AdamW inserts an adaptive parameter-space preconditioner $P_{t}$ in place of the identity, but the resulting density update is still pushed forward by $J_{\theta}$ into the same image-space subspace, so the qualitative filtering picture is preserved.
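Lemma 2 can be checked numerically on a small field with an analytic Jacobian. The sketch below is an assumption-laden toy, not the NeTMY network: the field is a softplus of fixed random features (`Phi`), the loss is a plain squared error against a random `target`, and the step is vanilla gradient descent as in the lemma.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pix, n_par = 50, 8
Phi = rng.normal(size=(n_pix, n_par))       # fixed features (hypothetical)
theta = 0.3 * rng.normal(size=n_par)
target = rng.uniform(size=n_pix)

def field(theta):
    """Nonnegative toy field rho_theta = softplus(Phi @ theta)."""
    return np.logaddexp(0.0, Phi @ theta)

def jacobian(theta):
    """d rho / d theta = diag(sigmoid(Phi @ theta)) @ Phi, shape (n_pix, n_par)."""
    z = Phi @ theta
    return (1.0 / (1.0 + np.exp(-z)))[:, None] * Phi

def first_order_gap(eta):
    """Norm of (realized delta rho) - (-eta * J J^T g) for one gradient step."""
    rho = field(theta)
    g = rho - target                         # gradient of 0.5 * ||rho - target||^2
    J = jacobian(theta)
    new_theta = theta - eta * (J.T @ g)      # vanilla step in parameter space
    realized = field(new_theta) - rho
    predicted = -eta * (J @ (J.T @ g))       # filtered density gradient
    return float(np.linalg.norm(realized - predicted))

gap_large = first_order_gap(1e-3)
gap_small = first_order_gap(1e-4)
```

If the remainder in Eq. (31) is second order in $\eta$, shrinking the step by a factor of 10 should shrink the gap by roughly a factor of 100, which is what the ratio `gap_large / gap_small` measures.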

Consequence for ill-posedness. The filtering kernel

$$G_{\theta}=J_{\theta}J_{\theta}^{\top} \tag{32}$$

is positive semidefinite by construction. It acts as a structurally coupled filtering operator across pixels. Writing the Jacobian column functions as

$$\psi_{p}(\mathbf{r})=\frac{\partial f_{\theta}(\mathbf{r})}{\partial\theta_{p}}, \tag{33}$$

we have, for any raw density gradient $g(\mathbf{r})$,

$$(G_{\theta}g)(\mathbf{r})=\sum_{p=1}^{P}\psi_{p}(\mathbf{r})\,\langle\psi_{p},g\rangle\in\operatorname{span}\{\psi_{1},\ldots,\psi_{P}\}. \tag{34}$$

Hence $G_{\theta}$ maps the raw pixel-wise gradient into the structured function space allowed by the coordinate parameterization, rather than allowing an arbitrary free pixel-wise update. Equivalently,

$$\operatorname{rank}(G_{\theta})=\operatorname{rank}(J_{\theta}J_{\theta}^{\top})\leq\min(\lvert\Omega\rvert,P), \tag{35}$$

so the induced density update is constrained to a structured low-dimensional subspace of the full pixel space.

For a coordinate MLP with annealed Fourier features, the functions $\psi_{p}(\mathbf{r})$ are smooth in the early stages because high-frequency Fourier bands are suppressed in the encoding $\gamma_{\beta}$. If

$$w_{k}(\beta)\approx 0,\qquad k>k_{\beta}, \tag{36}$$

then the effective bandwidth of the encoding is approximately

$$B_{\beta}\sim 2^{k_{\beta}}\pi. \tag{37}$$

Consequently, the Jacobian columns are dominated by modes below $B_{\beta}$, and

$$\widehat{\psi_{p}}(\boldsymbol{\xi})\approx 0,\qquad\lvert\boldsymbol{\xi}\rvert>B_{\beta}. \tag{38}$$

It follows that

$$\widehat{G_{\theta}g}(\boldsymbol{\xi})=\sum_{p=1}^{P}\langle\psi_{p},g\rangle\,\widehat{\psi_{p}}(\boldsymbol{\xi})\approx 0,\qquad\lvert\boldsymbol{\xi}\rvert>B_{\beta}. \tag{39}$$

Thus $G_{\theta}$ initially behaves as a low-pass smoothing operator, consistent with the exponential frequency suppression in (P1). As the annealing parameter $\beta$ increases, higher Fourier bands are gradually activated, the effective bandwidth of the Jacobian columns grows, and the spectrum of $G_{\theta}$ becomes progressively richer. In this sense, the coordinate parameterization redistributes a single-pixel density perturbation into a smooth, spatially coupled update during early training, and only later allows sharper, higher-frequency corrections. A sharp center spike in $\nabla_{\rho}\mathcal{L}$ (induced by (P2)+(P3) on a uniform initialization) is therefore mapped to a spatially distributed image-space update whose center-to-outer ratio is bounded by the corresponding ratio of $G_{\theta}$ rather than by $\nabla_{\rho}\mathcal{L}$ itself. We measure this directly in Section [5](https://arxiv.org/html/2605.13988#S5): the realized iter-0 NeTMY update $\lvert\Delta\rho\rvert$ does not exhibit a singular center spike, in contrast to the iter-0 free-density gradient under the same forward operator and same uniform initialization.
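The low-pass behaviour described above can be reproduced on a 1-D toy. The sketch assumes a field that is linear in its parameters, so that the Jacobian columns are exactly the annealed Fourier features; the real MLP Jacobian is more complicated, and the `center_to_outer` helper with its window size is a hypothetical diagnostic, not the paper's center-mass ratio.

```python
import numpy as np

def jacobian_columns(x, beta, num_bands):
    """Columns psi_p(x) of a linear-in-theta field built on gamma_beta."""
    k = np.arange(num_bands)
    w = 0.5 * (1.0 - np.cos(np.pi * np.clip(beta - k, 0.0, 1.0)))
    angles = x[:, None] * (2.0 ** k)[None, :] * np.pi
    return np.concatenate([x[:, None], w * np.sin(angles), w * np.cos(angles)], axis=1)

def center_to_outer(update, half_window=8):
    """|update| at the center pixel divided by its mean outside a window."""
    n = update.size
    c = n // 2
    outer = np.abs(np.arange(n) - c) > half_window
    return float(np.abs(update[c]) / np.mean(np.abs(update[outer])))

n_pix, num_bands = 128, 6
x = np.arange(n_pix) / n_pix
spike = np.zeros(n_pix)
spike[n_pix // 2] = 1.0              # raw gradient concentrated on one pixel

results = {}
for label, beta in [("early", 1.0), ("late", float(num_bands))]:
    J = jacobian_columns(x, beta, num_bands)
    update = J @ (J.T @ spike)       # G_theta g without forming G explicitly
    # rank(G_theta) equals rank(J), which is cheaper and more stable to compute
    results[label] = (int(np.linalg.matrix_rank(J)), center_to_outer(update))
```

With only the lowest band active, the single-pixel spike is spread into a broad bump and the rank of the filter is small; with all bands active, the rank grows and the filtered update becomes more sharply peaked at the center.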

Comparison to free-density and quasi-Newton solvers. For Tikhonov and ADMM, $\rho$ is the optimization variable and the parameter-space update is a (sub-)gradient step, so the effective parameterization Jacobian is the identity ($J=I$, $G_{\theta}=I$) and Eq. ([31](https://arxiv.org/html/2605.13988#A4.E31)) reduces to $\Delta\rho=-\eta\,\nabla_{\rho}\mathcal{L}$: the raw density gradient is executed verbatim, including the (P2) center bias and the (P3) self-reinforcing peak coupling. L-BFGS also optimizes $\rho$ directly, but replaces the gradient direction with $-H_{t}^{-1}\nabla_{\rho}\mathcal{L}$ via a low-rank approximate-inverse-Hessian $H_{t}^{-1}$ assembled from past gradient differences; this is a parameter-space preconditioner orthogonal to the parameterization-Jacobian filter analyzed here, and provides an independent route to escape the centered-collapse basin (consistent with the L-BFGS center-mass ratio of $0.081$ in Section [5](https://arxiv.org/html/2605.13988#S5)). For GaussianSplat (Gaussian splats), $G_{\theta}$ is rank-bounded by the number of primitives times the dimension of the per-primitive parameter space, which is small, so $G_{\theta}$ is a strongly localized smoother that cannot represent dense distributions but does limit isolated single-pixel updates on sparse scenes. The empirical ranking documented in Section [5](https://arxiv.org/html/2605.13988#S5) is consistent with this filtering view: among solvers whose update geometry is dominated by Eq. ([31](https://arxiv.org/html/2605.13988#A4.E31)), methods with richer, smoother $G_{\theta}$ are less attracted to the (P2)+(P3) center collapse, with NeTMY (smooth, fully connected MLP across the grid) outperforming GaussianSplat (localized splats) which in turn outperforms Tikhonov / ADMM (no parameterization).
We do not claim that this filtering view is the only mechanism at play, only that it is sufficient to explain the observed iter-0 image-space update structure and the ranking direction within the gradient-step solver family.

### D.7 Algorithm Summary

Algorithm 1: NeTMY per-measurement optimization (operator $\mathcal{F}\in\{\mathcal{F}_{1},\mathcal{F}_{2}\}$)

1. Input: measured spectrum $S_{\mathrm{obs}}$, inversion operator $\mathcal{F}\in\{\mathcal{F}_{1},\mathcal{F}_{2}\}$, total epochs $T_{\mathrm{tot}}$, base lr $\eta_{0}$
2. Initialize MLP parameters $\theta\sim$ default init
3. for stage $s\in\{1,2\}$ with resolution $R_{s}\in\{32,64\}$ and epochs $T_{s}$, lr $\eta_{s}$ do
4. Build coordinate grid $\{\mathbf{x}_{ij}\}$ at resolution $R_{s}$
5. Precompute FFT kernels for $\mathcal{F}$ at $R_{s}$
6. for $t=1,\dots,T_{s}$ do
7. $\beta\leftarrow K\cdot t/T_{s}$
8. $(\rho_{\theta},\omega_{L,\theta})\leftarrow f_{\theta}(\mathbf{x}_{ij};\beta)$ $\triangleright$ Eq. ([5](https://arxiv.org/html/2605.13988#S4.E5)), Eq. ([26](https://arxiv.org/html/2605.13988#A4.E26))
9. $\widehat{S}\leftarrow\mathcal{F}(\rho_{\theta},\omega_{L,\theta})$ $\triangleright$ Eq. ([2](https://arxiv.org/html/2605.13988#S3.E2))
10. $\mathcal{L}\leftarrow$ Eq. ([6](https://arxiv.org/html/2605.13988#S4.E6)) with weights from Table [7](https://arxiv.org/html/2605.13988#A4.T7)
11. $\theta\leftarrow$ AdamW step with cosine-annealed lr, gradient clip
12. end for
13. end for
14. $E_{\mathrm{pred}}\leftarrow\sum_{\omega,\mathbf{r}}\mathcal{F}(\rho_{\theta},\omega_{L,\theta});\; E_{\mathrm{obs}}\leftarrow\sum_{\omega,\mathbf{r}}S_{\mathrm{obs}}$
15. $\alpha_{\mathcal{F}}\leftarrow E_{\mathrm{obs}}/E_{\mathrm{pred}}$ if $\mathcal{F}=\mathcal{F}_{2}$, else $\sqrt{E_{\mathrm{obs}}/E_{\mathrm{pred}}}$
16. $\widehat{\rho}\leftarrow\alpha_{\mathcal{F}}\,\rho_{\theta}$ $\triangleright$ Eq. ([30](https://arxiv.org/html/2605.13988#A4.E30))
17. return $(\widehat{\rho},\omega_{L,\theta})$

## Appendix E Experimental Details

This appendix expands Section [5](https://arxiv.org/html/2605.13988#S5) along the following axes: dataset construction (Appendix [E.1](https://arxiv.org/html/2605.13988#A5.SS1)), baseline implementation (Appendix [E.2](https://arxiv.org/html/2605.13988#A5.SS2)), metric definitions (Appendix [E.3](https://arxiv.org/html/2605.13988#A5.SS3)), the full $\mathcal{F}_{2}/\mathcal{F}_{2}$ matched-operator results and per-class breakdowns (Appendix [E.4](https://arxiv.org/html/2605.13988#A5.SS4)), the mechanism aggregate (Appendix [E.5](https://arxiv.org/html/2605.13988#A5.SS5)), the full ablation sweep including hyperparameter sweeps (Appendix [E.6](https://arxiv.org/html/2605.13988#A5.SS6)), the supervised-baseline distribution-shift study (Appendix [E.7](https://arxiv.org/html/2605.13988#A5.SS7)), the four-step real-data protocol (Appendix [E.8](https://arxiv.org/html/2605.13988#A5.SS8)), and runtime and failure modes (Appendix [E.9](https://arxiv.org/html/2605.13988#A5.SS9), Appendix [E.10](https://arxiv.org/html/2605.13988#A5.SS10)). Implementation hyperparameters that are NeTMY-specific (architecture, optimizer, multiscale schedule, loss weights, scale correction) are in Appendix [D](https://arxiv.org/html/2605.13988#A4).

### E.1 Datasets

Cross-fidelity benchmark (Section [5.2](https://arxiv.org/html/2605.13988#S5.SS2), 512 samples). The cross-fidelity benchmark uses spectra generated by the source-side direct simulator $\mathcal{F}_{3}$ (Appendix [A.4](https://arxiv.org/html/2605.13988#A1.SS4)) on a $64\times 64$ grid at standoff $z_{0}=20$ nm and grid spacing $20$ nm, with sensor noise $\sigma_{\varepsilon}^{2}$ matched to per-sample dynamic range. Source counts and minimum separations are drawn from eight scene classes that cover the regimes most relevant to NV relaxometry: few/close, few/medium, few/far, medium/close, medium/medium, medium/far, many/close, many/far. The Larmor field $\omega_{L}\in[1.5,2.5]$ GHz is evaluated on a $50$-frequency grid and is constant within each ground-truth source. Sample IDs follow the pattern $\langle\text{seq}\rangle\_\langle\text{class}\rangle$ used internally for cross-referencing. Each method is run on the same $512$ samples under both $\mathcal{F}_{1}$ and $\mathcal{F}_{2}$ for inversion (so the table reports $4\times 2=8$ method-operator conditions).

Matched-operator benchmark (Section [5.2](https://arxiv.org/html/2605.13988#S5.SS2), 512 samples). The matched-operator benchmark uses spectra generated and inverted by $\mathcal{F}_{2}$, on the same physical setup as the cross-fidelity benchmark. The samples cover the same scene classes and are used as a broader-coverage view that includes amortized and untrained-prior baselines (DeepDecoder, HybridNeTMY) for which the $\mathcal{F}_{3}$ form is not implemented. The matched-operator setup is intentionally an inverse-crime configuration [[36](https://arxiv.org/html/2605.13988#bib.bib3)]; we report it only as a complementary view, with the cross-fidelity benchmark as the primary evidence.

Real-data dataset (Section [5.5](https://arxiv.org/html/2605.13988#S5.SS5)). The $\alpha$-RuCl$_{3}$ dataset is the publicly released NV-relaxometry data from Kumar et al. [[41](https://arxiv.org/html/2605.13988#bib.bib27)], comprising $T_{1}$-with-flake and $T_{1}$-without-flake measurements at $8$ NVs (NV2, NV3, NV4, NV6, NV7, NV8, NV9, NV10) at the same magnetic-field setting, plus a magnetic-field-dependent dataset on a separate NV (used only for spectral calibration). The crystallographic spin density per layer is $\rho_{0}=6.45\times 10^{18}$ m$^{-2}$ and the effective magnetic moment is $\mu_{\mathrm{eff}}=2.31\,\mu_{B}$ [[41](https://arxiv.org/html/2605.13988#bib.bib27)]. The SRIM-implied NV-depth prior is $14\pm 5$ nm, derived from the reported ion implantation energy of $9.8$ keV and SRIM ion-range/straggle simulation.

### E.2 Baseline Implementation

All baselines minimize Eq. ([3](https://arxiv.org/html/2605.13988#S3.E3)) on the same coordinate grid as NeTMY, using the same forward operator and the same observed spectrum, but differ in how $\rho$ is parameterized and which auxiliary regularizers are added. We give compact algorithmic settings for each.

Tikhonov. Free-density $\rho\in\mathbb{R}_{\geq 0}^{|\Omega|}$, optimized by Adam [[39](https://arxiv.org/html/2605.13988#bib.bib83)] with learning rate $5\times 10^{-3}$, weight decay $10^{-5}$, and gradient clipping at $1$ for $5000$ epochs. Regularizers are $\ell_{2}$ (weight $10^{-3}$) and TV (weight $10^{-3}$), matching the NeTMY weights up to the absent NeTMY-specific physics losses. The scale-correction post-processing of Eq. ([30](https://arxiv.org/html/2605.13988#A4.E30)) is applied to all methods identically for paired comparison.

ADMM. Variable-splitting on $\rho=z$ with augmented-Lagrangian penalty $\mu=10^{-3}$, box constraints $[0,\rho_{\mathrm{max}}]$ via projection, and proximal $\ell_{1}$ shrinkage with weight $10^{-2}$. The data-fidelity step uses Adam at learning rate $5\times 10^{-3}$ for $30$ inner iterations per outer cycle, run for $200$ outer cycles. Stopping criteria are $\lVert\rho-z\rVert_{\infty}<10^{-3}$ or maximum iterations.

GaussianSplat. Explicit set of $K=64$ Gaussian primitives with parameters $(\boldsymbol{\mu}_{k},\boldsymbol{\sigma}_{k},a_{k})\in\mathbb{R}^{2}\times\mathbb{R}_{>0}^{2}\times\mathbb{R}_{\geq 0}$, where $\boldsymbol{\sigma}_{k}$ is the per-primitive width vector (distinct from the sigmoid $\sigma(\cdot)$ and from the Gaussian-ansatz width $\sigma_{g}$ used in Section [5.5](https://arxiv.org/html/2605.13988#S5.SS5)). Prune/split/clone/merge schedule from [[38](https://arxiv.org/html/2605.13988#bib.bib64)] adapted to the dipolar forward operator. Optimizer Adam at learning rate $10^{-3}$ for $T=400$ iterations. The number of primitives is dynamic but capped at $128$.

DeepDecoder. Untrained convolutional decoder prior of Heckel and Hand [[28](https://arxiv.org/html/2605.13988#bib.bib18)] with $5$ upsampling stages, channel widths $[128,128,128,128,1]$, and bilinear upsampling. Optimized for $5000$ steps with Adam at learning rate $10^{-3}$. The pixel output is used at unit scale (multiplied by $1$), and values $<0$ are clamped at zero so that the density remains nonnegative.

U-Net (and U-Net+physics). Four-level U-Net with channel widths $[32,64,128,256]$, instance normalization, and a final pixelwise softplus, trained on $300$ dense paired samples for $200$ epochs with Adam at learning rate $10^{-3}$. The U-Net+physics variant additionally optimizes a physics-residual loss $\lVert\mathcal{F}_{2}(\widehat{\rho},\omega_{L})-S_{\mathrm{obs}}\rVert_{2}^{2}$ alongside the supervised pixel loss. We report only the supervised-trained model and evaluate on sparse test scenes (which are out-of-distribution by construction); see Appendix [E.7](https://arxiv.org/html/2605.13988#A5.SS7).

GAN-SL+physics. Generative adversarial network with the same backbone as the U-Net for the generator, a PatchGAN discriminator, and a physics-loss term applied to the generator's output. Training and evaluation protocols mirror the U-Net.

HybridNeTMY. A coordinate-image hybrid in which the input to a U-Net is concatenated with a Fourier-feature encoding of the coordinate grid, mirroring the coordinate-feature input of NeTMY but retaining a CNN architecture and supervised training. Reported as a baseline only because the user community has asked whether the coordinate-feature input alone suffices to produce NeTMY-like behavior under supervised training; the answer (Appendix [E.7](https://arxiv.org/html/2605.13988#A5.SS7)) is that it does not.

L-BFGS. Limited-memory BFGS [[45](https://arxiv.org/html/2605.13988#bib.bib82)] on free-density $\rho$ with history depth $20$, a line search satisfying the Wolfe conditions, and $100$ outer iterations. Used only in the mechanism analysis (Section [5.3](https://arxiv.org/html/2605.13988#S5.SS3), Appendix [E.5](https://arxiv.org/html/2605.13988#A5.SS5)) as a reference quasi-Newton free-density solver.

### E.3 Metric Definitions

Gradient Magnitude Similarity Deviation (GMSD). GMSD [[80](https://arxiv.org/html/2605.13988#bib.bib79)] measures structural fidelity through the standard deviation of a pointwise gradient-magnitude similarity map. Given the predicted density $\widehat{\rho}$ and ground-truth density $\rho_{\star}$, both rescaled to share the maximum value, define gradient magnitudes $G_{X}=\sqrt{(D_{x}\ast X)^{2}+(D_{y}\ast X)^{2}}$ for $X\in\{\widehat{\rho},\rho_{\star}\}$ with the Prewitt operators $D_{x},D_{y}$. The pointwise GMS is $\mathrm{GMS}(\mathbf{r})=(2G_{\widehat{\rho}}G_{\rho_{\star}}+c)/(G_{\widehat{\rho}}^{2}+G_{\rho_{\star}}^{2}+c)$ with stability constant $c$, and $\mathrm{GMSD}=\sqrt{\mathrm{Var}_{\mathbf{r}}\,\mathrm{GMS}(\mathbf{r})}$. Lower is better.
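The GMSD definition above translates directly into NumPy. This is a sketch, not the evaluation code used in the paper: the value of the stability constant `c`, the edge padding, and the $1/3$ normalization of the Prewitt kernels are assumptions, and both inputs are assumed to have a positive maximum.

```python
import numpy as np

# Prewitt kernels; the 1/3 normalization is a common convention (assumption).
PREWITT_X = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]], dtype=float) / 3.0
PREWITT_Y = PREWITT_X.T

def filter3x3(img, kernel):
    """3x3 cross-correlation with edge padding; output has the input shape.

    Cross-correlation differs from convolution only by a sign for these
    kernels, which the squared gradient magnitude removes.
    """
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros((h, w), dtype=float)
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out

def gradient_magnitude(img):
    return np.sqrt(filter3x3(img, PREWITT_X) ** 2 + filter3x3(img, PREWITT_Y) ** 2)

def gmsd(rho_hat, rho_star, c=1e-4):
    """Standard deviation of the pointwise gradient-magnitude similarity map."""
    a = np.asarray(rho_hat, dtype=float)
    b = np.asarray(rho_star, dtype=float)
    a = a / a.max()                      # rescale both maps to unit maximum
    b = b / b.max()
    ga, gb = gradient_magnitude(a), gradient_magnitude(b)
    gms = (2.0 * ga * gb + c) / (ga ** 2 + gb ** 2 + c)
    return float(np.std(gms))
```

Because both maps are rescaled to a common maximum before the gradients are taken, the score does not change when the reconstruction is multiplied by a positive constant, in line with the remark in Appendix D.5 that GMSD is insensitive to the absolute density scale.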

Hungarian F1. A localized peak-matching score: peaks are extracted from $\widehat{\rho}$ and $\rho_{\star}$ by local-maximum filtering above a threshold of $5\%$ of the per-image maximum, paired by the Hungarian algorithm [[40](https://arxiv.org/html/2605.13988#bib.bib80)] under a fixed matching radius of $r_{\mathrm{match}}=2$ pixels (corresponding to one half of the Section [3.3](https://arxiv.org/html/2605.13988#S3.SS3) effective PSF width $w_{\mathrm{psf}}$ at the standoff and characteristic mode used in the cross-fidelity benchmark), and scored as $F_{1}=2\,\mathrm{TP}/(2\,\mathrm{TP}+\mathrm{FP}+\mathrm{FN})$. Higher is better. $F_{1}$ is well-defined only when both reconstruction and ground truth contain at least one supra-threshold peak; we report $F_{1}=0$ for the degenerate case.

Sliced Wasserstein Distance (SWD). The sliced Wasserstein-2 distance [[7](https://arxiv.org/html/2605.13988#bib.bib81)], evaluated as the mean over $128$ random projections drawn uniformly from the unit sphere of the $1$-D Wasserstein-2 distance between the two projected mass distributions $P_{\theta\#}\widehat{\rho}/\lVert\widehat{\rho}\rVert_{1}$ and $P_{\theta\#}\rho_{\star}/\lVert\rho_{\star}\rVert_{1}$. Lower is better.
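A compact way to evaluate this metric on gridded densities is to project the pixel coordinates and integrate the squared difference of the two quantile functions exactly. The sketch below assumes pixel-unit coordinates, directions drawn as uniform angles on the unit circle, and a fixed seed; the function names are illustrative and not taken from the paper's code.

```python
import numpy as np

def w2_1d(t, p, q):
    """Exact 1-D Wasserstein-2 distance between weights p and q on points t."""
    order = np.argsort(t)
    t = t[order]
    cp, cq = np.cumsum(p[order]), np.cumsum(q[order])
    cp[-1] = cq[-1] = 1.0                              # guard against rounding
    levels = np.unique(np.concatenate([cp, cq]))       # CDF breakpoints
    widths = np.diff(np.concatenate([[0.0], levels]))  # quantile-interval sizes
    mids = levels - 0.5 * widths
    # Quantile function: first point whose cumulative mass reaches the level.
    quant_p = t[np.searchsorted(cp, mids, side="left")]
    quant_q = t[np.searchsorted(cq, mids, side="left")]
    return float(np.sqrt(np.sum(widths * (quant_p - quant_q) ** 2)))

def sliced_w2(rho_hat, rho_star, n_proj=128, seed=0):
    """Mean 1-D W2 distance over random projection directions."""
    a = np.asarray(rho_hat, dtype=float)
    b = np.asarray(rho_star, dtype=float)
    h, w = a.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    p = a.ravel() / np.abs(a).sum()                    # L1-normalized mass
    q = b.ravel() / np.abs(b).sum()
    rng = np.random.default_rng(seed)
    angles = rng.uniform(0.0, 2.0 * np.pi, size=n_proj)
    dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    return float(np.mean([w2_1d(coords @ d, p, q) for d in dirs]))
```

For two single-pixel sources, each projected distance is the absolute projection of their displacement vector, so the sliced distance is bounded above by their Euclidean separation; identical maps give exactly zero.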

Density MSE. The pixelwise mean square error $\mathrm{MSE}=|\Omega|^{-1}\sum_{\mathbf{r}\in\Omega}(\widehat{\rho}(\mathbf{r})-\rho_{\star}(\mathbf{r}))^{2}$ on the post-scale-corrected reconstruction. Reported only as a secondary metric, because the background-dominated character of sparse density fields makes MSE insensitive to localization quality.

Masked SSIM. The structural similarity index [[79](https://arxiv.org/html/2605.13988#bib.bib84)] evaluated on the support of $\rho_{\star}$ (i.e., on pixels where $\rho_{\star}>0$), reported in Section [5.4](https://arxiv.org/html/2605.13988#S5.SS4) alongside MSE.

### E.4 Matched-Operator Density-MSE Leaderboard

Table [8](https://arxiv.org/html/2605.13988#A5.T8) reports the per-method density-MSE leaderboard on the $512$-sample $\mathcal{F}_{2}/\mathcal{F}_{2}$ matched-operator benchmark, with secondary metrics.

Table 8: Density-MSE leaderboard on the $512$-sample $\mathcal{F}_{2}/\mathcal{F}_{2}$ matched-operator benchmark. Density MSE and noise-map MSE are means over $512$ sparse samples. Train time is mean wall-clock on a single A6000 GPU.

| Method | Density MSE $\downarrow$ | Noise MSE $\downarrow$ | Train time (s) | Notes |
| --- | --- | --- | --- | --- |
| NeTMY (Ours) | 0.0037 | 0.21 | 780 | per-instance, no labels |
| GaussianSplat | 0.0067 | 1.09 | 402 | explicit Gaussians |
| Tikhonov | 0.0117 | 0.81 | 1.5 | free-density + $\ell_{2}$/TV |
| ADMM | 0.2555 | 22.94 | 1.1 | free-density + ADMM |
| DeepDecoder | 0.2558 | 24.25 | 259 | untrained CNN prior |
| HybridNeTMY | 1.0110 | 0.23 | 399 | supervised, dense-trained |

The matched-operator ranking is consistent with the cross-fidelity benchmark of Table [1](https://arxiv.org/html/2605.13988#S5.T1) on density-MSE, but, as discussed in Section [5.2](https://arxiv.org/html/2605.13988#S5.SS2), density-MSE alone is insensitive to centered-collapse failure modes that the cross-fidelity benchmark exposes through Hungarian F1 and SWD. The supervised HybridNeTMY's poor MSE reflects the dense-to-sparse distribution-shift failure documented in Appendix [E.7](https://arxiv.org/html/2605.13988#A5.SS7).

### E\.5Mechanism Aggregate and Per\-Method Diagnostics

32\-run ADMM aggregate sweep\.To isolate the operator effect from solver\-specific choices on the centered\-collapse phenomenon, we run3232ADMM configurations \(4 samples×\\times8 loss\-baseline settings\) under bothℱ1\\mathcal\{F\}\_\{1\}andℱ2\\mathcal\{F\}\_\{2\}inversion and report the center\-mass ratio at termination\. Mean center\-mass ratio rises from0\.553±0\.3440\.553\\pm 0\.344underℱ1\\mathcal\{F\}\_\{1\}to0\.776±0\.3690\.776\\pm 0\.369underℱ2\\mathcal\{F\}\_\{2\}\(Table[9](https://arxiv.org/html/2605.13988#A5.T9)\), confirming that the more faithful operator drives a stronger center collapse for free\-density solvers\. The 32\-run sweep is ADMM\-only by construction; the cross\-method evidence is the per\-sample comparison of Figure[4](https://arxiv.org/html/2605.13988#S5.F4)\.

Table 9: ADMM-only center-collapse aggregate over 32 runs (4 samples × 8 loss-baseline configurations). Higher center-mass ratio means more centered collapse.

| Operator | Center-mass ratio (mean ± std) | Sample count |
| --- | --- | --- |
| $\mathcal{F}_1$ (scalar) | $0.553 \pm 0.344$ | 32 |
| $\mathcal{F}_2$ (tensor) | $0.776 \pm 0.369$ | 32 |

NeTMY first-step image-space update. Lemma [2](https://arxiv.org/html/2605.13988#Thmlemma2) predicts that the realized iter-0 NeTMY update $|\Delta\rho|$ should be smoothed and structurally coupled across pixels rather than executed verbatim as a singular center spike. We measure this directly on the same many-source sample (sample 024): the iter-0 image-space update $|\Delta\rho|$ is spatially distributed with a center-to-outer ratio of approximately $1.6\times$ (vs. $18.29\times$ for the raw density-space gradient). This $\sim 11\times$ damping is the empirical signature of $G_\theta = J_\theta J_\theta^\top$.
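As a toy illustration of this filtering effect, the sketch below replaces the MLP with a linear-in-parameters 1-D field built from a few Fourier features, so that the Jacobian $J_\theta$ is explicit. The grid size, feature count, step size `eta`, and the `peak_to_mean` statistic are all assumptions made here for illustration; `peak_to_mean` is not the paper's center-to-outer ratio. The raw density-space gradient is taken to be a single-pixel spike, and one gradient step in parameter space yields the image-space update $-\eta\, J_\theta J_\theta^\top g$, which is far less peaked than the spike.

```python
import numpy as np

def fourier_jacobian(n_pixels=65, n_freqs=4):
    """Jacobian of a toy field rho = J @ theta (a linear stand-in for an MLP)."""
    x = np.arange(n_pixels) / n_pixels            # periodic grid on [0, 1)
    cols = [np.ones(n_pixels)]
    for k in range(1, n_freqs + 1):
        cols.append(np.cos(2 * np.pi * k * x))
        cols.append(np.sin(2 * np.pi * k * x))
    return np.stack(cols, axis=1)                 # shape (n_pixels, 2*n_freqs + 1)

def peak_to_mean(v):
    """Illustrative peakedness statistic: max |v| divided by mean |v|."""
    a = np.abs(v)
    return a.max() / a.mean()

J = fourier_jacobian()
n = J.shape[0]

# Hypothetical raw density-space gradient: a single spike at the center pixel.
g = np.zeros(n)
g[n // 2] = 1.0

eta = 0.1                                         # hypothetical step size
delta_theta = -eta * J.T @ g                      # gradient step in parameter space
delta_rho = J @ delta_theta                       # realized image-space update

G = J @ J.T                                       # filtering kernel G = J J^T
print("update equals -eta * G @ g:", np.allclose(delta_rho, -eta * G @ g))
print("peak-to-mean of raw gradient:", peak_to_mean(g))
print("peak-to-mean of realized update:", peak_to_mean(delta_rho))
```

For a nonlinear network the identity holds only to first order in the step size; the linear toy makes it exact so that the smoothing can be read off directly.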

### E.6 Component Ablation: Per-Class and Hyperparameter Sweeps

The cumulative ablation of Table [3](https://arxiv.org/html/2605.13988#S5.T3) aggregates over 512 samples. The largest per-class failure under any ablation is case medium/medium, where the $-\ell_1$ ablation raises density MSE by $\sim 2.7\times$ and degrades SSIM from 0.96 to 0.82. The smallest is case few/close, where most ablations leave density MSE essentially unchanged (because a single source is reconstructed correctly by every variant). These sample-level patterns are consistent with the difficulty class.

Hyperparameter sweeps. Table [10](https://arxiv.org/html/2605.13988#A5.T10) reports three single-axis sweeps (learning rate, hidden width, PE octave count) on held-out samples. NeTMY is robust to learning rate and hidden width within roughly an order of magnitude of the defaults; PE octave count $K = 12$ is the smallest setting that recovers high-frequency sparse detail without divergent peaks.

Table 10: Hyperparameter sweeps over a 64-sample held-out set. Best per axis in bold.

| Axis | Setting | Density MSE |
| --- | --- | --- |
| Learning rate $\eta_0$ | $10^{-4}$ | 0.0049 |
| Learning rate $\eta_0$ | $5 \times 10^{-4}$ | 0.0040 |
| Learning rate $\eta_0$ | $\mathbf{10^{-3}}$ | 0.0037 |
| Learning rate $\eta_0$ | $5 \times 10^{-3}$ | 0.0038 |
| Learning rate $\eta_0$ | $10^{-2}$ | 0.0067 |
| Hidden width | 128 | 0.0046 |
| Hidden width | 256 | 0.0040 |
| Hidden width | **320** | 0.0037 |
| Hidden width | 384 | 0.0038 |
| Hidden width | 512 | 0.0040 |
| PE octaves $K$ | 4 | 0.0083 |
| PE octaves $K$ | 8 | 0.0048 |
| PE octaves $K$ | **12** | 0.0037 |
| PE octaves $K$ | 16 | 0.0040 |
| PE octaves $K$ | 20 | 0.0046 |

### E.7 Supervised Baselines and Distribution Shift

Setup. We train four supervised baselines (GAN-SL+physics, U-Net-SL, U-Net-SL+physics, HybridNeTMY) on dense scenes (300 paired samples drawn from a dense-source generator with 30–60 sources per scene) and evaluate on the sparse test scenes used by NeTMY. The dense→sparse mismatch is severe by design: training scenes have nearly continuous mass distributions while test scenes have 1–8 off-center delta-like sources.

Failure observation. Across the four supervised baselines, the dense-to-sparse evaluation produces three characteristic failure patterns: (i) GAN-SL+physics fails to recover the full sparse support and instead concentrates its prediction on only a few dominant responses, suppressing weaker peaks and producing an incomplete sparse reconstruction. (ii) U-Net-SL+physics shows a smoother failure mode, with low-amplitude diffuse predictions and residual background activation, indicating a tendency toward blurred interpolation rather than faithful sparse peak recovery. (iii) U-Net-SL is the most unstable of the supervised baselines in this experiment, often generating exaggerated peak amplitudes, spurious activations, and broad background artifacts under dense-to-sparse transfer. The HybridNeTMY model, despite its coordinate-feature input, behaves like a supervised CNN under the dense-trained regime: its density MSE on the matched-operator $\mathcal{F}_2/\mathcal{F}_2$ benchmark is 1.011 (Table [8](https://arxiv.org/html/2605.13988#A5.T8)), the worst of the seven methods. NeTMY is unaffected because it is optimized per measurement without paired density labels.

What the failure rules out. The supervised distribution-shift failure rules out the hypothesis that the NeTMY advantage on sparse scenes is reducible to a coordinate-feature inductive bias: HybridNeTMY shares the coordinate-feature input but, when trained supervised on dense scenes, behaves like the other supervised baselines. The advantage of NeTMY is in the per-measurement optimization-against-physics protocol of Section [4](https://arxiv.org/html/2605.13988#S4), not in the feature space alone.

### E.8 Real-Data Cross-Check: Full Four-Step Protocol

The Section [5.5](https://arxiv.org/html/2605.13988#S5.SS5) cross-check runs four steps. We expand each below.

Step 1: SRIM-anchored amplitude calibration. For each NV $i \in \{2,3,4,6,7,8,9,10\}$, we compute the implied NV depth $d_i^\star$ under each operator by inverting $\Gamma_{\mathrm{Ru}}^{\mathrm{pred}}(d_i^\star) = \Gamma_{\mathrm{Ru}}^{i,\mathrm{meas}}$. Under $\mathcal{F}_2$, 1 NV (NV4) lands in $[9, 20]$ nm at $d_4^\star = 14.49$ nm; under $\mathcal{F}_1$, no NVs land in the window because $\Gamma_{\mathrm{Ru}}^{\mathcal{F}_1}(d) > \Gamma_{\mathrm{Ru}}^{\mathrm{meas}}$ for all $d > 9$ nm at the SRIM-prior amplitude. The point amplitude ratio at $d = 14.5$ nm is $\Gamma^{\mathcal{F}_1}/\Gamma^{\mathcal{F}_2} = 11{,}353.5/54.0 \approx 210\times$.
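The inversion in Step 1 amounts to a one-dimensional root find. The following is a minimal sketch, assuming only that the predicted rate decreases monotonically with depth. The stand-in `gamma_toy` (a pure $d^{-4}$ law pinned to the quoted value 54.0 at 14.5 nm), the search bracket, and all function names are illustrative assumptions, not the paper's operators.

```python
def implied_depth(gamma_meas, gamma_pred, d_lo=1.0, d_hi=100.0, tol=1e-9):
    """Solve gamma_pred(d) = gamma_meas by bisection.

    Assumes gamma_pred decreases monotonically with depth d (in nm).
    Returns None when the measured rate is not bracketed by [d_lo, d_hi].
    """
    if gamma_pred(d_lo) < gamma_meas or gamma_pred(d_hi) > gamma_meas:
        return None
    while d_hi - d_lo > tol:
        mid = 0.5 * (d_lo + d_hi)
        if gamma_pred(mid) > gamma_meas:
            d_lo = mid      # prediction still too large: the root is deeper
        else:
            d_hi = mid
    return 0.5 * (d_lo + d_hi)

def in_window(depth, lo=9.0, hi=20.0):
    """Check whether an implied depth falls inside the [9, 20] nm window."""
    return depth is not None and lo <= depth <= hi

def gamma_toy(d):
    """Hypothetical d^-4 rate, pinned to 54.0 at d = 14.5 nm (illustration only)."""
    return 54.0 * (14.5 / d) ** 4

d_a = implied_depth(54.0, gamma_toy)     # measured rate equal to the pinned value
d_b = implied_depth(540.0, gamma_toy)    # a ten times larger rate implies a shallower NV
print(d_a, in_window(d_a))
print(d_b, in_window(d_b))
```

Under this toy law a tenfold larger measured rate moves the implied depth outside the window, which is the kind of accept/reject decision the step describes.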

Step 2: Power-law analysis. The independent quantity $\Gamma_{\mathrm{int}} \propto d^{-\alpha}$ with $\alpha = 1.06 \pm 0.22$ provided by Kumar et al. [[41](https://arxiv.org/html/2605.13988#bib.bib27)] lets us eliminate $d$ and convert each operator's depth scaling into an internal-rate scaling: $\mathcal{F}_2$ predicts $b_{\mathrm{pred}} = 4/\alpha = 3.77$, $\mathcal{F}_1$ predicts $b_{\mathrm{pred}} = 2/\alpha = 1.89$. The measured slope is $b_{\mathrm{exp}} = 1.43 \pm 0.21$. Slope alone is closer to the $\mathcal{F}_1$ prediction; we therefore explicitly report this axis as ambiguous, since slope and intercept are independent.
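The elimination of $d$ can be written out in two lines. This is a sketch under two assumptions read off the text rather than stated in it: each operator's rate follows a pure power law $\Gamma_{\mathrm{Ru}} \propto d^{-n}$ (with $n = 4$ for $\mathcal{F}_2$ and $n = 2$ for $\mathcal{F}_1$, inferred from the quoted $4/\alpha$ and $2/\alpha$), and $b$ is the exponent in $\Gamma_{\mathrm{Ru}} \propto \Gamma_{\mathrm{int}}^{\,b}$. The symbol $n$ is introduced here for illustration only.

$$
\begin{aligned}
\Gamma_{\mathrm{int}} \propto d^{-\alpha} &\;\Longrightarrow\; d \propto \Gamma_{\mathrm{int}}^{-1/\alpha}, \\
\Gamma_{\mathrm{Ru}} \propto d^{-n} &\;\Longrightarrow\; \Gamma_{\mathrm{Ru}} \propto \Gamma_{\mathrm{int}}^{\,n/\alpha}, \qquad b_{\mathrm{pred}} = \frac{n}{\alpha}, \\
n = 4:&\quad b_{\mathrm{pred}} = 4/1.06 \approx 3.77, \qquad |b_{\mathrm{exp}} - b_{\mathrm{pred}}| \approx 2.34, \\
n = 2:&\quad b_{\mathrm{pred}} = 2/1.06 \approx 1.89, \qquad |b_{\mathrm{exp}} - b_{\mathrm{pred}}| \approx 0.46.
\end{aligned}
$$

Both gaps use the central value $\alpha = 1.06$ and ignore its quoted uncertainty of $\pm 0.22$.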

Step 3: Spectral calibration. Magnetic-field-dependent $T_1$ measurements at $B \in \{2, 313, 383.5\}$ Gauss (single NV) calibrate the spin-bath correlation time $\tau_c = 0.41$ ns and bath linewidth $\gamma_{\mathrm{bath}} = 2.45$ GHz. Both operators share the Lorentzian spectral response, so this calibration is operator-neutral; we report it for completeness.

Step 4: Loss-landscape identifiability. On the Gaussian density ansatz $\rho(\mathbf{r}; A, \sigma_g) = A \exp\left(-\lVert \mathbf{r} \rVert^2 / (2\sigma_g^2)\right)$, where $\sigma_g$ denotes the Gaussian width, we evaluate the log-MSE loss $L(A, \sigma_g) = |N_{\mathrm{NV}}|^{-1} \sum_i \left(\log \Gamma_{\mathrm{Ru}}^{\mathrm{pred}}(A, \sigma_g, d_i) - \log \Gamma_{\mathrm{Ru}}^{i,\mathrm{meas}}\right)^2$ on a $90 \times 90$ grid in $(\log_{10} A, \sigma_g)$. The Hessian condition number at the minimum is $\kappa_{\mathcal{F}_2} = 931$ and $\kappa_{\mathcal{F}_1} = 301{,}139$ (ratio $\sim 323\times$). The $\mathcal{F}_2$ landscape is a parabolic bowl; the $\mathcal{F}_1$ landscape is a degenerate valley along $A^2 \sigma_g^2 = \mathrm{const}$, which is the algebraic consequence of $\Gamma \propto A^2$ in $\mathcal{F}_1$ and is non-identifiable on this dataset.
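The conditioning comparison can be sketched on toy models. The code below assumes two hypothetical log-rate models, neither of which is the paper's operator: `log_rate_bowl`, where amplitude and width enter differently at each depth, and `log_rate_valley`, which depends on $(A, \sigma_g)$ only through $A^2 \sigma_g^2$. The depths, grid ranges, and true parameters are also invented for illustration. The loss is evaluated on a $90 \times 90$ grid to locate the minimum, and the Hessian there is approximated by the Gauss-Newton form $2 J^\top J / N$, which is exact only when the residuals vanish.

```python
import numpy as np

# Hypothetical NV depths in nm; not the paper's dataset.
DEPTHS = np.array([5.0, 10.0, 15.0, 20.0, 30.0])

def log_rate_bowl(log10_A, sigma, d=DEPTHS):
    """Toy model: width couples to depth, so (A, sigma) are separable."""
    return np.log(10.0) * log10_A + 2 * np.log(sigma) - 2 * np.log(d**2 + sigma**2)

def log_rate_valley(log10_A, sigma, d=DEPTHS):
    """Toy model: depends on (A, sigma) only through A^2 * sigma^2."""
    return 2 * np.log(10.0) * log10_A + 2 * np.log(sigma) - 2 * np.log(d)

def log_mse(model, log10_A, sigma, log_meas):
    r = model(log10_A, sigma) - log_meas
    return np.mean(r**2)

def hessian_condition(model, truth=(0.0, 10.0), n=90, h=1e-5):
    """Condition number of the Gauss-Newton Hessian at the grid minimum."""
    log_meas = model(*truth)                       # noiseless synthetic "measurements"
    us = np.linspace(-1.0, 1.0, n)                 # log10(A) axis
    sigmas = np.linspace(2.0, 20.0, n)             # width axis
    loss = np.array([[log_mse(model, u, s, log_meas) for s in sigmas] for u in us])
    iu, js = np.unravel_index(np.argmin(loss), loss.shape)
    u0, s0 = us[iu], sigmas[js]
    # Central-difference Jacobian of the residuals w.r.t. (log10 A, sigma).
    J = np.stack([
        (model(u0 + h, s0) - model(u0 - h, s0)) / (2 * h),
        (model(u0, s0 + h) - model(u0, s0 - h)) / (2 * h),
    ], axis=1)
    H = 2.0 * (J.T @ J) / len(log_meas)
    return np.linalg.cond(H)

kappa_bowl = hessian_condition(log_rate_bowl)
kappa_valley = hessian_condition(log_rate_valley)
print("bowl condition number:", kappa_bowl)
print("valley condition number:", kappa_valley)
```

Because the two coordinates carry different units, the absolute condition numbers depend on the chosen scaling; only the contrast between the two toy models is meaningful here, and the toy valley is exactly rank-deficient rather than merely ill-conditioned.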

Read-back. Two of three axes (depth-amplitude, conditioning) disfavor $\mathcal{F}_1$; the third (slope) is ambiguous. We frame the result as a multi-axis consistency check, not as a validation, because the dataset is 8 NVs with a 1-D depth axis and a Gaussian density ansatz, not a spatially resolved reconstruction benchmark.

### E.9 Runtime

NeTMY's per-instance wall-clock on the 512-sample matched-operator benchmark on a single NVIDIA A6000 GPU is long-tailed: median 273 s, trimmed mean (drop top 5 samples) 426 s, untrimmed mean 780 s. The top-5 samples are dense many/close configurations where the $64 \times 64$ stage requires more iterations to converge on the full source set. Free-density baselines (Tikhonov 1.55 s, ADMM 1.10 s) are roughly 200–700× faster on the same hardware; GaussianSplat (402 s) and DeepDecoder (259 s) are within an order of magnitude of NeTMY. In the regime that motivated this work (per-instance reconstruction without paired density labels), the relevant comparison is to the supervised amortized baselines, which are fast at inference but require a labeled training set that is generally not available for NV-relaxometry samples; in regimes where labels are abundant, amortized methods are the appropriate choice.
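The three summary statistics quoted above differ only in how they treat the slowest runs. A minimal sketch of how they could be computed is given below; the function name `runtime_summary` and the timing values are hypothetical and are not the benchmark's measurements.

```python
import numpy as np

def runtime_summary(times_s, drop_top=5):
    """Median, trimmed mean (slowest `drop_top` runs removed), and plain mean."""
    t = np.sort(np.asarray(times_s, dtype=float))
    kept = t[: len(t) - drop_top]                  # drop the slowest runs
    return {
        "median": float(np.median(t)),
        "trimmed_mean": float(kept.mean()),
        "mean": float(t.mean()),
    }

# Hypothetical long-tailed timings: 95 fast runs and 5 very slow ones.
times = [100.0] * 95 + [10000.0] * 5
summary = runtime_summary(times, drop_top=5)
print(summary)
```

In this invented example the median and trimmed mean stay at the typical runtime while the plain mean is pulled up by the five slow runs, which is the same qualitative pattern as the long tail described above.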

### E.10 Failure Modes

We document three known failure modes that the experimental analysis surfaces\.

Cross-shaped artifacts on dense scenes. As shown in Figure [8](https://arxiv.org/html/2605.13988#A5.F8), on samples in the many/medium class with $\sim 30$–$40$ sources spaced at the resolution limit, NeTMY occasionally produces axis-aligned cross-shaped artifacts in the reconstructed density. The artifact morphology is consistent with the (P4) anisotropic-explanation prediction of Section [3.3](https://arxiv.org/html/2605.13988#S3.SS3): when many sources lie within $w_{\mathrm{psf}}$ of each other, the optimizer converges to a low-cost density that places mass along low-cost axis-aligned tracks rather than at the true source positions. Anti-artifact regularizers (isotropic TV, Laplacian, Fourier-axis penalties) reduce visible cross artifacts but slightly worsen aggregate density MSE; we keep the standard anisotropic TV in the main runs.
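One standard property of the two TV variants is consistent with this axis-aligned bias: anisotropic TV equals isotropic TV on an axis-aligned step but exceeds it by a factor of $\sqrt{2}$ on a 45° step, so axis-aligned structure is relatively cheaper under the anisotropic penalty. The sketch below checks this on two synthetic $8 \times 8$ images using forward differences; the discretization and the test images are assumptions chosen for illustration and need not match the regularizer used in the experiments.

```python
import numpy as np

def _forward_diffs(img):
    """Forward differences cropped to the common (H-1) x (W-1) grid."""
    dx = np.diff(img, axis=1)[:-1, :]
    dy = np.diff(img, axis=0)[:, :-1]
    return dx, dy

def tv_anisotropic(img):
    """Sum of |dx| + |dy|: penalizes each axis direction separately."""
    dx, dy = _forward_diffs(img)
    return float(np.abs(dx).sum() + np.abs(dy).sum())

def tv_isotropic(img):
    """Sum of sqrt(dx^2 + dy^2): rotation-invariant in the continuum limit."""
    dx, dy = _forward_diffs(img)
    return float(np.sqrt(dx**2 + dy**2).sum())

# Axis-aligned step edge: left half 0, right half 1.
step = np.zeros((8, 8))
step[:, 4:] = 1.0

# 45-degree step edge: 1 where row + column >= 8.
idx = np.add.outer(np.arange(8), np.arange(8))
diag = (idx >= 8).astype(float)

print("axis-aligned:", tv_anisotropic(step), tv_isotropic(step))
print("diagonal:    ", tv_anisotropic(diag), tv_isotropic(diag))
```

The extra cost that anisotropic TV assigns to the diagonal edge is the gap that an isotropic penalty removes.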

High-frequency leakage on small clusters. The medium/medium row of Figure [9](https://arxiv.org/html/2605.13988#A5.F9) shows that NeTMY occasionally leaks small-cluster mass into nearby pixels. This is the high-frequency tail of the annealed PE schedule: at full $\beta = K$ the network can fit non-source-supported pixels, and the $\ell_1$/gate combination is the only mechanism preventing this. On samples with $\sim 8$–$15$ closely spaced sources the leakage is visible; on simpler sparse scenes it is suppressed.

Centered minimum trapping under random initialization. For $\sim 5\%$ of the matched-operator samples, NeTMY's stage-1 minimum is in the centered-collapse basin documented in Section [5.3](https://arxiv.org/html/2605.13988#S5.SS3). Stage-2 refinement at higher resolution and lower learning rate escapes this basin via the $\beta$-schedule reset, but the fraction of samples in which this happens is sensitive to the random initialization of the MLP and to stage-1 epoch count. We did not observe any systematic correlation with scene class.

![Refer to caption](https://arxiv.org/html/2605.13988v1/figures/sec5/Failure_1_comparison.png)

Figure 8: The reconstruction captures the broad support of the active region but develops strong cross-like, axis-aligned artifacts around high-intensity locations. The anti-artifact variant substantially suppresses these artifacts, albeit at the cost of slightly degraded reconstruction quality.

![Refer to caption](https://arxiv.org/html/2605.13988v1/figures/sec5/Failure_2.png)

Figure 9: High-frequency leakage on a medium/medium sample. The predicted density spreads small-cluster mass into neighboring pixels beyond the true source locations, a leakage artifact driven by the high-frequency tail of the annealed PE schedule. The $\ell_1$/gate combination partially suppresses this effect, but residual leakage remains visible given the $\sim$8–15 closely spaced sources in this scene.

![Refer to caption](https://arxiv.org/html/2605.13988v1/figures/sec5/netmy_delta_abs_surface.png)

Figure 10: Realized first-step image-space update $|\Delta\rho|$ produced by NeTMY, where $\Delta\rho = \rho_1 - \rho_0$. The surface does *not* exhibit a singular center spike, in contrast to the iter-0 free-density gradient under the same forward operator (Section [5.3](https://arxiv.org/html/2605.13988#S5.SS3)). This is the empirical signature of the Lemma [2](https://arxiv.org/html/2605.13988#Thmlemma2) filtering kernel $G_\theta = J_\theta J_\theta^\top$ redistributing the raw density-space gradient through the parameterization.

### E.11 Qualitative Examples

This subsection presents additional per-sample reconstruction grids that complement the cross-fidelity benchmark of Section [5.2](https://arxiv.org/html/2605.13988#S5.SS2) and the matched-operator benchmark of Appendix [E.4](https://arxiv.org/html/2605.13988#A5.SS4). Figures [11](https://arxiv.org/html/2605.13988#A5.F11)–[14](https://arxiv.org/html/2605.13988#A5.F14) show four cross-fidelity samples ($\mathcal{F}_3$-generated, inverted under both $\mathcal{F}_1$ and $\mathcal{F}_2$) and illustrate the operator-fidelity reshaping discussed in Section [5.2](https://arxiv.org/html/2605.13988#S5.SS2). Figures [15](https://arxiv.org/html/2605.13988#A5.F15) and [16](https://arxiv.org/html/2605.13988#A5.F16) show four matched-operator samples each under $\mathcal{F}_1/\mathcal{F}_1$ and $\mathcal{F}_2/\mathcal{F}_2$, respectively, drawn from the matched-operator benchmark of Appendix [E.4](https://arxiv.org/html/2605.13988#A5.SS4). Within each figure, columns correspond to (left to right) the ground-truth density, Tikhonov, ADMM, GaussianSplat, and NeTMY reconstructions; the centered iter-0 collapse of free-density solvers under $\mathcal{F}_2$ is visible as a concentrated central blob in the Tikhonov / ADMM panels, while NeTMY recovers off-center sparse structure consistent with the ground truth.

![Refer to caption](https://arxiv.org/html/2605.13988v1/figures/sec5/F3_Compare_008.png)

Figure 11: Cross-fidelity sample 008 ($\mathcal{F}_3$-generated). Reconstructions under $\mathcal{F}_1$ and $\mathcal{F}_2$ inversion. Free-density solvers (Tikhonov, ADMM) center-collapse under the more faithful $\mathcal{F}_2$, while NeTMY recovers the off-center support; GaussianSplat sits between the two regimes. See Section [5.2](https://arxiv.org/html/2605.13988#S5.SS2) for aggregate metrics.

![Refer to caption](https://arxiv.org/html/2605.13988v1/figures/sec5/F3_Compare_009.png)

Figure 12: Cross-fidelity sample 009 ($\mathcal{F}_3$-generated). Same layout as Figure [11](https://arxiv.org/html/2605.13988#A5.F11): $\mathcal{F}_1$ inversion (top row) vs. $\mathcal{F}_2$ inversion (bottom row), columns are Tikhonov, ADMM, GaussianSplat, NeTMY. The $\mathcal{F}_2$→$\mathcal{F}_1$ ranking flip discussed in Section [5.2](https://arxiv.org/html/2605.13988#S5.SS2) is visible at the per-sample level.

![Refer to caption](https://arxiv.org/html/2605.13988v1/figures/sec5/F3_Compare_10.png)

Figure 13: Cross-fidelity sample 010 ($\mathcal{F}_3$-generated). A many/close scene where the (P4) merging pathology and the (P2)+(P3) center-collapse pathology jointly stress the free-density solvers; NeTMY preserves the most peaks within the matching radius of Appendix [E.3](https://arxiv.org/html/2605.13988#A5.SS3).

![Refer to caption](https://arxiv.org/html/2605.13988v1/figures/sec5/F3_Compare_018.png)

Figure 14: Cross-fidelity sample 018 ($\mathcal{F}_3$-generated). A medium/medium scene; GaussianSplat is competitive on Hungarian F1 here while NeTMY remains best on Sliced-Wasserstein, consistent with the table-level rankings of Section [5.2](https://arxiv.org/html/2605.13988#S5.SS2).

![Refer to caption](https://arxiv.org/html/2605.13988v1/figures/sec5/Shallow_012_016_022_029.png)

Figure 15: Matched-operator $\mathcal{F}_1/\mathcal{F}_1$ examples (samples 012, 016, 022, 029). Under the simplified scalar/coherent operator there is no centering pathology, and ADMM is the lowest-MSE method (Table [3](https://arxiv.org/html/2605.13988#S5.T3)); NeTMY remains visually competitive with cleaner peaks but does not dominate.

![Refer to caption](https://arxiv.org/html/2605.13988v1/figures/sec5/Physical_012_016_022_029.png)

Figure 16: Matched-operator $\mathcal{F}_2/\mathcal{F}_2$ examples (samples 012, 016, 022, 029). Under the tensor power-summed operator the same ADMM run degrades to a centered collapse, while NeTMY produces the cleanest off-center reconstructions and tops every $\mathcal{F}_2/\mathcal{F}_2$ metric in Table [3](https://arxiv.org/html/2605.13988#S5.T3).
