Conformal Prediction for Neural Operators: Distribution-Free Uncertainty Quantification in Physics Simulation

arXiv cs.LG 06/10/26, 04:00 AM Papers
Summary
Proposes the first application of split conformal prediction to neural operator-based physics simulation, providing distribution-free prediction intervals with finite-sample coverage guarantees and adaptive-width intervals using MC Dropout uncertainty.
arXiv:2606.09923v1 Announce Type: new Abstract: Neural operators such as the Fourier Neural Operator (FNO) have emerged as powerful surrogates for solving partial differential equations (PDEs), achieving speedups of several orders of magnitude over traditional numerical solvers. However, deploying these models in safety-critical engineering applications -- such as thermal management of electronic components and battery systems -- requires not only accurate point predictions but also rigorous uncertainty guarantees. Existing uncertainty quantification (UQ) methods for neural operators, including Monte Carlo Dropout and Deep Ensembles, provide only relative uncertainty estimates without formal coverage guarantees. In this work, we propose the first application of split conformal prediction to neural operator-based physics simulation, providing distribution-free prediction intervals with finite-sample coverage guarantees. We further introduce a normalized conformal prediction scheme that leverages MC Dropout uncertainty to produce adaptive-width intervals, yielding tighter intervals in regions of low uncertainty and wider intervals where the model is less certain. Full-scale experiments (33.7M parameters, 800 training samples, 5 ensemble members, NVIDIA V100) on steady-state heat conduction benchmarks demonstrate that our method achieves 89.1% empirical coverage at the target level of alpha=0.1, while producing spatially adaptive prediction intervals that reflect the underlying physical uncertainty structure. We also provide an uncertainty decomposition framework that separates epistemic uncertainty (68% of total) from aleatoric uncertainty (32% of total), offering actionable guidance for data collection and model improvement. Our method is implemented in an open-source platform with REST API endpoints and interactive 3D visualization.
Cached at: 06/10/26, 06:18 AM
# Distribution-Free Uncertainty Quantification in Physics Simulation
Source: [https://arxiv.org/html/2606.09923](https://arxiv.org/html/2606.09923)
## Conformal Prediction for Neural Operators: Distribution\-Free Uncertainty Quantification in Physics Simulation

\(June 2026\)

###### Abstract

Neural operators such as the Fourier Neural Operator (FNO) have emerged as powerful surrogates for solving partial differential equations (PDEs), achieving speedups of several orders of magnitude over traditional numerical solvers. However, deploying these models in safety-critical engineering applications, such as thermal management of electronic components and battery systems, requires not only accurate point predictions but also *rigorous uncertainty guarantees*. Existing uncertainty quantification (UQ) methods for neural operators, including Monte Carlo Dropout and Deep Ensembles, provide only relative uncertainty estimates without formal coverage guarantees. In this work, we propose the first application of *split conformal prediction* to neural operator-based physics simulation, providing distribution-free prediction intervals with finite-sample coverage guarantees: $\mathbb{P}\bigl(Y\in\mathcal{C}(X)\bigr)\geq 1-\alpha$. We further introduce a *normalized conformal prediction* scheme that leverages MC Dropout uncertainty to produce adaptive-width intervals, yielding tighter intervals in regions of low uncertainty and wider intervals where the model is less certain. Full-scale experiments (33.7M parameters, 800 training samples, 5 ensemble members, NVIDIA V100) on steady-state heat conduction benchmarks demonstrate that our method achieves 89.1% empirical coverage at the target level of $\alpha=0.1$, while producing spatially adaptive prediction intervals that reflect the underlying physical uncertainty structure. We also provide an uncertainty decomposition framework that separates epistemic uncertainty (model uncertainty, reducible, 68% of total) from aleatoric uncertainty (data noise, irreducible, 32% of total), offering actionable guidance for data collection and model improvement.
Our method is implemented in an open\-source platform with REST API endpoints and interactive 3D visualization, demonstrating the practical deployability of conformal prediction for industrial physics simulation\.

## 1 Introduction

Artificial intelligence is rapidly transforming scientific computing, with learned simulators achieving speedups of $10^{2}$–$10^{4}\times$ over traditional numerical methods (Karniadakis et al., [2021](https://arxiv.org/html/2606.09923#bib.bib9); Brunton et al., [2020](https://arxiv.org/html/2606.09923#bib.bib4)). Physics simulation is a cornerstone of modern engineering design, enabling virtual prototyping of complex systems ranging from electronic cooling to battery thermal management. Traditional numerical methods, including finite element methods (FEM), finite volume methods (FVM), and computational fluid dynamics (CFD), solve the governing partial differential equations (PDEs) by discretizing the domain and iteratively solving large systems of equations. While accurate, these methods are computationally expensive: a single simulation can take minutes to hours, and design space exploration requiring hundreds of simulations becomes prohibitively costly.

Neural operators (Li et al., [2021b](https://arxiv.org/html/2606.09923#bib.bib16); Lu et al., [2021](https://arxiv.org/html/2606.09923#bib.bib20); Kovachki et al., [2023](https://arxiv.org/html/2606.09923#bib.bib12)) offer a paradigm shift by learning the *mapping operator* from input fields to output fields, rather than solving the PDE for each new configuration. The Fourier Neural Operator (FNO) (Li et al., [2021b](https://arxiv.org/html/2606.09923#bib.bib16)) performs global convolution in the spectral domain, achieving resolution-invariant predictions with inference times on the order of milliseconds, a speedup of $10^{2}$–$10^{3}\times$ over traditional solvers.

Despite these advances, a critical barrier remains for industrial deployment: *uncertainty quantification* (UQ). In safety-critical applications such as battery thermal management, engineers need to know not just the predicted temperature but also the confidence in that prediction. For instance, a battery thermal management system must guarantee that the maximum cell temperature remains below a safety threshold with high probability, not merely that the mean prediction does so.

Existing UQ methods for neural networks fall short of this requirement:

- Monte Carlo Dropout (Gal and Ghahramani, [2016](https://arxiv.org/html/2606.09923#bib.bib6)) provides uncertainty estimates by enabling dropout at inference time, but the resulting intervals have no formal coverage guarantees and often underestimate uncertainty (Ashukha and Vetrov, [2020](https://arxiv.org/html/2606.09923#bib.bib3)).
- Deep Ensembles (Lakshminarayanan et al., [2017](https://arxiv.org/html/2606.09923#bib.bib13)) aggregate predictions from multiple independently trained models, but the coverage of their intervals depends on the (unverifiable) assumption that the ensemble adequately spans the predictive distribution.
- Bayesian Neural Networks provide principled posterior inference but are computationally expensive and difficult to scale to the parameter counts of neural operators ($\sim$30M parameters).

Conformal prediction (Vovk et al., [2005](https://arxiv.org/html/2606.09923#bib.bib34); Shafer and Vovk, [2008](https://arxiv.org/html/2606.09923#bib.bib31); Angelopoulos and Bates, [2022](https://arxiv.org/html/2606.09923#bib.bib1)) offers an attractive alternative: it provides distribution-free, finite-sample coverage guarantees without requiring any assumptions about the data-generating process beyond exchangeability. Given a calibration set of $n$ examples, split conformal prediction guarantees that the prediction interval covers the true value with probability at least $1-\alpha$:

$$\mathbb{P}\bigl(Y_{n+1}\in\mathcal{C}(X_{n+1})\bigr)\geq 1-\alpha. \tag{1}$$

In this paper, we make the following contributions:

1. We propose the first application of *split conformal prediction* to neural operator-based physics simulation, providing rigorous coverage guarantees for FNO predictions on PDE solution fields (Section [4](https://arxiv.org/html/2606.09923#S4)).
2. We introduce a *normalized conformal prediction* scheme that uses MC Dropout uncertainty estimates to produce adaptive-width prediction intervals, achieving tighter intervals in well-constrained regions and wider intervals where the model is uncertain (Section [4.2](https://arxiv.org/html/2606.09923#S4.SS2)).
3. We develop an *uncertainty decomposition* framework that separates epistemic uncertainty (model uncertainty, reducible with more data) from aleatoric uncertainty (data noise, irreducible), providing actionable guidance for model improvement (Section [4.3](https://arxiv.org/html/2606.09923#S4.SS3)).
4. We validate our approach on industrial physics simulation benchmarks (steady-state heat conduction), demonstrating 89.1% empirical coverage at the target level of $\alpha=0.1$ with spatially adaptive intervals, using full-scale models (33.7M parameters) on NVIDIA V100 (Section [5](https://arxiv.org/html/2606.09923#S5)).

## 2 Related Work

Our work sits at the intersection of three active research areas in AI: neural operator learning, uncertainty quantification, and conformal prediction.

### 2.1 Neural Operators for PDEs

Neural operators learn mappings between infinite-dimensional function spaces, enabling resolution-invariant predictions (Kovachki et al., [2023](https://arxiv.org/html/2606.09923#bib.bib12)). The FNO (Li et al., [2021b](https://arxiv.org/html/2606.09923#bib.bib16)) parameterizes the integral kernel in Fourier space, achieving global receptive fields with $O(N\log N)$ complexity. FNO has been applied to weather forecasting (Pathak et al., [2022](https://arxiv.org/html/2606.09923#bib.bib25)), turbulence simulation (Stachenfeld et al., [2022](https://arxiv.org/html/2606.09923#bib.bib32)), and industrial design (Kasim et al., [2022](https://arxiv.org/html/2606.09923#bib.bib10)). Subsequent works extend FNO to multi-scale problems (Li et al., [2022b](https://arxiv.org/html/2606.09923#bib.bib18)), temporal dynamics (Li et al., [2022a](https://arxiv.org/html/2606.09923#bib.bib17)), and irregular geometries (Li et al., [2023](https://arxiv.org/html/2606.09923#bib.bib19)). DeepONet (Lu et al., [2021](https://arxiv.org/html/2606.09923#bib.bib20)) uses a branch-trunk architecture, while graph approaches (Li et al., [2021a](https://arxiv.org/html/2606.09923#bib.bib15)) handle mesh-based problems. NVIDIA Modulus (NVIDIA, [2024](https://arxiv.org/html/2606.09923#bib.bib23)) provides an industrial framework for large-scale simulation.

### 2.2 Uncertainty Quantification in Deep Learning

MC Dropout (Gal and Ghahramani, [2016](https://arxiv.org/html/2606.09923#bib.bib6)) interprets dropout as approximate Bayesian inference. Deep Ensembles (Lakshminarayanan et al., [2017](https://arxiv.org/html/2606.09923#bib.bib13)) train multiple models with different initializations. Kendall and Gal ([2017](https://arxiv.org/html/2606.09923#bib.bib11)) distinguished epistemic from aleatoric uncertainty, a decomposition we adopt. Hüllermeier and Waegeman ([2021](https://arxiv.org/html/2606.09923#bib.bib8)) provide a comprehensive taxonomy of uncertainty types. Ashukha and Vetrov ([2020](https://arxiv.org/html/2606.09923#bib.bib3)) showed that MC Dropout often underestimates uncertainty, while Rahaman et al. ([2021](https://arxiv.org/html/2606.09923#bib.bib27)) analyzed ensemble diversity-accuracy trade-offs.

In physics simulation, UQ for neural operators remains largely unexplored. Moli et al. ([2023](https://arxiv.org/html/2606.09923#bib.bib22)) proposed normalizing flows for neural operator UQ, requiring distributional assumptions. Psaros et al. ([2023](https://arxiv.org/html/2606.09923#bib.bib26)) surveyed UQ for physics-informed learning but noted the lack of formal coverage guarantees. Mishra and Molinaro ([2022](https://arxiv.org/html/2606.09923#bib.bib21)) applied conformal prediction to standard neural networks for PDEs but did not consider neural operators or spatially adaptive intervals.

### 2.3 Conformal Prediction

Conformal prediction (Vovk et al., [2005](https://arxiv.org/html/2606.09923#bib.bib34)) provides distribution-free prediction sets with finite-sample coverage guarantees under exchangeability. The split conformal method (Papadopoulos et al., [2002](https://arxiv.org/html/2606.09923#bib.bib24); Lei et al., [2018](https://arxiv.org/html/2606.09923#bib.bib14)) uses a held-out calibration set, making it efficient and easy to implement. Romano et al. ([2019](https://arxiv.org/html/2606.09923#bib.bib28)) introduced conformalized quantile regression for adaptive intervals, and Romano et al. ([2020](https://arxiv.org/html/2606.09923#bib.bib29)) proposed distribution-free uncertainty sets. Sesia and Candès ([2020](https://arxiv.org/html/2606.09923#bib.bib30)) compared conformal quantile regression methods. Angelopoulos and Bates ([2022](https://arxiv.org/html/2606.09923#bib.bib1)) provided a comprehensive introduction, and Fontana et al. ([2023](https://arxiv.org/html/2606.09923#bib.bib5)) reviewed theory and open challenges.

Conformal prediction has been applied to image segmentation (Angelopoulos et al., [2021](https://arxiv.org/html/2606.09923#bib.bib2)), time series (Xu and Xie, [2021](https://arxiv.org/html/2606.09923#bib.bib35)), and molecular property prediction (Sun et al., [2022](https://arxiv.org/html/2606.09923#bib.bib33)). Gibbs and Candes ([2021](https://arxiv.org/html/2606.09923#bib.bib7)) extended conformal prediction to distribution shift settings. However, to our knowledge, *no prior work has applied conformal prediction to neural operator-based physics simulation*.

## 3 Preliminaries

### 3.1 Problem Setup

We consider the problem of learning a solution operator for parametric PDEs. Let $\mathcal{D}\subset\mathbb{R}^{d}$ be a bounded domain and $\mathcal{A}$ a parameter space. Given a parameter field $a\in\mathcal{A}$ (e.g., thermal conductivity, heat source), the PDE solution $u\in\mathcal{V}$ satisfies:

$$\mathcal{N}(u;a)=0\quad\text{in }\mathcal{D}, \tag{2}$$

where $\mathcal{N}$ is the differential operator and $\mathcal{V}$ is the solution space. The goal is to learn the solution operator $\mathcal{G}^{\dagger}:\mathcal{A}\to\mathcal{V}$ mapping parameters to solutions.

In practice, we discretize the domain on a grid $\{x_{1},\ldots,x_{N}\}\subset\mathcal{D}$ and represent the parameter and solution fields as tensors $A\in\mathbb{R}^{c_{\text{in}}\times H\times W}$ and $U\in\mathbb{R}^{c_{\text{out}}\times H\times W}$, where $c_{\text{in}}$ and $c_{\text{out}}$ are the number of input and output channels, and $H\times W$ is the spatial resolution.

### 3.2 Fourier Neural Operator

The Fourier Neural Operator (FNO) (Li et al., [2021b](https://arxiv.org/html/2606.09923#bib.bib16)) approximates the solution operator $\mathcal{G}^{\dagger}$ through a sequence of spectral convolution layers. Each layer $l$ computes:

$$v^{(l+1)}=\sigma\Bigl(W_{l}v^{(l)}+\mathcal{K}_{l}(v^{(l)})+b_{l}\Bigr), \tag{3}$$

where $W_{l}$ is a local linear transformation, $b_{l}$ is a bias, $\sigma$ is a nonlinear activation (GELU), and $\mathcal{K}_{l}$ is the spectral convolution operator:

$$\mathcal{K}_{l}(v)=\mathcal{F}^{-1}\Bigl(R_{l}\cdot\mathcal{F}(v)\Bigr). \tag{4}$$

Here, $\mathcal{F}$ and $\mathcal{F}^{-1}$ denote the 2D FFT and its inverse, and $R_{l}\in\mathbb{C}^{c_{\text{mid}}\times c_{\text{mid}}\times k_{\max}\times k_{\max}}$ are learnable complex weights applied only to the lowest $k_{\max}$ Fourier modes (low-pass filtering).
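The spectral convolution in Eq. (4) can be illustrated with a short NumPy sketch. It is a simplification under stated assumptions, not the paper's implementation: the function name `spectral_conv2d` is hypothetical, the weights are random rather than learned, and only one low-frequency block of the real FFT is retained (common FNO implementations also keep the block of negative frequencies along the first axis).

```python
import numpy as np

def spectral_conv2d(v, R):
    """Apply K(v) = F^{-1}(R . F(v)) to a field v of shape (c_in, H, W).

    R has shape (c_in, c_out, k1, k2) and acts only on the lowest k1 x k2
    Fourier modes; every other mode is zeroed (low-pass filtering).
    """
    _, H, W = v.shape
    _, c_out, k1, k2 = R.shape
    v_ft = np.fft.rfft2(v, axes=(-2, -1))              # (c_in, H, W//2 + 1)
    out_ft = np.zeros((c_out, H, W // 2 + 1), dtype=complex)
    # Mix channels mode by mode on the retained low-frequency block.
    out_ft[:, :k1, :k2] = np.einsum("ioxy,ixy->oxy", R, v_ft[:, :k1, :k2])
    return np.fft.irfft2(out_ft, s=(H, W), axes=(-2, -1))

# Illustrative sizes only; the paper's model uses width 128 and 16 modes.
rng = np.random.default_rng(0)
c, H, W, k_max = 2, 32, 32, 4
R = rng.standard_normal((c, c, k_max, k_max)) \
    + 1j * rng.standard_normal((c, c, k_max, k_max))
v = rng.standard_normal((c, H, W))
out = spectral_conv2d(v, R)
print(out.shape)
```

Because the channel mixing happens per Fourier mode, the parameter count of $R_{l}$ is independent of the grid resolution $H\times W$, which is what makes the layer resolution-invariant.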

The full architecture is:

$$\hat{U}=\mathcal{P}\circ L\circ\cdots\circ L\circ\mathcal{L}(A), \tag{5}$$

where $\mathcal{L}:\mathbb{R}^{c_{\text{in}}}\to\mathbb{R}^{w}$ is the lifting layer, $\mathcal{P}:\mathbb{R}^{w}\to\mathbb{R}^{c_{\text{out}}}$ is the projection layer, and $L$ denotes the Fourier layer ([3](https://arxiv.org/html/2606.09923#S3.E3)).

For UQ via MC Dropout, we introduce dropout layers after each spectral convolution:

$$v^{(l+1)}=\sigma\Bigl(\text{Dropout}\bigl(W_{l}v^{(l)}+\mathcal{K}_{l}(v^{(l)})\bigr)+b_{l}\Bigr), \tag{6}$$

where Dropout is applied with probability $p$ during both training and inference (for MC sampling).
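To make the MC sampling step concrete, the following sketch keeps dropout active at inference and summarizes repeated stochastic passes by a per-pixel mean and standard deviation. The function `toy_forward` is a hypothetical stand-in for a dropout-equipped FNO (a single activation in the shape of Eq. (6)), and the helper names are assumptions made for illustration.

```python
import numpy as np

def dropout(x, p, rng):
    """Inverted dropout: zero each entry with probability p, rescale the rest."""
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

def toy_forward(a, rng, p=0.05):
    """Hypothetical one-layer model: activation(Dropout(linear term) + bias)."""
    return np.tanh(dropout(2.0 * a, p, rng) + 0.1)

def mc_dropout_stats(forward, a, n_mc, rng):
    """Run n_mc stochastic passes and return the per-pixel mean and std."""
    samples = np.stack([forward(a, rng) for _ in range(n_mc)])
    return samples.mean(axis=0), samples.std(axis=0)

rng = np.random.default_rng(0)
a = np.linspace(-1.0, 1.0, 32 * 32).reshape(32, 32)
mean, sigma_mc = mc_dropout_stats(toy_forward, a, n_mc=50, rng=rng)
print(mean.shape, float(sigma_mc.max()))
```

The standard deviation here uses the $1/N_{\text{MC}}$ normalization. With $p=0$ every pass is identical and the estimated spread collapses to zero, so the dropout rate directly controls how much signal this estimate carries.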

### 3.3 Split Conformal Prediction

Split conformal prediction (Papadopoulos et al., [2002](https://arxiv.org/html/2606.09923#bib.bib24)) constructs prediction intervals with distribution-free coverage guarantees. Given:

- A trained model $f:\mathbb{R}^{d}\to\mathbb{R}$ producing point predictions $\hat{y}=f(x)$,
- A calibration set $\{(x_{i},y_{i})\}_{i=1}^{n}$ drawn exchangeably from the data distribution,
- A significance level $\alpha\in(0,1)$,

the procedure is:

1. Compute nonconformity scores $s_{i}=|y_{i}-f(x_{i})|$ for each calibration example.
2. Compute the adjusted quantile $\hat{q}=\text{Quantile}\bigl(\{s_{1},\ldots,s_{n}\};\lceil(n+1)(1-\alpha)\rceil/n\bigr)$.
3. For a new input $x_{n+1}$, output the prediction interval: $$\mathcal{C}(x_{n+1})=\bigl[f(x_{n+1})-\hat{q},\;f(x_{n+1})+\hat{q}\bigr]. \tag{7}$$
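The three steps above can be sketched in a few lines of NumPy for a scalar-output model. The model `f` and the synthetic data are hypothetical stand-ins; the adjusted quantile is taken as the $\lceil(n+1)(1-\alpha)\rceil$-th smallest score, which is the empirical quantile at level $\lceil(n+1)(1-\alpha)\rceil/n$.

```python
import math
import numpy as np

def conformal_quantile(scores, alpha):
    """Return the ceil((n+1)(1-alpha))-th smallest score, or inf if that rank exceeds n."""
    s = np.sort(np.asarray(scores, dtype=float).ravel())
    n = s.size
    k = math.ceil((n + 1) * (1 - alpha))
    return float(s[k - 1]) if k <= n else math.inf

def split_conformal_interval(f, x_cal, y_cal, x_new, alpha):
    scores = np.abs(y_cal - f(x_cal))           # step 1: nonconformity scores
    q_hat = conformal_quantile(scores, alpha)   # step 2: adjusted quantile
    pred = f(x_new)
    return pred - q_hat, pred + q_hat           # step 3: symmetric interval

def f(x):
    return 2.0 * x  # hypothetical stand-in for a trained model

rng = np.random.default_rng(0)
x_cal = rng.uniform(0.0, 1.0, 200)
y_cal = 2.0 * x_cal + rng.normal(0.0, 0.1, 200)
x_test = rng.uniform(0.0, 1.0, 2000)
y_test = 2.0 * x_test + rng.normal(0.0, 0.1, 2000)

lower, upper = split_conformal_interval(f, x_cal, y_cal, x_test, alpha=0.1)
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"empirical coverage: {coverage:.3f}")
```

Returning an infinite quantile when the required rank exceeds $n$ keeps the guarantee intact for very small calibration sets, at the price of an uninformative interval.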

###### Theorem 1 (Coverage Guarantee (Vovk et al., [2005](https://arxiv.org/html/2606.09923#bib.bib34))).

If $(X_{1},Y_{1}),\ldots,(X_{n+1},Y_{n+1})$ are exchangeable, then the split conformal prediction interval satisfies:

$$\mathbb{P}\bigl(Y_{n+1}\in\mathcal{C}(X_{n+1})\bigr)\geq 1-\alpha. \tag{8}$$

Furthermore, if the nonconformity scores have no ties, the coverage is exactly $1-\alpha$ in the limit $n\to\infty$.

###### Theorem 2 (Finite-Sample Coverage Gap (Angelopoulos and Bates, [2022](https://arxiv.org/html/2606.09923#bib.bib1))).

For a calibration set of size $n$, the empirical coverage satisfies:

$$\mathbb{P}\bigl(Y_{n+1}\in\mathcal{C}(X_{n+1})\bigr)\geq 1-\alpha-\frac{1}{n+1}. \tag{9}$$

This gap of $1/(n+1)$ explains why our empirical coverage (89.1% with $n=200$) is close to but slightly below the 90% target. The theoretical bound requires $n\geq 1/\alpha=10$ samples, but tighter coverage needs larger calibration sets.
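As a small worked example of the quantities involved, the sketch below evaluates the rank $\lceil(n+1)(1-\alpha)\rceil$, the corresponding quantile level, and the $1/(n+1)$ term for a few calibration sizes. The function name is hypothetical. For $n=200$ and $\alpha=0.1$ the rank is 181, the quantile level is 0.905, and $1/(n+1)\approx 0.005$.

```python
import math

def calibration_rank_and_gap(n, alpha):
    """Rank of the score used as q_hat, its quantile level, and the 1/(n+1) term."""
    k = math.ceil((n + 1) * (1 - alpha))
    return k, k / n, 1.0 / (n + 1)

for n in (50, 200, 1000):
    k, level, gap = calibration_rank_and_gap(n, alpha=0.1)
    print(f"n={n:5d}  rank={k:4d}  level={level:.4f}  1/(n+1)={gap:.4f}")
```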

The key advantage of conformal prediction is that *no distributional assumptions* are required: the coverage guarantee holds for any data distribution, as long as the calibration and test data are exchangeable.

## 4 Method

We present our method for applying conformal prediction to neural operator-based physics simulation. The key challenge is that neural operators produce *spatially distributed* predictions (2D fields), whereas standard conformal prediction is designed for scalar outputs. We address this by applying conformal prediction pixel-wise and introducing a normalized scheme for adaptive intervals.

### 4.1 Conformal Prediction for Neural Operators

Let $f_{\theta}:\mathbb{R}^{c_{\text{in}}\times H\times W}\to\mathbb{R}^{c_{\text{out}}\times H\times W}$ be a trained FNO model. Given a calibration set $\{(A_{i},U_{i})\}_{i=1}^{n}$ of input–output field pairs, we compute pixel-wise nonconformity scores:

$$S_{i}=|U_{i}-f_{\theta}(A_{i})|\in\mathbb{R}^{c_{\text{out}}\times H\times W},\quad i=1,\ldots,n. \tag{10}$$

To obtain a single quantile value, we flatten all spatial locations and channels into a single set of scores:

$$\mathcal{S}=\{S_{i}[h,w]:i=1,\ldots,n;\;h=1,\ldots,H;\;w=1,\ldots,W\}. \tag{11}$$

The conformal quantile is then:

$$\hat{q}=\text{Quantile}\left(\mathcal{S};\;\frac{\lceil(|\mathcal{S}|+1)(1-\alpha)\rceil}{|\mathcal{S}|}\right). \tag{12}$$

For a new input $A_{n+1}$, the prediction interval at each spatial location is:

$$\mathcal{C}(A_{n+1})[h,w]=\bigl[\hat{U}_{n+1}[h,w]-\hat{q},\;\hat{U}_{n+1}[h,w]+\hat{q}\bigr], \tag{13}$$

where $\hat{U}_{n+1}=f_{\theta}(A_{n+1})$.
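Eqs. (10) to (13) can be sketched for field-valued outputs as follows. The arrays are synthetic placeholders for calibration targets and FNO predictions, and the function names are hypothetical; the pooled set has $|\mathcal{S}|=n\cdot c_{\text{out}}\cdot H\cdot W$ entries.

```python
import math
import numpy as np

def pooled_conformal_quantile(U_cal, U_hat_cal, alpha):
    """Pool |U - U_hat| over samples, channels and pixels; take the adjusted quantile."""
    scores = np.sort(np.abs(U_cal - U_hat_cal).ravel())
    m = scores.size                                # |S|
    k = math.ceil((m + 1) * (1 - alpha))
    return float(scores[k - 1]) if k <= m else math.inf

def constant_width_interval(U_hat_new, q_hat):
    """Same half-width q_hat at every pixel, as in Eq. (13)."""
    return U_hat_new - q_hat, U_hat_new + q_hat

# Synthetic stand-ins: predictions plus Gaussian residuals of std 0.1.
rng = np.random.default_rng(0)
n, H, W = 200, 32, 32
U_hat_cal = rng.standard_normal((n, 1, H, W))
U_cal = U_hat_cal + 0.1 * rng.standard_normal((n, 1, H, W))

q_hat = pooled_conformal_quantile(U_cal, U_hat_cal, alpha=0.1)
U_hat_new = rng.standard_normal((1, H, W))
lower, upper = constant_width_interval(U_hat_new, q_hat)
print(f"q_hat = {q_hat:.4f}, interval width = {2 * q_hat:.4f}")
```

A single scalar `q_hat` is stored after calibration, so the constant-width interval adds no cost at inference beyond one forward pass.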

### 4.2 Normalized Conformal Prediction

The standard (unnormalized) conformal interval ([13](https://arxiv.org/html/2606.09923#S4.E13)) produces constant-width intervals across the entire spatial domain. This is suboptimal for physics simulations, where uncertainty varies spatially: regions near heat sources or material interfaces typically have higher prediction error than smooth interior regions.

We introduce a *normalized conformal prediction* scheme that produces adaptive-width intervals. The key idea is to use MC Dropout uncertainty as a normalizing factor for the nonconformity scores.

Let $\sigma_{\text{MC}}(A)=\sqrt{\text{Var}_{p\sim\text{Dropout}}[f_{\theta}(A;p)]}$ be the MC Dropout standard deviation at each spatial location. We define the normalized nonconformity score as:

$$\tilde{S}_{i}[h,w]=\frac{|U_{i}[h,w]-f_{\theta}(A_{i})[h,w]|}{\sigma_{\text{MC}}(A_{i})[h,w]+\epsilon}, \tag{14}$$

where $\epsilon>0$ is a small constant for numerical stability. The normalized quantile $\hat{q}_{\text{norm}}$ is computed from $\{\tilde{S}_{i}[h,w]\}$ as in ([12](https://arxiv.org/html/2606.09923#S4.E12)).

The resulting adaptive prediction interval is:

$$\mathcal{C}_{\text{norm}}(A_{n+1})[h,w]=\bigl[\hat{U}[h,w]-\hat{q}_{\text{norm}}\cdot\sigma_{\text{MC}}[h,w],\;\hat{U}[h,w]+\hat{q}_{\text{norm}}\cdot\sigma_{\text{MC}}[h,w]\bigr], \tag{15}$$

where $\hat{U}=f_{\theta}(A_{n+1})$ and $\sigma_{\text{MC}}=\sigma_{\text{MC}}(A_{n+1})$. This produces intervals that are wider where MC Dropout uncertainty is high (e.g., near heat sources, material interfaces) and narrower where the model is confident (e.g., smooth interior regions).
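A minimal sketch of the normalized scheme follows, using a synthetic uncertainty field in place of real MC Dropout output. One choice here is an assumption rather than something stated in the text: the interval half-width is scaled by $\sigma_{\text{MC}}+\epsilon$, the same factor that divides the score in Eq. (14), so that calibration and prediction use a matching scale (Eq. (15) as written omits $\epsilon$). The function names are hypothetical.

```python
import math
import numpy as np

def normalized_quantile(U_cal, U_hat_cal, sigma_cal, alpha, eps=1e-6):
    """Pool the Eq. (14) scores over samples and pixels; take the adjusted quantile."""
    scores = np.sort((np.abs(U_cal - U_hat_cal) / (sigma_cal + eps)).ravel())
    m = scores.size
    k = math.ceil((m + 1) * (1 - alpha))
    return float(scores[k - 1]) if k <= m else math.inf

def adaptive_interval(U_hat_new, sigma_new, q_norm, eps=1e-6):
    """Interval whose half-width scales with the local uncertainty estimate."""
    half = q_norm * (sigma_new + eps)
    return U_hat_new - half, U_hat_new + half

# Synthetic setup: residual noise grows from left to right, and the
# uncertainty field (a stand-in for sigma_MC) tracks it exactly.
rng = np.random.default_rng(1)
n, H, W = 200, 16, 16
sigma_field = np.linspace(0.05, 0.5, W)[None, None, :] * np.ones((n, H, W))
U_hat_cal = rng.standard_normal((n, H, W))
U_cal = U_hat_cal + sigma_field * rng.standard_normal((n, H, W))

q_norm = normalized_quantile(U_cal, U_hat_cal, sigma_field, alpha=0.1)
lower, upper = adaptive_interval(U_hat_cal[0], sigma_field[0], q_norm)
width = upper - lower
print(f"q_norm = {q_norm:.3f}")
print(f"mean width, left column: {width[:, 0].mean():.3f}, right column: {width[:, -1].mean():.3f}")
```

In this idealized case the uncertainty field is perfectly informative, so the widths follow the noise level. With a real MC Dropout estimate, the gain in adaptivity depends on how well $\sigma_{\text{MC}}$ correlates with the actual error.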

###### Proposition 3 (Normalized Coverage).

Under the same exchangeability assumption as Theorem [1](https://arxiv.org/html/2606.09923#Thmtheorem1), the normalized conformal prediction interval satisfies:

$$\mathbb{P}\bigl(Y_{n+1}\in\mathcal{C}_{\text{norm}}(X_{n+1})\bigr)\geq 1-\alpha. \tag{16}$$

###### Proposition 4 (Computational Overhead).

Given a trained FNO model $f_{\theta}$ with $|\theta|$ parameters, the additional computational cost of conformal prediction is:

- Calibration: $O(n\cdot N_{\text{MC}}\cdot T_{\text{fwd}})$, where $T_{\text{fwd}}$ is a single forward pass time (one-time cost).
- Inference: $O(N_{\text{MC}}\cdot T_{\text{fwd}})$ for normalized conformal (vs. $O(T_{\text{fwd}})$ for plain prediction), a factor of $N_{\text{MC}}$ overhead.
- Memory: No additional model storage (MC Dropout reuses $f_{\theta}$) vs. $O(M\cdot|\theta|)$ for Deep Ensembles.

This shows that conformal prediction is significantly more memory-efficient than Deep Ensembles ($M=5$ requires $5\times$ model storage), with the main overhead being $N_{\text{MC}}$ forward passes at inference time.

###### Corollary 5 (Simultaneous Spatial Coverage).

For $K$ spatial locations, applying a Bonferroni correction with $\alpha_{\text{bon}}=\alpha/K$ guarantees:

$$\mathbb{P}\bigl(\forall\,(h,w):Y[h,w]\in\mathcal{C}(X)[h,w]\bigr)\geq 1-\alpha. \tag{17}$$

However, this is conservative; in practice, pixel-wise coverage at level $1-\alpha$ is sufficient for engineering applications where safety factors already account for spatial variability.
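To give a sense of how conservative the Bonferroni level is, the sketch below evaluates $\alpha_{\text{bon}}$ for a $32\times 32$ grid and compares the calibration rank it demands with the rank at the plain level $\alpha$. Pooling $n\cdot K$ scores as in Eq. (12) is carried over here as an assumption, and the function names are hypothetical. With $\alpha=0.1$ and $K=1024$, $\alpha_{\text{bon}}\approx 9.8\times 10^{-5}$.

```python
import math

def bonferroni_alpha(alpha, K):
    """Per-location miscoverage level alpha / K (union bound over K locations)."""
    return alpha / K

def calibration_rank(m, alpha):
    """Rank ceil((m+1)(1-alpha)) of the score used as the conformal quantile."""
    return math.ceil((m + 1) * (1 - alpha))

alpha, H, W, n = 0.1, 32, 32, 200
K = H * W                                   # 1024 spatial locations
alpha_bon = bonferroni_alpha(alpha, K)
m = n * K                                   # pooled score count
rank_plain = calibration_rank(m, alpha)
rank_bon = calibration_rank(m, alpha_bon)
print(f"alpha_bon = {alpha_bon:.3e}")
print(f"rank at alpha: {rank_plain} of {m}; rank at alpha_bon: {rank_bon} of {m}")
```

Under the corrected level the quantile is one of the few largest pooled scores, so the interval width is driven by the worst calibration errors. This is the conservatism the corollary refers to.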

### 4.3 Uncertainty Decomposition

To provide actionable guidance for model improvement, we decompose the total predictive uncertainty into epistemic and aleatoric components.

Epistemic uncertainty (model uncertainty) captures the reducible uncertainty due to limited training data or model capacity. We estimate it using MC Dropout variance:

$$\sigma^{2}_{\text{epi}}(x)=\text{Var}_{p\sim\text{Dropout}}\bigl[f_{\theta}(x;p)\bigr]. \tag{18}$$

Aleatoric uncertainty (data noise) captures the irreducible uncertainty due to noise in the data or inherent stochasticity in the physical process. We estimate it as the residual between total ensemble variance and epistemic variance:

$$\sigma^{2}_{\text{alea}}(x)=\bigl(\sigma^{2}_{\text{ens}}(x)-\sigma^{2}_{\text{epi}}(x)\bigr)_{+}, \tag{19}$$

where $\sigma^{2}_{\text{ens}}(x)=\text{Var}_{j=1,\ldots,M}[f_{\theta_{j}}(x)]$ is the Deep Ensemble variance and $(\cdot)_{+}=\max(\cdot,0)$.

Total uncertainty is the sum:

$$\sigma^{2}_{\text{total}}(x)=\sigma^{2}_{\text{epi}}(x)+\sigma^{2}_{\text{alea}}(x). \tag{20}$$

This decomposition is practically valuable: high epistemic uncertainty suggests that collecting more training data in that region would improve the model, while high aleatoric uncertainty indicates inherent noise that cannot be reduced (Kendall and Gal, [2017](https://arxiv.org/html/2606.09923#bib.bib11); Hüllermeier and Waegeman, [2021](https://arxiv.org/html/2606.09923#bib.bib8)).
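Eqs. (18) to (20) reduce to a few array operations once the MC Dropout passes and the ensemble predictions are stacked. In the sketch below the inputs are random placeholders, the function name is hypothetical, and the domain-wide epistemic share is computed as a ratio of summed variances, which is an assumption: the text does not specify how the reported shares are aggregated.

```python
import numpy as np

def decompose_uncertainty(mc_samples, ensemble_preds):
    """Per-pixel variance decomposition.

    mc_samples:     (N_MC, H, W) stochastic passes of one dropout model
    ensemble_preds: (M, H, W) predictions of M independently trained models
    """
    var_epi = mc_samples.var(axis=0)                    # Eq. (18)
    var_ens = ensemble_preds.var(axis=0)
    var_alea = np.maximum(var_ens - var_epi, 0.0)       # Eq. (19), clipped at zero
    var_total = var_epi + var_alea                      # Eq. (20)
    return var_epi, var_alea, var_total

# Random placeholders standing in for real model outputs.
rng = np.random.default_rng(0)
mc = 0.1 * rng.standard_normal((50, 32, 32))
ens = 0.2 * rng.standard_normal((5, 32, 32))

var_epi, var_alea, var_total = decompose_uncertainty(mc, ens)
epi_share = var_epi.sum() / var_total.sum()             # assumed aggregation
print(f"epistemic share of total variance: {epi_share:.2f}")
```

Because of the clipping in Eq. (19), the total variance at each pixel equals the larger of the ensemble variance and the MC Dropout variance.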

### 4.4 Algorithm Summary

We summarize the full procedure in Algorithm [1](https://arxiv.org/html/2606.09923#alg1).

Algorithm 1: Conformal Prediction for Neural Operators

Require: Trained FNO model $f_{\theta}$ with dropout; calibration set $\{(A_{i},U_{i})\}_{i=1}^{n}$; significance level $\alpha$; MC samples $N_{\text{MC}}$

Ensure: Prediction intervals for new inputs

1. Phase 1: Calibration
2. for $i=1,\ldots,n$ do
3. $\hat{U}_{i}\leftarrow f_{\theta}(A_{i})$ {Point prediction}
4. $\sigma_{\text{MC},i}\leftarrow\sqrt{\frac{1}{N_{\text{MC}}}\sum_{k=1}^{N_{\text{MC}}}\bigl(f_{\theta}(A_{i};p_{k})-\bar{U}_{i}\bigr)^{2}}$ {MC Dropout std; $\bar{U}_{i}$ is the mean of the $N_{\text{MC}}$ passes}
5. $S_{i}\leftarrow|U_{i}-\hat{U}_{i}|$ {Residual scores}
6. $\tilde{S}_{i}\leftarrow S_{i}/(\sigma_{\text{MC},i}+\epsilon)$ {Normalized scores}
7. end for
8. $\hat{q}\leftarrow\text{Quantile}(\{S_{i}\};\lceil(n+1)(1-\alpha)\rceil/n)$
9. $\hat{q}_{\text{norm}}\leftarrow\text{Quantile}(\{\tilde{S}_{i}\};\lceil(n+1)(1-\alpha)\rceil/n)$
10. Phase 2: Prediction
11. $\hat{U}_{\text{new}}\leftarrow f_{\theta}(A_{\text{new}})$
12. $\sigma_{\text{MC,new}}\leftarrow$ MC Dropout std for $A_{\text{new}}$
13. $\mathcal{C}(A_{\text{new}})=[\hat{U}_{\text{new}}-\hat{q},\;\hat{U}_{\text{new}}+\hat{q}]$ {Constant-width}
14. $\mathcal{C}_{\text{norm}}(A_{\text{new}})=[\hat{U}_{\text{new}}-\hat{q}_{\text{norm}}\cdot\sigma_{\text{MC,new}},\;\hat{U}_{\text{new}}+\hat{q}_{\text{norm}}\cdot\sigma_{\text{MC,new}}]$ {Adaptive-width}
15. return $\mathcal{C}(A_{\text{new}})$, $\mathcal{C}_{\text{norm}}(A_{\text{new}})$

## 5 Experiments

### 5.1 Experimental Setup

#### 5.1.1 Benchmarks

We evaluate on two physics simulation benchmarks:

Heat-2D: Steady-State Heat Conduction. We consider the 2D steady-state heat equation:

$$-\nabla\cdot(k(\boldsymbol{x})\nabla T(\boldsymbol{x}))=Q(\boldsymbol{x}),\quad\boldsymbol{x}\in\Omega, \tag{21}$$

with Robin boundary conditions $-k\,\partial T/\partial n=h(T-T_{\text{amb}})$ on $\partial\Omega$. The input field $A=[k,Q,h]\in\mathbb{R}^{3\times H\times W}$ consists of spatially varying thermal conductivity, heat source, and convection coefficient. The output is the temperature field $T\in\mathbb{R}^{1\times H\times W}$.

We use three configurations of increasing complexity:

- simple\_chip: Single heat source on uniform substrate.
- forced\_convection: Multiple heat sources with varying convection.
- multi\_chip: Multiple heat sources on heterogeneous substrate.

Battery-Thermal: Battery Thermal Management. We consider the battery thermal model:

$$-\nabla\cdot(k\nabla T)=Q_{\text{gen}}(R,I)-h_{\text{conv}}(T-T_{\text{amb}}), \tag{22}$$

where $Q_{\text{gen}}=I^{2}R$ is the Joule heating and $h_{\text{conv}}$ is the convective cooling coefficient. The input field is $A=[k,Q_{\text{gen}},R,h_{\text{conv}}]\in\mathbb{R}^{4\times H\times W}$ and the output is $T\in\mathbb{R}^{1\times H\times W}$. We evaluate five scenarios: normal discharge, fast charge, extreme cold, thermal abuse, and overcharge.

#### 5.1.2 Data Generation

Training data is generated using finite difference solvers: the HeatEquation2d solver for heat conduction and the BatteryThermalModel solver for battery thermal scenarios. Each solver produces input–output pairs by sampling random parameter configurations (conductivity fields, heat source distributions, boundary conditions) and solving the PDE to convergence.

#### 5.1.3 Model Configuration

We use FNO2d with the following configuration (determined via hyperparameter search):

- Width: $w=128$, Fourier modes: $k_{\max}=16$, Layers: $L=4$
- Activation: GELU, Dropout rate: $p=0.05$
- Total parameters: $\sim$33.7M
- Resolution: $32\times 32$, Training samples: 800, Ensemble size: $M=5$

#### 5.1.4 Training Protocol

Models are trained using the `PhysicsTrainer` with composite loss:

$$\mathcal{L}=\mathcal{L}_{\text{data}}+\lambda_{\text{PDE}}(t)\cdot\mathcal{L}_{\text{PDE}}+\lambda_{\text{grad}}\cdot\mathcal{L}_{\text{grad}}, \tag{23}$$

where $\mathcal{L}_{\text{data}}$ is the relative $L_{2}$ data fidelity loss, $\mathcal{L}_{\text{PDE}}$ is the PDE residual loss, and $\mathcal{L}_{\text{grad}}$ is the gradient penalty. The PDE weight follows a cosine schedule, $\lambda_{\text{PDE}}(t)=\frac{\lambda_{\text{end}}}{2}\bigl(1-\cos(\pi t/T)\bigr)$, ramping from 0 to 0.05 over training (here $T$ denotes the total training length, not the temperature field). The gradient penalty weight is $\lambda_{\text{grad}}=0.01$.
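The cosine ramp for the PDE weight can be written in a few lines. This is a minimal sketch of the stated formula; the function name `lambda_pde` and the choice of indexing progress by step are assumptions, since the text does not say whether $t$ counts epochs or iterations.

```python
import math

def lambda_pde(step, total_steps, lambda_end=0.05):
    """Cosine ramp from 0 (step 0) to lambda_end (step == total_steps)."""
    # Clamp progress to [0, 1] so the weight stays at lambda_end past the end.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * lambda_end * (1.0 - math.cos(math.pi * progress))

start = lambda_pde(0, 200)
middle = lambda_pde(100, 200)
end = lambda_pde(200, 200)
```

The weight starts at zero, so early training is driven by the data term, and reaches half of `lambda_end` at the midpoint of training.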

We use the AdamW optimizer with cosine learning rate scheduling, an initial learning rate of $5\times 10^{-4}$, batch size 8, and early stopping with patience 30.

#### 5.1.5 Data Splits

For conformal prediction, we split the data into three sets:

- Training: 800 samples (about 73%)
- Calibration: 200 samples (about 18%, for conformal calibration)
- Test: 100 samples (about 9%, for evaluation)

#### 5.1.6 Baselines

We compare against the following UQ baselines:

- MC Dropout: $N_{\text{MC}}=50$ forward passes with dropout enabled; $\pm 2\sigma$ interval.
- Deep Ensemble: $M=5$ independently trained FNO2d models; $\pm 2\sigma$ interval.
- Naive Conformal: Split conformal with residual (unnormalized) scores.
- Ours (Norm. Conformal): Normalized conformal prediction using MC Dropout uncertainty.
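The two conformal variants in this list differ only in the nonconformity score. The sketch below illustrates both on synthetic scalar data, assuming the standard split conformal quantile (the $\lceil (n+1)(1-\alpha)\rceil$-th smallest calibration score). The helper names, the synthetic `make_data` generator, and the use of a known noise scale as a stand-in for the MC Dropout standard deviation are all hypothetical; the paper applies the idea to temperature fields rather than scalars.

```python
import numpy as np

def conformal_quantile(scores, alpha):
    """k-th smallest calibration score with k = ceil((n + 1) * (1 - alpha))."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = scores.size
    k = int(np.ceil((n + 1) * (1.0 - alpha)))
    if k > n:
        return np.inf  # too few calibration points for this alpha
    return scores[k - 1]

rng = np.random.default_rng(0)

def make_data(n):
    """Synthetic heteroscedastic data (stand-in for surrogate output and MC Dropout std)."""
    x = rng.uniform(0.0, 1.0, n)
    sigma = 0.1 + x                      # uncertainty estimate grows with x
    pred = np.sin(2.0 * np.pi * x)       # point prediction
    y = pred + sigma * rng.normal(size=n)
    return pred, sigma, y

alpha, eps = 0.1, 1e-6
pred_cal, sig_cal, y_cal = make_data(2000)
pred_te, sig_te, y_te = make_data(5000)

# Naive: absolute residual score, constant half-width.
q_naive = conformal_quantile(np.abs(y_cal - pred_cal), alpha)
lo_naive, hi_naive = pred_te - q_naive, pred_te + q_naive

# Normalized: residual divided by (sigma + eps), half-width scales with sigma.
q_norm = conformal_quantile(np.abs(y_cal - pred_cal) / (sig_cal + eps), alpha)
half = q_norm * (sig_te + eps)
lo_norm, hi_norm = pred_te - half, pred_te + half

cov_naive = np.mean((y_te >= lo_naive) & (y_te <= hi_naive))
cov_norm = np.mean((y_te >= lo_norm) & (y_te <= hi_norm))
```

Both variants should land near the 90% target on exchangeable data; only the normalized one produces intervals whose width varies from point to point.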

#### 5.1.7 Evaluation Metrics

- Coverage: Fraction of test pixels where $U[h,w]\in\mathcal{C}(A)[h,w]$.
- Average Width: Mean of $|\mathcal{C}(A)[h,w]|$ across all test pixels.
- Sharpness: Standard deviation of interval widths (measures adaptivity).
- Winkler Score: $\text{WS}=\text{Width}+\frac{2}{\alpha}(y-\text{upper})_{+}+\frac{2}{\alpha}(\text{lower}-y)_{+}$, combining width and coverage.
- CRPS: Continuous Ranked Probability Score (approximated from MC samples).
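The interval-based metrics above translate directly into array operations. The sketch below is one possible implementation (the function name `interval_metrics` is hypothetical, and CRPS is omitted because it needs the MC samples rather than just the interval bounds).

```python
import numpy as np

def interval_metrics(y, lower, upper, alpha):
    """Coverage, average width, sharpness, and mean Winkler score over all pixels."""
    y, lower, upper = (np.asarray(a, dtype=float) for a in (y, lower, upper))
    width = upper - lower
    covered = (y >= lower) & (y <= upper)
    # Positive parts: how far the truth falls outside the interval on either side.
    over = np.maximum(y - upper, 0.0)
    under = np.maximum(lower - y, 0.0)
    winkler = width + (2.0 / alpha) * (over + under)
    return {
        "coverage": float(covered.mean()),
        "avg_width": float(width.mean()),
        "sharpness": float(width.std()),
        "winkler": float(winkler.mean()),
    }

# Three pixels with the interval [0, 2]: one covered, one above, one below.
m = interval_metrics(y=[1.0, 5.0, -1.0], lower=[0.0, 0.0, 0.0],
                     upper=[2.0, 2.0, 2.0], alpha=0.1)
```

In this toy case the per-pixel Winkler scores are 2, 62, and 22, so a single miss by 3 units costs far more than the interval width itself.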

### 5.2 Main Results

#### 5.2.1 Prediction Accuracy

Table [1](https://arxiv.org/html/2606.09923#S5.T1) reports the prediction accuracy of FNO2d with full-scale UQ configuration (width=128, 800 training samples, 200 epochs, NVIDIA V100).

Table 1: FNO2d accuracy (rel. $L_{2}$ error, width=128, 800 samples, $M=5$). ∗Separate full-scale training without UQ.

The model achieves >95% accuracy on all heat conduction configurations, and the ensemble members show consistent accuracy (95.48–95.78%), demonstrating that the UQ configuration maintains high prediction quality.

#### 5.2.2 Coverage Analysis

Table [2](https://arxiv.org/html/2606.09923#S5.T2) presents the main UQ comparison results on the Heat-2D benchmark at $\alpha=0.1$ (target coverage: 90%). All results are from the full-scale model (width=128, 800 training samples, $M=5$ ensemble, $N_{\text{MC}}=50$) trained on NVIDIA V100 for 669 seconds.

Table 2: Uncertainty quantification comparison on Heat-2D (simple_chip, $\alpha=0.1$). Coverage target: $\geq 90\%$. Best coverage in bold. $M=5$ ensemble, $N_{\text{MC}}=50$, 33.7M parameters, 800 training samples.

Key observations:

1. MC Dropout undercovers (77.23% vs. 90% target), though the full-scale model achieves much better coverage than small-scale models due to better calibration of the $\pm 2\sigma$ interval with higher accuracy.
2. Deep Ensemble with $M=5$ undercovers (52.44%), as even 5 independently trained high-accuracy models produce overly narrow $\pm 2\sigma$ intervals. This suggests that ensemble-based UQ requires many more members for reliable coverage at high accuracy levels.
3. Both conformal methods achieve near-target coverage: naive conformal achieves 88.88% and normalized conformal achieves 89.11%, both close to the 90% target. The slight undercoverage (roughly 1 percentage point) is within the expected finite-sample variance for the calibration set size ($n=200$).
4. Normalized conformal slightly outperforms naive conformal (89.11% vs. 88.88%) while producing spatially adaptive intervals, which is the main practical advantage of our method.
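The finite-sample variance mentioned in observation 3 can be quantified with a standard result: for split conformal prediction with i.i.d. data and continuous scores, the coverage conditional on the calibration set follows a Beta$(k,\,n+1-k)$ distribution with $k=\lceil (n+1)(1-\alpha)\rceil$. The sketch below evaluates its mean and standard deviation for $n=200$. It treats calibration scores as exchangeable scalars, which is an idealization here because pixels within one field are correlated.

```python
import math

def conformal_coverage_spread(n_cal, alpha):
    """Mean and std of calibration-conditional coverage, Beta(k, n + 1 - k)."""
    k = math.ceil((n_cal + 1) * (1.0 - alpha))
    a, b = k, n_cal + 1 - k
    mean = a / (a + b)
    std = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return k, mean, std

k, mean, std = conformal_coverage_spread(200, 0.1)
print(k, round(mean, 4), round(std, 4))  # 181 0.9005 0.0211
```

With a standard deviation of about 2.1 percentage points, an observed coverage of 89.11% lies less than one standard deviation below the expected 90.05%.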

#### 5.2.3 Computational Cost Analysis

Table [3](https://arxiv.org/html/2606.09923#S5.T3) compares the inference time and memory overhead of each UQ method on NVIDIA V100.

Table 3: Inference time and memory comparison ($32\times 32$ resolution, batch size 1, V100).

Normalized conformal prediction requires $N_{\text{MC}}=50$ forward passes (same as MC Dropout), but achieves much higher coverage (89.1% vs. 77.2%) with the same memory footprint. Deep Ensembles require $5\times$ model storage, which becomes prohibitive for large models. The calibration phase is a one-time cost ($O(n\cdot N_{\text{MC}})$ forward passes for $n$ calibration samples) that does not affect inference latency.

#### 5.2.4 Interval Width Analysis

Figure [1](https://arxiv.org/html/2606.09923#S5.F1) illustrates the spatial distribution of prediction interval widths for the normalized conformal method compared to naive conformal. The normalized method produces wider intervals near heat sources and material interfaces (where MC Dropout uncertainty is high) and narrower intervals in smooth interior regions, closely matching the spatial structure of the actual prediction errors.

![Refer to caption](https://arxiv.org/html/2606.09923v1/x1.png)

Figure 1: Spatial distribution of prediction interval widths. Left: Naive conformal (constant width). Right: Normalized conformal (adaptive width). The normalized method concentrates interval width in regions of high uncertainty (near heat sources).

#### 5.2.5 Uncertainty Decomposition

Table [4](https://arxiv.org/html/2606.09923#S5.T4) reports the uncertainty decomposition results from the full-scale UQ pipeline.

Table 4: Uncertainty decomposition (simple_chip, full-scale).

The decomposition reveals that 68% of the total uncertainty is epistemic at full scale, in contrast to the 70% aleatoric dominance observed at small scale. This reversal is expected: the full-scale model achieves 95.89% accuracy, reducing data-fitting errors (aleatoric), while the ensemble of high-accuracy models shows greater disagreement in absolute terms (epistemic). This has practical implications: collecting more training data and increasing ensemble size would reduce the dominant epistemic component, making the model’s predictions more reliable.
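One common way to arrive at an epistemic/aleatoric split like the one in Table 4 is the law-of-total-variance decomposition over ensemble members: the variance of the member means is taken as epistemic, and the average of the member variances as aleatoric. The sketch below shows that construction only; it is not necessarily the estimator used in this work, and it assumes each member supplies a per-pixel variance estimate, which the text does not specify. All names are hypothetical.

```python
import numpy as np

def decompose_uncertainty(member_means, member_vars):
    """Split total predictive variance into epistemic and aleatoric parts.

    member_means, member_vars: arrays of shape (M, H, W), one slice per member.
    """
    member_means = np.asarray(member_means, dtype=float)
    member_vars = np.asarray(member_vars, dtype=float)
    aleatoric = member_vars.mean(axis=0)   # average within-member variance
    epistemic = member_means.var(axis=0)   # disagreement between members
    total = aleatoric + epistemic
    frac_epistemic = float(epistemic.sum() / total.sum())
    return {"aleatoric": aleatoric, "epistemic": epistemic,
            "frac_epistemic": frac_epistemic,
            "frac_aleatoric": 1.0 - frac_epistemic}

# Toy example with M = 5 members on a 4 x 4 grid.
rng = np.random.default_rng(0)
means = rng.normal(loc=50.0, scale=1.0, size=(5, 4, 4))
variances = rng.uniform(0.2, 0.6, size=(5, 4, 4))
result = decompose_uncertainty(means, variances)

# Members that agree exactly contribute no epistemic uncertainty.
agree = decompose_uncertainty(np.zeros((5, 4, 4)), np.ones((5, 4, 4)))
```

Under this construction, a high epistemic fraction means the members disagree more than any single member is uncertain on its own.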

### 5.3 Ablation Studies

#### 5.3.1 Effect of Calibration Set Size

Figure [2](https://arxiv.org/html/2606.09923#S5.F2) shows the effect of calibration set size on coverage and interval width. Coverage stabilizes at $n_{\text{cal}}\geq 50$ samples, while interval width decreases monotonically with more calibration data. This suggests that conformal prediction for neural operators is data-efficient in the calibration phase.

![Refer to caption](https://arxiv.org/html/2606.09923v1/x2.png)

Figure 2: Effect of calibration set size on coverage and average interval width. Coverage stabilizes at $n_{\text{cal}}\geq 50$.

#### 5.3.2 Effect of Significance Level $\alpha$

Table [5](https://arxiv.org/html/2606.09923#S5.T5) shows the effect of varying the significance level $\alpha$ on coverage and interval width.

Table 5: Effect of significance level $\alpha$ on normalized conformal prediction (full-scale model). Negative gap: slight undercoverage within finite-sample variance.

At $\alpha=0.1$, coverage is close to the 90% target (89.1%), with the gap decreasing for more stringent $\alpha$ values as the conformal quantile becomes more conservative. The slight undercoverage at $\alpha=0.1$ is within the expected finite-sample variance for $n_{\text{cal}}=200$.

#### 5.3.3 Normalized vs. Unnormalized Conformal

Table [6](https://arxiv.org/html/2606.09923#S5.T6) compares normalized and unnormalized conformal prediction on the simple_chip configuration.

Table 6: Normalized vs. unnormalized conformal prediction ($\alpha=0.1$, simple_chip, full-scale).

Both methods achieve near-target coverage (88.88% unnormalized and 89.11% normalized vs. the 90% target), with the normalized method slightly ahead. The normalized method produces spatially adaptive intervals (wider in high-uncertainty regions, narrower in low-uncertainty regions), while the unnormalized method produces constant-width intervals across the entire domain.

### 5.4 System Implementation

To demonstrate the practical deployability of our approach, we have implemented the complete UQ pipeline in an open-source platform (code available at [https://github.com/physai/physai](https://github.com/physai/physai)). Figure [3](https://arxiv.org/html/2606.09923#S5.F3) shows the layered architecture.

![Refer to caption](https://arxiv.org/html/2606.09923v1/x3.png)

Figure 3: System architecture of the PhysAI platform. The layered design separates concerns from physics solvers (bottom) to REST API and 3D visualization (top). The Uncertainty Layer implements our conformal prediction method.

REST API Layer. The platform exposes 15 REST API endpoints (built with FastAPI) covering the complete simulation workflow. Table [7](https://arxiv.org/html/2606.09923#S5.T7) lists the key endpoints.

Table 7: Key REST API endpoints for the UQ-enabled simulation platform.

3D Interactive Visualization. Figure [4](https://arxiv.org/html/2606.09923#S5.F4) shows the web-based 3D visualization interface (Three.js), which renders temperature fields as 3D surfaces with height and color mapping, overlaid with a semi-transparent uncertainty layer showing conformal prediction interval widths. This allows engineers to visually identify regions of high uncertainty and assess the reliability of predictions.

![Refer to caption](https://arxiv.org/html/2606.09923v1/system_3d_screenshot.png)

Figure 4: 3D visualization of temperature field prediction with uncertainty overlay. The surface height and color represent the predicted temperature; the purple semi-transparent layer shows the conformal prediction interval width (darker = higher uncertainty).

Physics-Informed Training Framework. The `PhysicsTrainer` implements the composite loss ([23](https://arxiv.org/html/2606.09923#S5.E23)) with cosine-scheduled PDE residual weight, gradient penalty, and early stopping. The `CompositeLoss` framework supports modular combination of data fidelity, PDE residual, boundary, conservation, and gradient penalty losses.

GPU Deployment. Automated deployment scripts synchronize code to remote GPU servers (NVIDIA V100) via SSH, launch training jobs, and fetch results, enabling scalable training without manual intervention.

## 6 Discussion

### 6.1 Implications for Industrial Applications

Our results have direct implications for deploying neural operators in industrial settings:

Safety-critical design. In battery thermal management, engineers must ensure that the maximum cell temperature remains below a safety threshold (e.g., 60°C for lithium-ion cells). Conformal prediction intervals provide a principled way to bound the temperature: if the upper bound of the 90% prediction interval at a given location is below the threshold, then, under exchangeability, the true temperature at that location exceeds the threshold with probability at most 10%. Because this guarantee is marginal and pixel-wise, bounding the maximum over the whole field additionally requires simultaneous coverage (see Section 6.2).

Design optimization under uncertainty. When using neural operators for parametric design optimization, conformal intervals enable robust optimization: instead of optimizing the mean prediction, one can optimize a conservative estimate (e.g., the 95th percentile) to account for prediction uncertainty.

Data collection guidance. The uncertainty decomposition reveals whether model improvement requires more training data (high epistemic uncertainty) or improved data quality (high aleatoric uncertainty), enabling targeted data collection campaigns.

### 6.2 Limitations

We acknowledge several limitations of our current work:

1. Near-target coverage. Our full-scale UQ experiments achieve 89.11% coverage against a 90% target, with the $\sim$1% gap attributable to finite-sample calibration variance ($n_{\text{cal}}=200$). Achieving exact nominal coverage may require larger calibration sets or adaptive conformal methods.
2. 2D steady-state only. Our experiments are limited to 2D steady-state PDEs. Extending to 3D and transient problems is necessary for real-world industrial applications.
3. Exchangeability assumption. Conformal prediction requires exchangeability between calibration and test data. Distribution shift (e.g., new operating conditions not represented in the calibration set) can invalidate the coverage guarantee. Adaptive conformal methods (Gibbs and Candes, [2021](https://arxiv.org/html/2606.09923#bib.bib7)) could address this.
4. Pixel-wise coverage. Our method provides pixel-wise coverage guarantees but does not guarantee simultaneous coverage across all spatial locations. Simultaneous coverage requires more sophisticated methods such as conformalized simultaneous intervals.
5. Computational overhead. MC Dropout requires $N_{\text{MC}}$ forward passes per prediction, increasing inference time by a factor of $N_{\text{MC}}$. For real-time applications, this overhead may be prohibitive.
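To make the distinction in limitation 4 concrete, one standard route to simultaneous coverage is to use a single score per field, namely the largest normalized error over all pixels, so that the calibrated band contains the entire field with probability at least $1-\alpha$. The sketch below illustrates this construction on synthetic fields. It is not part of the evaluated method, and the helper names and the synthetic generator are hypothetical.

```python
import numpy as np

def simultaneous_band(pred_cal, sigma_cal, y_cal, pred_new, sigma_new, alpha, eps=1e-6):
    """Band that covers a whole field at once, using a max-over-pixels score."""
    # One score per calibration field: worst normalized error across pixels.
    scores = np.max(np.abs(y_cal - pred_cal) / (sigma_cal + eps), axis=(1, 2))
    n = scores.size
    k = int(np.ceil((n + 1) * (1.0 - alpha)))
    q_hat = np.sort(scores)[k - 1] if k <= n else np.inf
    half = q_hat * (sigma_new + eps)
    return pred_new - half, pred_new + half

rng = np.random.default_rng(1)

def make_fields(n, size=8):
    """Synthetic fields (stand-ins for surrogate output, MC Dropout std, and truth)."""
    sigma = rng.uniform(0.1, 1.0, size=(n, size, size))
    pred = rng.normal(size=(n, size, size))
    y = pred + sigma * rng.normal(size=(n, size, size))
    return pred, sigma, y

pred_cal, sigma_cal, y_cal = make_fields(2000)
pred_new, sigma_new, y_new = make_fields(4000)
lower, upper = simultaneous_band(pred_cal, sigma_cal, y_cal, pred_new, sigma_new, alpha=0.1)

# Fraction of test fields that lie entirely inside the band.
field_coverage = np.mean(np.all((y_new >= lower) & (y_new <= upper), axis=(1, 2)))
pixel_coverage = np.mean((y_new >= lower) & (y_new <= upper))
```

The price of the stronger guarantee is width: the band is calibrated to the worst pixel, so per-pixel coverage ends up well above the nominal level.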

### 6.3 Future Work

Several promising directions emerge from this work:

1. Conformal prediction for 3D neural operators. Extending our method to FNO3d and AFNO for 3D industrial simulations.
2. Temporal conformal prediction. Developing conformal methods for transient simulations with temporal dependencies.
3. Conformalized quantile regression. Using quantile regression as the base predictor for even more adaptive intervals, following Romano et al. ([2019](https://arxiv.org/html/2606.09923#bib.bib28)).
4. Physics-informed conformal prediction. Incorporating physical constraints (e.g., conservation laws, symmetry) into the conformal prediction framework to ensure physically consistent intervals.
5. Active learning with conformal UQ. Using conformal prediction intervals to guide active data collection, focusing on regions where intervals are widest.

## 7 Conclusion

We have presented the first application of conformal prediction to neural operator-based physics simulation, providing distribution-free prediction intervals with rigorous coverage guarantees. Our normalized conformal prediction scheme leverages MC Dropout uncertainty to produce spatially adaptive intervals while maintaining coverage guarantees. Full-scale experiments (33.7M parameters, 800 training samples, 5 ensemble members, NVIDIA V100) on heat conduction benchmarks demonstrate that both naive and normalized conformal prediction achieve $\sim$89% empirical coverage at the 90% target level, validating the theoretical coverage guarantee within finite-sample variance. The uncertainty decomposition reveals that 68% of total uncertainty is epistemic at full scale, providing actionable guidance for model improvement through increased ensemble size and training data. Our open-source implementation with REST API and 3D visualization demonstrates the practical deployability of conformal prediction for industrial physics simulation.

Our work bridges a critical gap between the predictive power of neural operators and the reliability requirements of industrial applications, enabling trustworthy deployment of AI\-based physics simulation in safety\-critical engineering systems\.

## Acknowledgments

We thank the reviewers for their constructive feedback\. This work was supported by internal research funding\.

## References

- Angelopoulos and Bates [2022] Anastasios N Angelopoulos and Stephen Bates. A gentle introduction to conformal prediction and distribution-free uncertainty quantification. *arXiv preprint arXiv:2107.07511*, 2022.
- Angelopoulos et al. [2021] Anastasios N Angelopoulos, Stephen Bates, Jitendra Malik, and Michael I Jordan. Uncertainty sets for image classifiers using conformal prediction. In *International Conference on Learning Representations (ICLR)*, 2021.
- Ashukha and Vetrov [2020] Arsenii Ashukha and Dmitry Vetrov. Pitfalls of in-domain uncertainty estimation and ensembling in deep learning. *arXiv preprint arXiv:2002.06470*, 2020.
- Brunton et al. [2020] Steven L Brunton, Bernd R Noack, and Petros Koumoutsakos. Machine learning for fluid mechanics. *Annual Review of Fluid Mechanics*, 52:477–508, 2020.
- Fontana et al. [2023] Matteo Fontana, Gianluca Zeni, and Samuel Gandy. Conformal prediction: A unified review of theory and new challenges. *Bernoulli*, 29(1):1–35, 2023.
- Gal and Ghahramani [2016] Yarin Gal and Zoubin Ghahramani. Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In *International Conference on Machine Learning (ICML)*, pages 1050–1059, 2016.
- Gibbs and Candes [2021] Isaac Gibbs and Emmanuel Candes. Adaptive conformal prediction under distribution shift. *arXiv preprint arXiv:2106.00170*, 2021.
- Hüllermeier and Waegeman [2021] Eyke Hüllermeier and Willem Waegeman. Aleatoric and epistemic uncertainty in machine learning: An introduction to concepts and methods. *Machine Learning*, 110(3):457–506, 2021.
- Karniadakis et al. [2021] George Em Karniadakis, Ioannis G Kevrekidis, Lu Lu, Paris Perdikaris, Sifan Wang, and Liu Yang. Physics-informed machine learning. *Nature Reviews Physics*, 3(6):422–440, 2021.
- Kasim et al. [2022] Muhammad Kasim et al. Building high accuracy emulators for scientific simulations with deep neural operators. *Nature Machine Intelligence*, 4:651–660, 2022.
- Kendall and Gal [2017] Alex Kendall and Yarin Gal. What uncertainties do we need in bayesian deep learning for computer vision? In *Advances in Neural Information Processing Systems (NeurIPS)*, 2017.
- Kovachki et al. [2023] Nikola Kovachki, Zongyi Li, Burigede Liu, Kamyar Azizzadenesheli, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Neural operator: Learning maps between function spaces with applications to pdes. *Journal of Machine Learning Research*, 24(89):1–97, 2023.
- Lakshminarayanan et al. [2017] Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and scalable predictive uncertainty estimation using deep ensembles. In *Advances in Neural Information Processing Systems (NeurIPS)*, pages 6402–6413, 2017.
- Lei et al. [2018] Jing Lei, Max G’Sell, Alessandro Rinaldo, Ryan J Tibshirani, and Larry Wasserman. Distribution-free predictive inference for regression. *Journal of the American Statistical Association*, 113(523):1094–1111, 2018.
- Li et al. [2021a] Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Neural operator: Graph kernel network for partial differential equations. In *International Conference on Learning Representations (ICLR)*, 2021a.
- Li et al. [2021b] Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations. In *International Conference on Learning Representations (ICLR)*, 2021b.
- Li et al. [2022a] Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Fourier neural operator with learned deformations for parametric partial differential equations. *arXiv preprint arXiv:2206.03089*, 2022a.
- Li et al. [2022b] Zongyi Li et al. Multiscale transformers: Hierarchical attention for scalable vision and beyond. *arXiv preprint arXiv:2202.08088*, 2022b.
- Li et al. [2023] Zongyi Li et al. Geometry-informed neural operator for large-scale 3d pdes. *arXiv preprint arXiv:2306.06022*, 2023.
- Lu et al. [2021] Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and George Em Karniadakis. Deeponet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators. *Nature Machine Intelligence*, 3(3):218–229, 2021.
- Mishra and Molinaro [2022] Siddharth Mishra and Roberto Molinaro. Estimating the uncertainty of neural network predictions using conformal prediction. *arXiv preprint arXiv:2204.06578*, 2022.
- Moli et al. [2023] Valentin Moli et al. Inducing normalizing flows for neural operators. *arXiv preprint arXiv:2306.05601*, 2023.
- NVIDIA [2024] NVIDIA. Nvidia modulus: A framework for building, training, and fine-tuning deep learning models using physics-based data. [https://developer.nvidia.com/modulus](https://developer.nvidia.com/modulus), 2024.
- Papadopoulos et al. [2002] Harris Papadopoulos, Kostas Proedrou, Volodya Vovk, and Alex Gammerman. Inductive confidence machines for regression. *Machine Learning: ECML 2002*, pages 345–356, 2002.
- Pathak et al. [2022] Jaideep Pathak et al. Fourcastnet: A global data-driven high-resolution weather forecasting model. *arXiv preprint arXiv:2202.11214*, 2022.
- Psaros et al. [2023] Apostolos F Psaros, Xuhui Meng, Zongren Zou, Ling Guo, and George Em Karniadakis. Uncertainty quantification in scientific machine learning: Methods, metrics, and comparisons. *Journal of Computational Physics*, 474:111802, 2023.
- Rahaman et al. [2021] Rahul Rahaman et al. Uncertainty quantification and deep ensembles. *arXiv preprint arXiv:2007.08792*, 2021.
- Romano et al. [2019] Yaniv Romano, Evan Patterson, and Emmanuel Candes. Conformalized quantile regression. In *Advances in Neural Information Processing Systems (NeurIPS)*, pages 3543–3553, 2019.
- Romano et al. [2020] Yaniv Romano, Rina Foygel Barber, Chiara Sabatti, and Emmanuel Candès. With malice toward none: Assessing uncertainty via equalized coverage. In *Advances in Neural Information Processing Systems (NeurIPS)*, 2020.
- Sesia and Candès [2020] Matteo Sesia and Emmanuel J Candès. A comparison of some conformal quantile regression methods. *Stat*, 9(1):e261, 2020.
- Shafer and Vovk [2008] Glenn Shafer and Vladimir Vovk. A tutorial on conformal prediction. *Journal of Machine Learning Research*, 9:371–421, 2008.
- Stachenfeld et al. [2022] Kimberly Stachenfeld, Jonathan Godwin, Michael Schlichtkrull, and Peter Battaglia. Learned simulators for turbulence. *International Conference on Learning Representations (ICLR)*, 2022.
- Sun et al. [2022] Yu Sun et al. Conformal prediction for molecular property prediction. *arXiv preprint arXiv:2206.01256*, 2022.
- Vovk et al. [2005] Vladimir Vovk, Alexander Gammerman, and Glenn Shafer. *Algorithmic Learning in a Random World*. Springer Science & Business Media, 2005.
- Xu and Xie [2021] Chen Xu and Yao Xie. Conformal prediction interval for dynamic time-series. *International Conference on Machine Learning (ICML)*, 2021.

## Appendix A Proof of Proposition [3](https://arxiv.org/html/2606.09923#Thmtheorem3)

###### Full proof.

Let $(X_{1},Y_{1}),\ldots,(X_{n},Y_{n}),(X_{n+1},Y_{n+1})$ be exchangeable random variables. Define the normalized nonconformity scores:

$$\tilde{S}_{i}=\frac{|Y_{i}-f(X_{i})|}{\sigma(X_{i})+\epsilon},\quad i=1,\ldots,n+1, \tag{24}$$

where $\sigma(x)$ is a deterministic function of $x$ (the MC Dropout standard deviation), and both $f$ and $\sigma$ are fit on the training set only, so that neither depends on the calibration or test points.

Since $\sigma(x)$ is a deterministic function, the normalized scores $\tilde{S}_{1},\ldots,\tilde{S}_{n},\tilde{S}_{n+1}$ are also exchangeable (each is obtained by applying the same fixed function to one of the exchangeable pairs).

By the same argument as Theorem [1](https://arxiv.org/html/2606.09923#Thmtheorem1) (see Vovk et al. [[2005](https://arxiv.org/html/2606.09923#bib.bib34)], Theorem 8.1), the conformal quantile $\hat{q}_{\text{norm}}$ satisfies:

$$\mathbb{P}\left(\tilde{S}_{n+1}\leq\hat{q}_{\text{norm}}\right)\geq 1-\alpha. \tag{25}$$

This is equivalent to:

$$\mathbb{P}\left(\frac{|Y_{n+1}-f(X_{n+1})|}{\sigma(X_{n+1})+\epsilon}\leq\hat{q}_{\text{norm}}\right)\geq 1-\alpha, \tag{26}$$

which, after multiplying both sides of the inequality by $\sigma(X_{n+1})+\epsilon>0$, gives:

$$\mathbb{P}\Big(Y_{n+1}\in\bigl[f(X_{n+1})-\hat{q}_{\text{norm}}\cdot(\sigma(X_{n+1})+\epsilon),\; f(X_{n+1})+\hat{q}_{\text{norm}}\cdot(\sigma(X_{n+1})+\epsilon)\bigr]\Big)\geq 1-\alpha. \tag{27}$$

∎
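The step borrowed from Theorem 1 can be spelled out with a short rank argument. The lines below assume the standard split conformal choice, in which $\hat{q}_{\text{norm}}$ is the $k$-th smallest calibration score with $k=\lceil (n+1)(1-\alpha)\rceil\leq n$; the main text may state the quantile in a slightly different but equivalent form.

```latex
% Assumed definition of the conformal quantile (standard split conformal choice).
k=\lceil (n+1)(1-\alpha)\rceil, \qquad
\hat{q}_{\text{norm}}=\tilde{S}_{(k)}
\quad\text{(the $k$-th smallest of } \tilde{S}_{1},\ldots,\tilde{S}_{n}\text{)}.

% By exchangeability, the rank of \tilde{S}_{n+1} among all n+1 scores is
% uniform on \{1,\ldots,n+1\} when there are no ties, and ties only help:
\mathbb{P}\left(\tilde{S}_{n+1}\leq\tilde{S}_{(k)}\right)
\;\geq\;\frac{k}{n+1}
\;=\;\frac{\lceil (n+1)(1-\alpha)\rceil}{n+1}
\;\geq\;1-\alpha .

% Worked instance for the setting used in the experiments:
n=200,\quad \alpha=0.1 \;\Rightarrow\; k=\lceil 180.9\rceil=181,
\qquad \frac{k}{n+1}=\frac{181}{201}\approx 0.9005 .
```

The guarantee is marginal: it averages over the draw of the calibration set as well as the test point, which is why a single calibration set of 200 samples can yield empirical coverage slightly below 90%.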

## Appendix B Additional Experimental Details

### B.1 Hyperparameter Sensitivity

Table [8](https://arxiv.org/html/2606.09923#A2.T8) shows the effect of dropout rate on MC Dropout UQ quality.

Table 8: Dropout rate effect on UQ ($\alpha=0.1$).

Higher dropout rates increase MC Dropout uncertainty (improving coverage) but degrade prediction accuracy. A dropout rate of $p=0.05$ provides a good balance between accuracy and UQ quality.

### B.2 Ensemble Size Analysis

Table [9](https://arxiv.org/html/2606.09923#A2.T9) shows the effect of ensemble size on Deep Ensemble UQ.

Table 9: Ensemble size effect ($\alpha=0.1$).

Ensemble coverage improves with more members but remains below the 90% target even with 7 members, highlighting the need for conformal calibration.

### B.3 PDE Residual Loss Ablation

Table [10](https://arxiv.org/html/2606.09923#A2.T10) compares training with and without PDE residual loss.

Table 10: PDE residual loss ablation ($\alpha=0.1$).

The cosine-scheduled PDE residual loss improves both accuracy and conformal coverage while reducing interval width, demonstrating the value of physics-informed training for UQ.