A Trainable-by-Parts Operator Learning Framework: Bridging DeepONet and Karhunen-Loeve Expansions for Large-Scale Applications

arXiv cs.LG Papers

Summary

Proposes KL-DNN, a scalable operator learning framework that uses Karhunen-Loève expansions to handle large-scale PDE problems, achieving lower errors and two-order-of-magnitude speedup over DeepONet on a 3D carbon storage problem.

arXiv:2606.28519v1 Announce Type: new Abstract: Training operator-learning models for large-scale problems governed by partial differential equations (PDEs) is challenging due to the curse of dimensionality, memory constraints, and limited training data. These challenges arise in many scientific and engineering applications, including subsurface flow, climate modeling, and geological carbon storage (GCS). In this work, we propose a scalable operator-learning framework based on the Karhunen-Loeve Deep Neural Network (KL-DNN) and demonstrate its performance for modeling GCS. The model is trained on a dataset comprising 100 samples of large-scale simulations in a three-dimensional domain with 1.7 million cells and 50 time steps. The KL-DNN method constructs latent spaces using low-rank singular value decomposition of static properties and a nested Karhunen-Loeve expansion for dynamic pressure fields, enabling full-resolution predictions without subsampling or spatial coarsening. The KL-DNN model achieves an average root mean square error (RMSE) of 1.1 psi for pressure (0.04% relative error with respect to the average pressure in the domain) and RMSE of 0.0146 for CO2 saturation (5% relative error with respect to the average saturation inside the plume). The model requires 20 minutes of training on a single GPU, representing a 19% reduction in the pressure errors, 7% reduction in the saturation error, and a two-order-of-magnitude speedup compared to DeepONet trained on the same dataset. These results, along with inference time of less than one minute, establish the proposed model as a practical and accurate solution for large-scale PDE problems, enabling rapid uncertainty quantification, history matching, and real-time decision support.
Original Article
View Cached Full Text

Cached at: 06/30/26, 05:27 AM

# A Trainable-by-Parts Operator Learning Framework: Bridging DeepONet and Karhunen–Loève Expansions for Large-Scale Applications
Source: [https://arxiv.org/html/2606.28519](https://arxiv.org/html/2606.28519)
KL\-DNNKarhunen\-Loève Deep Neural NetworkGCSGeological Carbon StorageIBDPIllinois Basin Decatur ProjectKLKarhunen\-LoèveKLEKarhunen\-Loève ExpansionPDEPartial Differential EquationsSVDSingular Value DecompositionDNNDeep Neural NetworkMSEMean Square ErrorRMSERoot Mean Square ErrorMAEMean Absolute ErrorISGSIllinois State Geological Survey
Christian Munoz Civil Engineering Department University of Illinois Urbana\-Champaign Urbana, IL 61801 camunoz4@illinois\.edu &Alexandre Tartakovsky Civil Engineering Department University of Illinois Urbana\-Champaign Urbana, IL 61801 Pacific Northwest National Laboratory, Richland, WA 99352 amt1998@illinois\.edu

###### Abstract

Training operator\-learning models for large\-scale problems governed by partial differential equations \(PDEs\) is challenging due to the curse of dimensionality, memory constraints, and limited training data\. These challenges arise in many scientific and engineering applications, including subsurface flow, climate modeling, and geological carbon storage \(GCS\)\. In this work, we propose a scalable operator\-learning framework based on the Karhunen–Loève Deep Neural Network \(KL\-DNN\) and demonstrate its performance for modeling GCS\. The model is trained on a dataset comprising 100 samples of large\-scale simulations in a three\-dimensional domain with 1\.7 million cells and 50 time steps\. The KL\-DNN method constructs latent spaces using low\-rank singular value decomposition of static properties and a nested Karhunen–Loève expansion for dynamic pressure fields, enabling full\-resolution predictions without subsampling or spatial coarsening\. The KL\-DNN model achieves an average[Root Mean Square Error](https://arxiv.org/html/2606.28519#id10.10.id10)\([RMSE](https://arxiv.org/html/2606.28519#id10.10.id10)\) of 1\.1 psi for pressure \(≈0\.04\\approx 0\.04% relative error with respect to the average pressure in the domain\) and RMSE of 0\.0146 for CO2saturation \(≈\\approx5% relative error with respect to the average saturation inside the plume\)\. The model requires≈20\\approx 20minutes of training on a single GPU, representing a 19% reduction in the pressure errors, 7% reduction in the saturation error, and a two\-order\-of\-magnitude speedup compared to DeepONet trained on the same dataset\. These results, along with inference time of less than one minute, establish the proposed model as a practical and accurate solution for large\-scale PDE problems, enabling rapid uncertainty quantification, history matching, and real\-time decision support\.

## Plain Language Summary

We propose a faster, more scalable ML model for complex physical systems, such as underground carbon storage\. Traditional physics\-based simulations for these systems are accurate but slow and computationally expensive, making them impractical when many repeated simulations are needed \(for example, to assess uncertainty or test different scenarios\)\. We introduce a framework called a Karhunen\-Loève Deep Neural Network \(KL\-DNN\)\. The key idea is to compress very large, high\-dimensional data \(such as 3D pressure and CO2saturation fields evolving over time\) into a much smaller set of variables that still capture the essential patterns\. The model is trained in parts, allowing the most computationally demanding steps to be handled separately and efficiently\. The method is tested on a realistic, large\-scale carbon storage problem with millions of grid cells\. Despite using only 100 training simulations, the model accurately predicts pressure and CO2distribution across space and time\. It achieves lower prediction errors than an established approach \(DeepONet\) while reducing training time from days to about 20 minutes and producing results in under a minute\. Overall, this framework enables rapid, reliable predictions for large\-scale geophysical systems, supporting applications such as uncertainty quantification, model calibration, and real\-time decision\-making\.

## 1 Introduction

Physics\-based simulations of large\-scale problems, while accurate, are computationally expensive\. The main reason is that the computational cost of numerical solutions to partial differential equations \(PDEs\) increases exponentially with the number of cells in the domain discretization, a phenomenon known as the curse of dimensionality\.

In this work, we focus on problems that require millions of nodes and tens of time intervals to adequately represent the system’s properties and states\. Geological Carbon Storage \(GCS\) is one example of such problems\. In[Geological Carbon Storage](https://arxiv.org/html/2606.28519#id2.2.id2)\([GCS](https://arxiv.org/html/2606.28519#id2.2.id2)\) applications, the system needs a fine discretization in critical areas, such as injection zones and geological faults, which often results in a large number of nodes in numerical models\. High heterogeneity of subsurface parameters \(such as porosity and permeability\) and the inherent uncertainty in their distribution make subsurface modeling especially challenging\[[18](https://arxiv.org/html/2606.28519#bib.bib24),[25](https://arxiv.org/html/2606.28519#bib.bib23),[9](https://arxiv.org/html/2606.28519#bib.bib22)\]\. The practical application of subsurface models requires multiple model evaluations across different parameter combinations\. For example, numerous model runs are needed to estimate uncertainty in model predictions, investigate different scenarios, and perform history matching\.

The need to perform multiple simulations and their high computational cost motivate the development of operator\-learning models that can predict system states \(e\.g\., CO2pressure and saturation in GCS\) as functions of system parameters \(e\.g\., reservoir porosity and permeability\)\. Several operator learning models have been proposed for GCS, including Fourier Neural Operator\[[19](https://arxiv.org/html/2606.28519#bib.bib41)\], convolutional encoder\-decoder networks\[[12](https://arxiv.org/html/2606.28519#bib.bib40)\], and DeepONet\[[10](https://arxiv.org/html/2606.28519#bib.bib32)\]\. Among these approaches, only the DeepONet model in\[[10](https://arxiv.org/html/2606.28519#bib.bib32)\]addresses problems at the scale considered in this work\. Specifically, in\[[10](https://arxiv.org/html/2606.28519#bib.bib32)\], the DeepONet model was trained for the Illinois Basin Decatur Project \([Illinois Basin Decatur Project](https://arxiv.org/html/2606.28519#id3.3.id3)\([IBDP](https://arxiv.org/html/2606.28519#id3.3.id3)\)\) CO2injection site using a training dataset consisting of numerical simulations performed on a three\-dimensional domain discretized into 1\.7 million cells, with solutions reported at 50 time steps\.

DeepONet performs operator learning in a reduced \(latent\) parameter space, determined by the branch networks, and in a reduced state\-variable space, determined by the trunk network\. Other related latent\-space operator learning models for parametric PDEs include PCA\-Net\[[3](https://arxiv.org/html/2606.28519#bib.bib8)\]and KL\-DNN\[[24](https://arxiv.org/html/2606.28519#bib.bib35),[23](https://arxiv.org/html/2606.28519#bib.bib25)\]\. These two methods use closely related PCA and[Karhunen\-Loève](https://arxiv.org/html/2606.28519#id4.4.id4)\([KL](https://arxiv.org/html/2606.28519#id4.4.id4)\) expansions\[[8](https://arxiv.org/html/2606.28519#bib.bib68)\]to find latent spaces\. In DeepONet, PCA\-Net, and KL\-DNN, operator learning is performed in a latent space using a fully connected feed\-forward deep neural network \(DNN\), which requires only a few fully connected layers\. However, learning a latent representation for large\-scale problems can require prohibitively large computer memory and training time\.

Training DeepONet for large\-scale problems is particularly challenging because the latent representations of states and parameters are jointly learned and combined with DNN training\. In PCA\-Net and KL\-DNN, the latent representations are learned separately for states and parameters via singular value decomposition \(SVD\)\. However, performing SVD on large datasets may require substantial memory that may not be available even on large supercomputers\.

DeepONet has been applied to large\-scale problems using several strategies\. DeepONet with domain decomposition \(DD\-DeepONet\) splits large domains into subdomains, reducing memory requirements and training time\[[27](https://arxiv.org/html/2606.28519#bib.bib17)\]\. Time\-integration DeepONet variants \(TI\-DeepONet\) focus on derivative operators to avoid long sequential rollouts\[[13](https://arxiv.org/html/2606.28519#bib.bib16)\]\. Ensemble and mixture\-of\-experts architectures further localize learning for heterogeneous domains\[[17](https://arxiv.org/html/2606.28519#bib.bib15)\]\. Separable DeepONet \(SepONet\) factorizes spatial dimensions to lower complexity and memory footprint\[[28](https://arxiv.org/html/2606.28519#bib.bib14)\]\. In\[[10](https://arxiv.org/html/2606.28519#bib.bib32)\], a graph\-based topology embedding was introduced in DeepONet to capture local and global interactions\. A dynamic resampling of training subsets was used to manage the memory required for DeepONet training and the associated computational costs when handling domains with millions of cells and multiple time steps\.

In PCA\-Net, a localized PCA on spatial patches was used to reduce computational cost and memory requirements\[[6](https://arxiv.org/html/2606.28519#bib.bib13)\]\. This patch\-based approach achieved up to four times faster training than global PCA models and improved accuracy near critical regions by preserving local variability\.

In this study, we propose a general trainable\-by\-parts operator learning framework for large\-scale PDE problems\. As the basis of this framework, we adapt the KL\-DNN/PCA\-Net method\. In large\-scale applications, the KL\-DNN and PCA\-Net methods are nearly identical, as both use SVD to construct PCA or KL representations of parameter and state fields\. Furthermore, we demonstrate that the resulting model is a special case of DeepONet and can be considered a generalized operator\-learning model\.

The main innovations of this work are the following:

- •We propose a two\-step approach for performing SVD decomposition on large\-scale problems\. In the first step, the multi\-dimensional KLE is decomposed into the lower\-dimensional KLEs\. In the second step, we estimate eigenvalues and eigenfunctions in the low\-dimensional KLEs using a low\-rank SVD\. This approach enables the SVD decomposition of large datasets without significantly compromising accuracy, a task that is often infeasible even on supercomputers due to memory limitations\.
- •We establish a correspondence between the PCA\-Net/KL\-DNN and DeepONet models\. This allows for separately training the trunk and branch of DeepONet and extending it to large\-scale problems\.

We demonstrate the effectiveness of the proposed model by comparing its accuracy and training time to those of the DeepONet model trained on the IBDP dataset, which consists of 100 samples, each sample computed on a 1\.7 million\-node\-mesh and 50 time steps\[[10](https://arxiv.org/html/2606.28519#bib.bib32)\]\. We demonstrate that our model requires 2 orders of magnitude less training time and is up to 19% more accurate than the DeepONet trained on this large dataset using dynamic resampling\.

This work is organized as follows: Section[2](https://arxiv.org/html/2606.28519#S2)presents the scalable operator learning model\. Section[3](https://arxiv.org/html/2606.28519#S3)describes training and the predictions of the operator learning model using the[IBDP](https://arxiv.org/html/2606.28519#id3.3.id3)dataset\. Conclusions are given in Section[4](https://arxiv.org/html/2606.28519#S4)\.

## 2 Operator learning model

We present the operator learning method for the general[Partial Differential Equations](https://arxiv.org/html/2606.28519#id6.6.id6)\([PDE](https://arxiv.org/html/2606.28519#id6.6.id6)\) problem,

L​\[u​\(𝒙,t\),y1​\(𝒙\),…,yNp​\(𝒙\)\]=0,L\[u\(\\bm\{x\},t\),y\_\{1\}\(\\bm\{x\}\),\.\.\.,y\_\{N\_\{p\}\}\(\\bm\{x\}\)\]=0,\(1\)subject to appropriate initial and boundary conditions\. Here,LLis the differential operator,u​\(𝒙,t\)u\(\\bm\{x\},t\)is the state variable, andy1​\(𝒙\),…,yNp​\(𝒙\)y\_\{1\}\(\\bm\{x\}\),\.\.\.,y\_\{N\_\{p\}\}\(\\bm\{x\}\)are theNpN\_\{p\}space\-dependent parameter fields\.

We assume that a labeled datasetD=\{𝒚1\(i\),…,𝒚Np\(i\)→𝒖\(i\)\}i=1Nt​r​a​i​nD=\\\{\\bm\{y\}\_\{1\}^\{\(i\)\},\.\.\.,\\bm\{y\}^\{\(i\)\}\_\{N\_\{p\}\}\\rightarrow\\bm\{u\}^\{\(i\)\}\\\}\_\{i=1\}^\{N\_\{train\}\}exists consisting ofNt​r​a​i​nN\_\{train\}samples of the numerical PDE solution obtained on aNx×Ny×Nz×NtN\_\{x\}\\times N\_\{y\}\\times N\_\{z\}\\times N\_\{t\}space\-time mesh\.

Our goal is to develop an operator learning model𝒢u​\[𝒚1,…,𝒚Np\]​\(𝒙,t\)\\mathcal\{G\}\_\{u\}\[\\bm\{y\}\_\{1\},\.\.\.,\\bm\{y\}\_\{N\_\{p\}\}\]\\left\(\\bm\{x\},t\\right\)for large\-scale problems whereNxN\_\{x\},NyN\_\{y\},NzN\_\{z\}, andNtN\_\{t\}may be on the order of millions\. Below, we discuss the challenges of training operator learning models for large\-scale problems and how the proposed model addresses them\.

To motivate the proposed approach, we first review the KL\-DNN formulation of a neural operator𝒢u​\[y​\(𝒙\)\]​\(𝒙,t\)\\mathcal\{G\}\_\{u\}\[y\(\\bm\{x\}\)\]\(\\bm\{x\},t\)forNp=1N\_\{p\}=1\(a single parameter field\)\. In this method,uuandyyare treated as realizations of random processesUUandYY, and𝒚\(i\)\\bm\{y\}^\{\(i\)\}and𝒖\(i\)\\bm\{u\}^\{\(i\)\}in the training datasetDDare generated as independent samples ofYYandUU, respectively\. The operator𝒢u​\[y\]​\(𝒙,t\)\\mathcal\{G\}\_\{u\}\[y\]\(\\bm\{x\},t\)is given by the space\-time KLE ofUU\[[21](https://arxiv.org/html/2606.28519#bib.bib76)\]:

𝒢u​\[y\]​\(𝒙,t\)=u¯​\(𝒙,t\)\+∑i=1Nηηi​\[𝝃​\(y\);𝜽\]​ϕi​\(𝒙,t\)​λi,\\mathcal\{G\}\_\{u\}\[y\]\(\\bm\{x\},t\)=\\bar\{u\}\(\\bm\{x\},t\)\+\\sum^\{N\_\{\\eta\}\}\_\{i=1\}\\eta\_\{i\}\[\\bm\{\\xi\}\(y\);\\bm\{\\theta\}\]\\phi\_\{i\}\(\\bm\{x\},t\)\\sqrt\{\\lambda\_\{i\}\},\(2\)whereu¯​\(𝒙,t\)\\bar\{u\}\(\\bm\{x\},t\)is the ensemble mean ofUU,ϕi​\(𝒙,t\)\\phi\_\{i\}\(\\bm\{x\},t\),λi\{\\lambda\_\{i\}\}are theNηN\_\{\\eta\}leading eigenfunctions and eigenvalues of the covariance ofUU, andηi\\eta\_\{i\}are the KLE coefficients\. The eigenfunctions and eigenvalues are found from the eigenvalue decomposition of the sample covariance matrix ofUUor the SVD of𝒖\(i\)\\bm\{u\}^\{\(i\)\}samples\. Therefore, the maximum number of eigenpairs that can be computed isNt​r​a​i​nN\_\{train\}, i\.e\.,Nη≤Nt​r​a​i​nN\_\{\\eta\}\\leq N\_\{train\}\. For a sufficiently largeNt​r​a​i​nN\_\{train\},NηN\_\{\\eta\}can be determined from the Mercer theorem\[[11](https://arxiv.org/html/2606.28519#bib.bib10)\]according to the desired tolerancer​t​o​lrtolas

r​t​o​l=∑i=Nη\+1∞λi/∑i=1∞λi\.rtol=\{\\sum\_\{i=N\_\{\\eta\}\+1\}^\{\\infty\}\\lambda\_\{i\}\}/\{\\sum\_\{i=1\}^\{\\infty\}\\lambda\_\{i\}\}\.\(3\)However, for large\-scale problems,Nt​r​a​i​nN\_\{train\}is relatively low due to the high computational cost of generating samples\. Therefore,Nt​r​a​i​nN\_\{train\}is usually a limiting factor in selectingNηN\_\{\\eta\}\. Specific constraints for selectingNηN\_\{\\eta\}will be discussed later in this section\.

The KLE coefficients𝜼=\[η1,…,ηN\]T\\bm\{\\eta\}=\[\\eta\_\{1\},\.\.\.,\\eta\_\{N\}\]^\{T\}are mapped to the vector of coefficients𝝃=\(ξ1,…,ξNξ\)T\\bm\{\\xi\}=\(\\xi\_\{1\},\.\.\.,\\xi\_\{N\_\{\\xi\}\}\)^\{T\}in the truncated KLE of the parameter fieldyy,

y​\(𝒙\)≈𝒦y​\(𝒙,𝝃\)=y¯​\(𝒙\)\+∑i=1Nξξi​χi​\(𝒙\)​βi,y\(\\bm\{x\}\)\\approx\\mathcal\{K\}\_\{y\}\\left\(\\bm\{x\},\\bm\{\\xi\}\\right\)=\\bar\{y\}\(\\bm\{x\}\)\+\\sum^\{N\_\{\\xi\}\}\_\{i=1\}\\xi\_\{i\}\\chi\_\{i\}\(\\bm\{x\}\)\\sqrt\{\\beta\_\{i\}\},\(4\)using the fully connected feed\-forward DNN𝜼=𝒩​𝒩​\(𝝃;𝜽\)\\bm\{\\eta\}=\\mathcal\{NN\}\(\\bm\{\\xi\};\\bm\{\\theta\}\)with the output vector𝜼\\bm\{\\eta\}and the parameters𝜽\\bm\{\\theta\}\. In the[Karhunen\-Loève Expansion](https://arxiv.org/html/2606.28519#id5.5.id5)\([KLE](https://arxiv.org/html/2606.28519#id5.5.id5)\) ofyy,y¯​\(𝒙\)\\bar\{y\}\(\\bm\{x\}\)is the mean ofYY, andχi​\(𝒙\)\\chi\_\{i\}\(\\bm\{x\}\), andβi\{\\beta\_\{i\}\}are the eigenfunctions and eigenvalues ofCy​\(𝒙,𝒙′\)C\_\{y\}\(\\bm\{x\},\\bm\{x\}^\{\\prime\}\), the \(known\) covariance function ofYY\. The KL\-DNN model is shown in Figure[1](https://arxiv.org/html/2606.28519#S2.F1)\.

Similar to other neural operator models, including PCA\-Net and DeepONet, KL\-DNN can be trained using the full datasetDDby solving the minimization problem

𝜽∗=min𝜽​∑i=1Ntrain‖u\(i\)​\(𝒙,t\)−𝒢u​\[𝒚\(i\)\]​\(𝒙,t;𝜽\)‖22\+1σ2​‖𝜽‖22,\\bm\{\\theta\}^\{\*\}=\\min\_\{\\bm\{\\theta\}\}\\sum\_\{i=1\}^\{N\_\{\\text\{train\}\}\}\|\|u^\{\(i\)\}\(\\bm\{x\},t\)\-\\mathcal\{G\}\_\{u\}\[\\bm\{y\}^\{\(i\)\}\]\(\\bm\{x\},t;\\bm\{\\theta\}\)\|\|^\{2\}\_\{2\}\+\\frac\{1\}\{\\sigma^\{2\}\}\|\|\\bm\{\\theta\}\|\|\_\{2\}^\{2\},\(5\)where theℓ2\\ell\_\{2\}norm of any functionc​\(x,t\)c\(x,t\)is defined as‖c​\(x,t\)‖2=∫0T∫Ω\(c​\(x,t\)\)2​𝑑x​𝑑t\|\|c\(x,t\)\|\|\_\{2\}=\\sqrt\{\\int\_\{0\}^\{T\}\\int\_\{\\Omega\}\(c\(x,t\)\)^\{2\}dxdt\}and1/σ21/\\sigma^\{2\}is a coefficient in the regularization term\. The advantage of having pre\-computed KLEs with the orthogonal eigenfunctions is that this expensive\-to\-compute function norm can be replaced with a much simplerℓ2\\ell\_\{2\}vector norm as

\|\|u\(i\)\(𝒙,t\)−𝒢u\[𝒚\(i\)\]\(𝒙,t;𝜽\)\|\|22=\[𝜼\(i\)−𝜼^\(𝝃\(i\);𝜽\)\]T𝚲\[𝜼\(i\)−𝜼^\(𝝃\(i\);𝜽\)\]=\|\|\[𝜼\(i\)−𝜼^\(𝝃\(i\);𝜽\)\|\|𝚲2,\|\|u^\{\(i\)\}\(\\bm\{x\},t\)\-\\mathcal\{G\}\_\{u\}\[\\bm\{y\}^\{\(i\)\}\]\(\\bm\{x\},t;\\bm\{\\theta\}\)\|\|^\{2\}\_\{2\}=\[\\bm\{\\eta\}^\{\(i\)\}\-\\hat\{\\bm\{\\eta\}\}\(\\bm\{\\xi\}^\{\(i\)\};\\bm\{\\theta\}\)\]^\{T\}\\bm\{\\Lambda\}\[\\bm\{\\eta\}^\{\(i\)\}\-\\hat\{\\bm\{\\eta\}\}\(\\bm\{\\xi\}^\{\(i\)\};\\bm\{\\theta\}\)\]=\|\|\[\\bm\{\\eta\}^\{\(i\)\}\-\\hat\{\\bm\{\\eta\}\}\(\\bm\{\\xi\}^\{\(i\)\};\\bm\{\\theta\}\)\|\|^\{2\}\_\{\\bm\{\\Lambda\}\},\(6\)where𝚲\\bm\{\\Lambda\}is the diagonal matrix with the diagonal elementsΛi​i=λi\\Lambda\_\{ii\}=\\lambda\_\{i\}and𝜼\(i\)\\bm\{\\eta\}^\{\(i\)\}is the vector of coefficients in the KLE of𝒖\(i\)\\bm\{u\}^\{\(i\)\}\[[15](https://arxiv.org/html/2606.28519#bib.bib12)\]\. It follows from Eq \([6](https://arxiv.org/html/2606.28519#S2.E6)\) that the parameters𝜽\\bm\{\\theta\}in the KL\-DNN model can be found using the reduced \(latent\)\-space datasetDl=\{𝝃\(i\)→𝜼\(i\)\}i=1NtrainD\_\{l\}=\\\{\\bm\{\\xi\}^\{\(i\)\}\\rightarrow\\bm\{\\eta\}^\{\(i\)\}\\\}\_\{i=1\}^\{N\_\{\\text\{train\}\}\}by solving the minimization problem

L​\(𝜽\)=∑i=1Ntrain‖𝒩​𝒩​\(𝝃\(i\);𝜽\)−𝜼\(i\)‖𝚲2\+1σ2​‖𝜽‖22\.\\displaystyle L\(\\bm\{\\theta\}\)=\\sum\_\{i=1\}^\{N\_\{\\text\{train\}\}\}\|\|\\mathcal\{NN\}\(\\bm\{\\xi\}^\{\(i\)\};\\bm\{\\theta\}\)\-\\bm\{\\eta\}^\{\(i\)\}\|\|^\{2\}\_\{\\bm\{\\Lambda\}\}\+\\frac\{1\}\{\\sigma^\{2\}\}\|\|\\bm\{\\theta\}\|\|\_\{2\}^\{2\}\.\(7\)The reduced dataset is obtained by applying the inverse KL operator𝒦−1\\mathcal\{K\}^\{\-1\}to the elements of theDDdataset:𝝃\(i\)=𝒦−1​\[𝒚\(i\)\]\\bm\{\\xi\}^\{\(i\)\}=\\mathcal\{K\}^\{\-1\}\[\\bm\{y\}^\{\(i\)\}\]and𝜼\(i\)=𝒦−1​\[𝒖\(i\)\]\\bm\{\\eta\}^\{\(i\)\}=\\mathcal\{K\}^\{\-1\}\[\\bm\{u\}^\{\(i\)\}\]\. For example, for the KLE ofuu,

u​\(𝒙,t\)≈𝒦u​\(𝒙,t,𝜼\)=u¯​\(𝒙,t\)\+∑i=1Nηηi​ϕi​\(𝒙,t\)​λi,u\(\\bm\{x\},t\)\\approx\\mathcal\{K\}\_\{u\}\\left\(\\bm\{x\},t,\\bm\{\\eta\}\\right\)=\\bar\{u\}\(\\bm\{x\},t\)\+\\sum^\{N\_\{\\eta\}\}\_\{i=1\}\\eta\_\{i\}\\phi\_\{i\}\(\\bm\{x\},t\)\\sqrt\{\\lambda\_\{i\}\},\(8\)the inverse operator acting on𝒖\(i\)\\bm\{u\}^\{\(i\)\}is defined as

𝒦−1​\[𝒖\(i\)\]≔arg⁡min𝜼⁡\(‖𝒖​\(𝒙,t\)\(i\)−𝒦u​\[𝒙,t,𝜼\]‖22\+γ​‖𝜼‖22\),\\displaystyle\\mathcal\{K\}^\{\-1\}\\left\[\\bm\{u\}^\{\(i\)\}\\right\]\\coloneq\\arg\\min\_\{\\bm\{\\eta\}\}\\left\(\|\|\\bm\{u\}\(\\bm\{x\},t\)^\{\(i\)\}\-\\mathcal\{K\}\_\{u\}\\left\[\\bm\{x\},t,\\bm\{\\eta\}\\right\]\|\|\_\{2\}^\{2\}\+\\gamma\|\|\\bm\{\\eta\}\|\|\_\{2\}^\{2\}\\right\),\(9\)whereγ\\gammais a regularization coefficient\. Using the loss function in Eq \([7](https://arxiv.org/html/2606.28519#S2.E7)\) instead of that in Eq \([5](https://arxiv.org/html/2606.28519#S2.E5)\) significantly reduces the training time and the memory requirements\.

The mean and covariance ofUUcan be computed by replacingyyanduuin the PDE problem \([1](https://arxiv.org/html/2606.28519#S2.E1)\) with their stochastic counterparts, and solving the resulting stochastic PDE foru¯\\overline\{u\}andCuC\_\{u\}using methods such as Polynomial Chaos\[[26](https://arxiv.org/html/2606.28519#bib.bib5)\], Moment Equations\[[22](https://arxiv.org/html/2606.28519#bib.bib6)\], or Monte Carlo\[[20](https://arxiv.org/html/2606.28519#bib.bib4)\]\. Most commonly, the Monte Carlo method is employed where the datasetDDis used to approximateu¯\\overline\{u\}andCuC\_\{u\}as the sample mean vector𝒖¯∈ℝNx​y​z​t\\bar\{\\bm\{u\}\}\\in\\mathbb\{R\}^\{N\_\{xyzt\}\},

𝒖¯=1Ntrain​∑i=1Ntrain𝒖\(i\),\\bar\{\\bm\{u\}\}=\\frac\{1\}\{N\_\{\\text\{train\}\}\}\\sum\_\{i=1\}^\{N\_\{\\text\{train\}\}\}\\bm\{u\}^\{\(i\)\},\(10\)and the sample covariance matrix𝐂u∈ℝNx​y​z​t×Nx​y​z​t\\mathbf\{C\}\_\{u\}\\in\\mathbb\{R\}^\{N\_\{xyzt\}\\times N\_\{xyzt\}\},

𝐂u=1Nt​r​a​i​n−1​𝐗T​𝐗,\\mathbf\{C\}\_\{u\}=\\frac\{1\}\{N\_\{train\}\-1\}\\mathbf\{X\}^\{T\}\\mathbf\{X\},\(11\)whereNx​y​z​t=Nx×Ny×Nz×NtN\_\{xyzt\}=N\_\{x\}\\times N\_\{y\}\\times N\_\{z\}\\times N\_\{t\}and the centered data matrix𝐗∈ℝNt​r​a​i​n×Nx​y​z​t\\mathbf\{X\}\\in\\mathbb\{R\}^\{N\_\{train\}\\times N\_\{xyzt\}\}consists of elements\{Xi​j=uj\(i\)−u¯j\}i=1,j=1Nt​r​a​i​n,Nx​y​z​t\\\{X\_\{ij\}=u^\{\(i\)\}\_\{j\}\-\\overline\{u\}\_\{j\}\\\}\_\{i=1,j=1\}^\{N\_\{train\},N\_\{xyzt\}\}\.

The eigenvaluesλi\{\\lambda\_\{i\}\}and eigenfunctionsϕi​\(𝒙,t\)\\phi\_\{i\}\(\\bm\{x\},t\)can be found from the eigenvalue problem

∫D∫TCu​\(𝒙,𝒙′,t,t′\)​ϕ​\(𝒙′,t′\)​d𝒙′​dt′=λ​ϕ​\(𝒙,t\)\.\\int\_\{D\}\\int\_\{T\}C\_\{u\}\(\\bm\{x\},\\bm\{x\}^\{\\prime\},t,t^\{\\prime\}\)\\phi\(\\bm\{x\}^\{\\prime\},t^\{\\prime\}\)\\,\\mathrm\{d\}\\bm\{x\}^\{\\prime\}\\mathrm\{d\}t^\{\\prime\}=\\lambda\\phi\(\\bm\{x\},t\)\.\(12\)The numerical solution of Eq \([12](https://arxiv.org/html/2606.28519#S2.E12)\) can be obtained by discretizing it as

\(𝑾1/2×𝑪u×𝑾1/2\)×\(𝑾1/2×ϕ\)=λ​\(𝑾1/2×ϕ\),\(\\bm\{W\}^\{1/2\}\\times\\bm\{C\}\_\{u\}\\times\\bm\{W\}^\{1/2\}\)\\times\(\\bm\{W\}^\{1/2\}\\times\\bm\{\\phi\}\)=\\lambda\(\\bm\{W\}^\{1/2\}\\times\\bm\{\\phi\}\),\(13\)whereϕ\\bm\{\\phi\}is the eigenvector,𝑾\\bm\{W\}is a diagonal matrix with the diagonal elementsWj​j=wjW\_\{jj\}=w\_\{j\}, andwjw\_\{j\}is the volume ofjjth cell in the numerical solutions𝒖\(i\)\\bm\{u\}^\{\(i\)\}\[[16](https://arxiv.org/html/2606.28519#bib.bib1)\]\. Here, we assume that all solutions𝒖\(i\)\\bm\{u\}^\{\(i\)\}are obtained on the same numerical mesh\. For large\-scale applications, the𝐂u\\mathbf\{C\}\_\{u\}size makes solving this eigenvalue problem impractical\.

The common solution to reducing the computational cost of solving the eigenvalue problem is to use the SVD decomposition:

𝑿~=𝑼​𝑺​𝑽T\\tilde\{\\bm\{X\}\}=\\bm\{U\}\\bm\{S\}\\bm\{V\}^\{T\}where𝑿~=𝑾1/2​𝑿\\tilde\{\\bm\{X\}\}=\\bm\{W\}^\{1/2\}\\bm\{X\},𝑼∈ℝNx​y​z​t×Nx​y​z​t\\bm\{U\}\\in\\mathbb\{R\}^\{N\_\{xyzt\}\\times N\_\{xyzt\}\}is the orthogonal matrix,𝑺∈ℝNt​r​a​i​n×Nx​y​z​t\\bm\{S\}\\in\\mathbb\{R\}^\{N\_\{train\}\\times N\_\{xyzt\}\}is the diagonal matrix containing singular valuessis\_\{i\}\(in the descending order\), and𝑽∈ℝNt​r​a​i​n×Nt​r​a​i​n\\bm\{V\}\\in\\mathbb\{R\}^\{N\_\{train\}\\times N\_\{train\}\}is the orthogonal matrix containing eigenvectors\. The eigenvalues are found as

λi=si2Nt​r​a​i​n−1\.\\lambda\_\{i\}=\\frac\{s\_\{i\}^\{2\}\}\{N\_\{train\}\-1\}\.\(14\)The mean and eigenfunctionsu¯​\(𝒙,t\)\\overline\{u\}\(\\bm\{x\},t\)andϕi​\(𝒙,t\)\\phi\_\{i\}\(\\bm\{x\},t\)are approximated via a polynomial interpolation of their nodal values given by𝒖¯\\overline\{\\bm\{u\}\}and the eigenvectors, respectively\. It should be noted that in the standard PCA and SVD\-based expansions,𝑾\\bm\{W\}is replaced with the identity matrix\. The resulting eigenvalues and eigenvectors are, in general, different from those given by the solution of the eigenvalue problem \([13](https://arxiv.org/html/2606.28519#S2.E13)\)\.

While being faster than eigenvalue solvers, the SVD decomposition still requires storing theNx​y​z​t×Nx​y​z​tN\_\{xyzt\}\\times N\_\{xyzt\}matrix𝑼\\bm\{U\}, i\.e\., it does not address memory limitations\. Also, to avoid overfitting in𝒩​𝒩​\(𝝃\)\\mathcal\{NN\}\(\\bm\{\\xi\}\)training,NηN\_\{\\eta\}must be kept smaller \(often by an order of magnitude\) thanNt​r​a​i​nN\_\{train\}, in turn requiring onlyNt​r​a​i​nN\_\{train\}largest eigenvalues and the corresponding eigenfunctions\. This allows reducing both the computation time and the memory requirements by replacing the \(full\) SVD decomposition with the low\-rank SVD decomposition\[[4](https://arxiv.org/html/2606.28519#bib.bib7)\]

𝑿~=𝑼r​𝑺r​𝑽rT,\\tilde\{\\bm\{X\}\}=\\bm\{U\}\_\{r\}\\bm\{S\}\_\{r\}\\bm\{V\}\_\{r\}^\{T\},where𝑼r∈ℝNx​y​z​t×Nη\\bm\{U\}\_\{r\}\\in\\mathbb\{R\}^\{N\_\{xyzt\}\\times N\_\{\\eta\}\}contains the firstNηN\_\{\\eta\}columns of𝑼\\bm\{U\},𝑺r∈ℝNη×Nη\\bm\{S\}\_\{r\}\\in\\mathbb\{R\}^\{N\_\{\\eta\}\\times N\_\{\\eta\}\}is the diagonal matrix containing firstNηN\_\{\\eta\}singular values, and𝑽r∈ℝNt​r​a​i​n×Nη\\bm\{V\}\_\{r\}\\in\\mathbb\{R\}^\{N\_\{train\}\\times N\_\{\\eta\}\}is the orthogonal matrix containing eigenvectors\. The main memory saving here is achieved by replacing𝑼∈ℝNx​y​z​t×Nx​y​z​t\\bm\{U\}\\in\\mathbb\{R\}^\{N\_\{xyzt\}\\times N\_\{xyzt\}\}with𝑼r∈ℝNx​y​z​t×Nη\\bm\{U\}\_\{r\}\\in\\mathbb\{R\}^\{N\_\{xyzt\}\\times N\_\{\\eta\}\}\.

However, forNx​y​z​tN\_\{xyzt\}on the order of tens of millions, performing the low\-rank SVD could require prohibitively large computer memory\. To further reduce the computational cost of constructing KLEs, we decompose multi\-dimensional KLEs into lower\-dimensional KLEs\[[29](https://arxiv.org/html/2606.28519#bib.bib33)\]\. For example, the KLE of the stochastic processUU

U​\(𝒙,t\)=u¯​\(𝒙,t\)\+∑i=1Nηηi~​ϕi​\(𝒙,t\)​λi,U\(\\bm\{x\},t\)=\\bar\{u\}\(\\bm\{x\},t\)\+\\sum^\{N\_\{\\eta\}\}\_\{i=1\}\\tilde\{\\eta\_\{i\}\}\\phi\_\{i\}\(\\bm\{x\},t\)\\sqrt\{\\lambda\_\{i\}\},\(15\)where\{ηi~\}i=1Nη\\\{\\tilde\{\\eta\_\{i\}\}\\\}\_\{i=1\}^\{N\_\{\\eta\}\}are uncorrelated random variables, can be approximated as

U​\(𝒙,t\)=u¯​\(𝒙,t\)\+∑i=1Nηη~i′​\(t\)​ϕi′​\(𝒙,t\)​λi′​\(t\)\.U\(\\bm\{x\},t\)=\\bar\{u\}\(\\bm\{x\},t\)\+\\sum^\{N\_\{\\eta\}\}\_\{i=1\}\{\\tilde\{\\eta\}^\{\\prime\}\_\{i\}\}\(t\)\\phi^\{\\prime\}\_\{i\}\(\\bm\{x\},t\)\\sqrt\{\\lambda^\{\\prime\}\_\{i\}\(t\)\}\.\(16\)The random processes\{ηi~\}i=1Nη\\\{\\tilde\{\\eta\_\{i\}\}\\\}\_\{i=1\}^\{N\_\{\\eta\}\}are further decomposed using one\-dimensional KLEs:

η~i′​\(t\)≈ηi¯\+∑j=1Nγγi​j​ψi​j​\(t\)​δi​j,\\tilde\{\\eta\}^\{\\prime\}\_\{i\}\(t\)\\approx\\bar\{\\eta\_\{i\}\}\+\\sum\_\{j=1\}^\{N\_\{\\gamma\}\}\\gamma\_\{ij\}\\psi\_\{ij\}\(t\)\\sqrt\{\\delta\_\{ij\}\},\(17\)whereηi¯​\(t\)\\bar\{\\eta\_\{i\}\}\(t\)is the mean ofη~i′​\(t\)\\tilde\{\\eta\}^\{\\prime\}\_\{i\}\(t\)andψi​j​\(t\)\\psi\_\{ij\}\(t\)andδi​j\\delta\_\{ij\}are the eigenfunctions and eigenvalues of the covarianceCη~i​\(t,t′\)C\_\{\\tilde\{\\eta\}\_\{i\}\}\(t,t^\{\\prime\}\)ofη~i′​\(t\)\\tilde\{\\eta\}^\{\\prime\}\_\{i\}\(t\)\. We note that the ensemble mean ofη~i′​\(t\)\\tilde\{\\eta\}^\{\\prime\}\_\{i\}\(t\)is zero, but its sample approximation might not be\.

In Eq \([16](https://arxiv.org/html/2606.28519#S2.E16)\),ϕi′​\(𝒙,t\)\\phi^\{\\prime\}\_\{i\}\(\\bm\{x\},t\)andλi′​\(t\)\\lambda^\{\\prime\}\_\{i\}\(t\)are given by the solution of the eigenvalue problem

∫DCu​\(𝒙,𝒙′,t,t\)​ϕ′​\(𝒙′,t\)​d𝒙′=λ′​\(t\)​ϕ′​\(𝒙,t\)\.\\int\_\{D\}C\_\{u\}\(\\bm\{x\},\\bm\{x\}^\{\\prime\},t,t\)\\phi^\{\\prime\}\(\\bm\{x\}^\{\\prime\},t\)\\,\\mathrm\{d\}\\bm\{x\}^\{\\prime\}=\\lambda^\{\\prime\}\(t\)\\phi^\{\\prime\}\(\\bm\{x\},t\)\.\(18\)The dimensionality of this eigenvalue problem isdd\(the number of spatial dimensions\) while the dimensionality of the eigenvalue problem in Eq \([12](https://arxiv.org/html/2606.28519#S2.E12)\) isd\+1d\+1\. For the KLE in Eq \([16](https://arxiv.org/html/2606.28519#S2.E16)\), the largest matrix in the low\-rank SVD has sizeNx​y​z×NηN\_\{xyz\}\\times N\_\{\\eta\}, whereNx​y​z=Nx×Ny×NzN\_\{xyz\}=N\_\{x\}\\times N\_\{y\}\\times N\_\{z\}\. This matrix isNtN\_\{t\}times smaller than the corresponding matrix in the full space–time KLE\. There is an extra cost of computing the eigenpairs of the KLE \([17](https://arxiv.org/html/2606.28519#S2.E17)\); however, since this KLE is one\-dimensional, this extra cost is not significant\.

We first compute\{ϕi′​\(𝒙,tj\),λi′​\(tj\)\}j=1Nt\\\{\\phi^\{\\prime\}\_\{i\}\(\\bm\{x\},t\_\{j\}\),\\lambda^\{\\prime\}\_\{i\}\(t\_\{j\}\)\\\}\_\{j=1\}^\{N\_\{t\}\}by performing low\-rank SVD of the centered data matrix𝐗i∈ℝNt​r​a​i​n×Nx​y​z\\mathbf\{X\}\_\{i\}\\in\\mathbb\{R\}^\{N\_\{train\}\\times N\_\{xyz\}\}\(i=1,…,Nti=1,\.\.\.,N\_\{t\}\) consisting of elements\{Xi,k​l=ui,k​l−u¯i,l\}k=1,l=1Nt​r​a​i​n,Nx​y​z\\\{X\_\{i,kl\}=u\_\{i,kl\}\-\\overline\{u\}\_\{i,l\}\\\}\_\{k=1,l=1\}^\{N\_\{train\},N\_\{xyz\}\}for each timesteptit\_\{i\}, whereui,k​lu\_\{i,kl\}is the element of thekkth solution sample atllth spatial element at timetit\_\{i\}andu¯i,l\\overline\{u\}\_\{i,l\}is the sample mean atllth spacial element at timetit\_\{i\}\. Next,\{η~i\(k\)′​\(tj\)\}i=1,j=1,k=1Nη,Nt,Nt​r​a​i​n\\\{\\tilde\{\\eta\}^\{\{\}^\{\\prime\}\(k\)\}\_\{i\}\(t\_\{j\}\)\\\}\_\{i=1,j=1,k=1\}^\{N\_\{\\eta\},N\_\{t\},N\_\{train\}\}are computed by the least\-squares fitting of KLE \([16](https://arxiv.org/html/2606.28519#S2.E16)\) to𝒖\(i\)\\bm\{u\}^\{\(i\)\}as

η~1\(k\)′​\(tj\),…,η~Nη\(k\)′​\(tj\)=minη~1′,…,η~Nη′​∑l=1Nx​y​z\[uj,l\(k\)−u¯​\(𝒙l,tj\)−∑i=1Nηη~i′​ϕi′​\(𝒙l,tj\)​λi′​\(tj\)\]2\.\\tilde\{\\eta\}^\{\{\}^\{\\prime\}\(k\)\}\_\{1\}\(t\_\{j\}\),\.\.\.,\\tilde\{\\eta\}^\{\{\}^\{\\prime\}\(k\)\}\_\{N\_\{\\eta\}\}\(t\_\{j\}\)=\\min\_\{\\tilde\{\\eta\}^\{\\prime\}\_\{1\},\.\.\.,\\tilde\{\\eta\}^\{\\prime\}\_\{N\_\{\\eta\}\}\}\\sum\_\{l=1\}^\{N\_\{xyz\}\}\\left\[u^\{\(k\)\}\_\{j,l\}\-\\bar\{u\}\(\\bm\{x\}\_\{l\},t\_\{j\}\)\-\\sum^\{N\_\{\\eta\}\}\_\{i=1\}\{\\tilde\{\\eta\}^\{\\prime\}\_\{i\}\}\\phi^\{\\prime\}\_\{i\}\(\\bm\{x\}\_\{l\},t\_\{j\}\)\\sqrt\{\\lambda^\{\\prime\}\_\{i\}\(t\_\{j\}\)\}\\right\]^\{2\}\.\(19\)
Finally,ψi​j\\psi\_\{ij\}andδi​j\\delta\_\{ij\}are computed using the low\-rank SVD of the data matrix𝑿k∈ℝNt​r​a​i​n×Nt\\bm\{X\}\_\{k\}\\in\\mathbb\{R\}^\{N\_\{train\}\\times N\_\{t\}\}\(k=1,…,Nηk=1,\.\.\.,N\_\{\\eta\}\) consisting of elements\{Xk,i​j=ηk,j\(i\)−η¯k,j\}i=1,j=1Nt​r​a​i​n,Nt\\\{X\_\{k,ij\}=\\eta^\{\(i\)\}\_\{k,j\}\-\\overline\{\\eta\}\_\{k,j\}\\\}\_\{i=1,j=1\}^\{N\_\{train\},N\_\{t\}\}for eachηk\\eta\_\{k\}component\.

The final form of the neural operator is

𝒢u​\[y\]​\(𝒙,t\)=u¯​\(𝒙,t\)\+∑i=1Nη\[ηi¯\+∑j=1Nγγi​j​\[𝝃​\(y\)\]​δi​j​ψi​j​\(t\)\]​λi​\(t\)​ϕi​\(𝒙,t\)\.\\mathcal\{G\}\_\{u\}\[y\]\(\\bm\{x\},t\)=\\bar\{u\}\(\\bm\{x\},t\)\+\\sum\_\{i=1\}^\{N\_\{\\eta\}\}\\left\[\\bar\{\\eta\_\{i\}\}\+\\sum\_\{j=1\}^\{N\_\{\\gamma\}\}\\gamma\_\{ij\}\[\\bm\{\\xi\}\(y\)\]\\sqrt\{\\delta\_\{ij\}\}\\psi\_\{ij\}\(t\)\\right\]\\sqrt\{\\lambda\_\{i\}\(t\)\}\\phi\_\{i\}\(\\bm\{x\},t\)\.\(20\)
Next, we discuss the criteria for selecting the total number of terms in the KLEs ofuuandyy, i\.e\.,NξN\_\{\\xi\}andNηN\_\{\\eta\}in the full\-dimensional KLEs \([8](https://arxiv.org/html/2606.28519#S2.E8)\) and \([4](https://arxiv.org/html/2606.28519#S2.E4)\) orNη×NγN\_\{\\eta\}\\times N\_\{\\gamma\}in the low\-dimensional KLE \([20](https://arxiv.org/html/2606.28519#S2.E20)\)\. We stated earlier thatNξN\_\{\\xi\}andNηN\_\{\\eta\}must be less than or equal toNt​r​a​i​nN\_\{train\}\. On the other hand,NξN\_\{\\xi\}andNηN\_\{\\eta\}are the dimensions of the input and output layers of the DNN, which is trained withNt​r​a​i​nN\_\{train\}samples\. The rule of thumb for selecting the DNN size as a function of the number of samples in the dataset is to keep the total number of trainable parameters in the DNN lower thanNt​r​a​i​nN\_\{train\}by at least a factor of 10\[[2](https://arxiv.org/html/2606.28519#bib.bib9)\], which requiresNη\+Nξ<<Nt​r​a​i​nN\_\{\\eta\}\+N\_\{\\xi\}<<N\_\{train\}\. In practice, this rule of thumb is too restrictive\. The values ofNηN\_\{\\eta\}andNξN\_\{\\xi\}are treated as hyperparameters and are chosen to minimize the error in predicting unseen solutions, the so\-called testing error\.

At this point, it is appropriate to discuss the relationship between the KL\-DNN model and the PCA\-Net and DeepONet models\. The PCA\-Net and KL\-DNN methods are structurally similar and use related PCA and KL\-DNN expansions\. Moreover, both methods rely on SVD to construct the expansions\. The only differences between the PCA\-Net and KL\-DNN models are how the data is generated \(in the KL\-DNN model, Monte Carlo sampling is used, whereas there are no such restrictions in the PCA\-Net\) and how the data matrix𝑿\\bm\{X\}is constructed\. In KL\-DNN, the elements of𝑿\\bm\{X\}are weighted with the size of numerical elements for the singular values and eigenvectors to approximate the solution of the eigenvalue problem, while in PCA\-Net, no weights are applied to the data\.

The vanilla DeepONet method is shown in Figure[1](https://arxiv.org/html/2606.28519#S2.F1)\. In both the KL\-DNN and DeepONet methods, the operator is given as the product of the parameter and basis function vectors\. In the DeepONet, the basis functions are the output of the trunk DNN, and the parameter vector𝜼′\\bm\{\\eta\}^\{\\prime\}is the output of the branch DNN\. In KL\-DNN, the basis functions are obtained by SVD, and the vector𝜼\\bm\{\\eta\}is obtained by applying the inverse KL operators to the state variables\. Despite the similarity, KL\-DNN has a computational advantage over DeepONet\. The “training” of all components in the KL\-DNN model \(solving eigenvalue problems, applying inverse KL operations, and the DNN training\) is done independently and without accessing the entire datasetDtrainD\_\{\\text\{train\}\}\. In DeepONet, the trunk and branch DNNs \(including all sub\-branch DNNs\) must be trained jointly using an expensive\-to\-evaluate loss function similar to \([5](https://arxiv.org/html/2606.28519#S2.E5)\), which requires access to the entireDtrainD\_\{\\text\{train\}\}dataset and, in turn, large computer memory\.

Therefore, the KL\-DNN architecture in Figure[1](https://arxiv.org/html/2606.28519#S2.F1)can be considered a variant of DeepONet that enables separate training of the trunk, branch encoder, and branch DNN\.

Finally, we discuss the KL\-DNN model forNp\>1N\_\{p\}\>1, i\.e\., a problem with more than one parameter field\. If the parameter fields are uncorrelated, then each parameter fieldyiy\_\{i\}should be represented with a separate KLE with the KL coefficient vector𝝃i\\bm\{\\xi\}\_\{i\}, yielding the vector containing KL coefficients of all parameters,𝝃T=\[𝝃1,…,𝝃Np\]\\bm\{\\xi\}\_\{T\}=\[\\bm\{\\xi\}\_\{1\},\.\.\.,\\bm\{\\xi\}\_\{N\_\{p\}\}\]\. The KL\-DNN model is then modified by using𝝃T\\bm\{\\xi\}\_\{T\}as input to the DNN, i\.e\.,𝜼=𝒩​𝒩​\(𝝃T\)\\bm\{\\eta\}=\\mathcal\{NN\}\(\\bm\{\\xi\}\_\{T\}\)\. It should be noted that the DNN size depends on the input layer size, and training a DNN with a larger input layer requires more samples\. Therefore, there is a need to reduce the size of the latent space representing all parameter fields \(and the input layer of the DNN model\), especially for large\-scale problems\.

The correlated parameter fields can be expressed with a single KLE\[[5](https://arxiv.org/html/2606.28519#bib.bib42)\]\. The advantage of the single KLE representation is that it requires fewer KL coefficients to represent the correlated fields than the individual KLEs\. To compute eigenfunctions and eigenvalues in the single KLE representation ofNpN\_\{p\}parameter fields, the parameter vectors𝒚i∈ℝNx​y​z\\bm\{y\}\_\{i\}\\in\\mathbb\{R\}^\{N\_\{xyz\}\}are assembled in a single parameter vector𝒚∈ℝNp​Nx​y​z\\bm\{y\}\\in\\mathbb\{R\}^\{N\_\{p\}N\_\{xyz\}\}as

𝒚=\[𝒚1,…,𝒚Np\]\.\\bm\{y\}=\\left\[\\bm\{y\}\_\{1\},\.\.\.,\\bm\{y\}\_\{N\_\{p\}\}\\right\]\.\(21\)Then, the eigenvalues and eigenvectors in the𝒚\\bm\{y\}KLE are found using the SVD decomposition of the data matrix𝑿∈ℝNp​Nx​y​z×Nt​r​a​i​n\\bm\{X\}\\in\\mathbb\{R\}^\{N\_\{p\}N\_\{xyz\}\\times N\_\{train\}\}with rows formed by the vectors𝒚\(i\)−𝒚¯\\bm\{y\}^\{\(i\)\}\-\\overline\{\\bm\{y\}\}, where𝒚\(i\)∈ℝNp​Nx​y​z\\bm\{y\}^\{\(i\)\}\\in\\mathbb\{R\}^\{N\_\{p\}N\_\{xyz\}\}is the parameter vector in theiith sample ofDDand𝒚¯∈ℝNp​Nx​y​z\\overline\{\\bm\{y\}\}\\in\\mathbb\{R\}^\{N\_\{p\}N\_\{xyz\}\}is the sample mean of\{𝒚\(i\)\}i=1Nt​r​a​i​n\\\{\\bm\{y\}^\{\(i\)\}\\\}\_\{i=1\}^\{N\_\{train\}\}\.

![Refer to caption](https://arxiv.org/html/2606.28519v1/kldnn_deeponet_new.png)Figure 1:The KL\-DNN operator\-learning framework \(left\) and the vanilla DeepONet architecture \(right\)\. In KL\-DNN, the KLE of the state variable replaces the DeepONet trunk\. The inverse KLE of the input variableyyin KL\-DNN replaces the encoder part of the DeepONet branch\. The DNN part of KL\-DNN is analogous to the fully connected layers in the DeepONet branch\.### 2\.1 Evaluation Metrics

We evaluate the accuracy of the operator learning models and KLE representations using the[Mean Absolute Error](https://arxiv.org/html/2606.28519#id11.11.id11)\([MAE](https://arxiv.org/html/2606.28519#id11.11.id11)\):

M​A​E=1Nx​y​z​1Nt​1Ns​a​m​p​l​e​s​∑i=1Nx​y​z∑j=1Nt∑k=1Ns​a​m​p​l​e​s\|𝒢​\[𝒚\(k\)\]​\(𝒙i,tj\)−u\(k\)​\(𝒙i,tj\)\|MAE=\\frac\{1\}\{N\_\{xyz\}\}\\frac\{1\}\{N\_\{t\}\}\\frac\{1\}\{N\_\{samples\}\}\\sum\_\{i=1\}^\{N\_\{xyz\}\}\\sum\_\{j=1\}^\{N\_\{t\}\}\\sum\_\{k=1\}^\{N\_\{samples\}\}\|\\mathcal\{G\}\[\\bm\{y\}^\{\(k\)\}\]\(\\bm\{x\}\_\{i\},t\_\{j\}\)\-u^\{\(k\)\}\(\\bm\{x\}\_\{i\},t\_\{j\}\)\|\(22\)and root mean square error[RMSE](https://arxiv.org/html/2606.28519#id10.10.id10):

R​M​S​E=1Ns​a​m​p​l​e​s​∑k=1Ns​a​m​p​l​e​s1Nx​y​z​1Nt​∑i=1Nx​y​z∑j=1Nt\(𝒢​\[𝒚\(k\)\]​\(𝒙i,tj\)−u\(k\)​\(𝒙i,tj\)\)2,RMSE=\\frac\{1\}\{N\_\{samples\}\}\\sum\_\{k=1\}^\{N\_\{samples\}\}\\sqrt\{\\frac\{1\}\{N\_\{xyz\}\}\\frac\{1\}\{N\_\{t\}\}\\sum\_\{i=1\}^\{N\_\{xyz\}\}\\sum\_\{j=1\}^\{N\_\{t\}\}\\left\(\\mathcal\{G\}\[\\bm\{y\}^\{\(k\)\}\]\(\\bm\{x\}\_\{i\},t\_\{j\}\)\-u^\{\(k\)\}\(\\bm\{x\}\_\{i\},t\_\{j\}\)\\right\)^\{2\}\},\(23\)whereNs​a​m​p​l​e​sN\_\{samples\}is the number of samples \(usually, the number of test samples\) for which the accuracy of the solution is evaluated\.[RMSE](https://arxiv.org/html/2606.28519#id10.10.id10)is more sensitive to large local errors than[MAE](https://arxiv.org/html/2606.28519#id11.11.id11)\. In this work, we use RMSE to train DNNs in the operator learning model and MAE to optimize hyperparameters, includingNηN\_\{\\eta\},NξN\_\{\\xi\}, and the size of the DNN’s hidden layers\.

We also use time\-dependentℓ∞\\ell\_\{\\infty\}error

ℓ∞​\(tj\)=max𝒙⁡\|\(𝒢​\[𝒚\(k\)\]​\(𝒙,tj\)−u\(k\)​\(𝒙,tj\)\)\|\\ell\_\{\\infty\}\(t\_\{j\}\)=\\max\_\{\\bm\{x\}\}\{\\left\|\\left\(\\mathcal\{G\}\[\\bm\{y\}^\{\(k\)\}\]\(\\bm\{x\},t\_\{j\}\)\-u^\{\(k\)\}\(\\bm\{x\},t\_\{j\}\)\\right\)\\right\|\}\(24\)to study how maximum model errors change over time\.

## 3 Numerical Example

We test the proposed operator learning method for modeling CO2sequestration at the IBDP site\[[7](https://arxiv.org/html/2606.28519#bib.bib38)\]\.[IBDP](https://arxiv.org/html/2606.28519#id3.3.id3)is a carbon capture and storage pilot project for CO2injection into a deep saline reservoir\. Over a 3\-year period starting in November 2011, approximately 1 million tonnes of CO2were injected into the reservoir\.

![Refer to caption](https://arxiv.org/html/2606.28519v1/ibdp_geology.png)Figure 2:Left column: geological facies, porosity, and horizontal permeability from the IBDP calibration study reported in\[[7](https://arxiv.org/html/2606.28519#bib.bib38)\]\. Right column: statistical realizations of the geological facies, porosity, and horizontal permeability\.To train and test the operator learning model, we use a dataset from\[[10](https://arxiv.org/html/2606.28519#bib.bib32)\]consisting of 100 simulations of the CO2pressure and saturation on a\(126,125,110\)\(126,125,110\)unstructured grid\. The simulations were performed using the ECLIPSE software \(\[[14](https://arxiv.org/html/2606.28519#bib.bib36)\]\)\. Estimates of reservoir parameters, including horizontal and vertical permeability \(khk\_\{h\}andkvk\_\{v\}\) and porosity \(nn\), were statistically generated using specified mean and covariance models\. The statistical models were developed as part of the IBDP model parameterization study, using stratigraphic characterization and analysis of seismic data\[[7](https://arxiv.org/html/2606.28519#bib.bib38)\]\. The facies, porosity, and horizontal permeability from this study, as well as one realization of these properties using the statistical model, are shown in Figure[2](https://arxiv.org/html/2606.28519#S3.F2)\.

For dataset generation, the statistical model was further modified to match the observed trend in CO2saturation at an observation well at the IBDP site\. A modifier was applied to the vertical permeability to contain the CO2plume to the lower part of the reservoir, as was observed in the field\. The resulting dataset contains 100 samples in total, including 80 with modifiers and 20 without modifiers\. All solutions were obtained for the same CO2injection schedule, shown in Figure[3](https://arxiv.org/html/2606.28519#S3.F3)\. The dataset provides CO2pressureppand saturationSSmonthly over 50 months\.

![Refer to caption](https://arxiv.org/html/2606.28519v1/injection_rate.png)Figure 3:The CO2injection schedule used for generating the training and testing data based on the historic injection rates at the IBDP site\.In this study, we construct separate operator\-learning models for permeability fields with and without modifiers\. Ninety percent of the samples are used for training, while ten percent are reserved for testing\. Furthermore, we train𝒢p​\[yh,yv,n\]​\(𝒙,t\)\\mathcal\{G\}\_\{p\}\[y\_\{h\},y\_\{v\},n\]\(\\bm\{x\},t\)and𝒢S​\[yh,yv,n\]​\(𝒙,t\)\\mathcal\{G\}\_\{S\}\[y\_\{h\},y\_\{v\},n\]\(\\bm\{x\},t\)models for pressure and saturation, respectively, whereyh=ln⁡khy\_\{h\}=\\ln k\_\{h\}andyv=ln⁡kvy\_\{v\}=\\ln k\_\{v\}\. The three input variables are represented with a single KLE as described in Section[2](https://arxiv.org/html/2606.28519#S2)\. In the following, we use superscripts 80 and 20 to denote the operator learning models trained for the permeability fields with and without modifiers, respectively\. For example,𝒢p80​\[yh,yv,n\]​\(𝒙,t\)\\mathcal\{G\}\_\{p\}^\{80\}\[y\_\{h\},y\_\{v\},n\]\(\\bm\{x\},t\)is the pressure operator learning model trained on permeability fields with modifiers\. The representation of inputs in the operator\-learning models, including the size of the data matrices,NξN\_\{\\xi\}, time and memory required for constructing the KLEs, and errors in approximating parameters due to KLE truncation, are summarized in Table[1](https://arxiv.org/html/2606.28519#S3.T1)\.

Table 1:KLE representation of input fieldsyhy\_\{h\},yvy\_\{v\}, andnnin the IBDP dataset: the training dataset sizeNt​r​a​i​nN\_\{train\}; the size of the data matrices for constructing a joint KLE of the input fields \(𝑿\\bm\{X\}size\), the number of terms,NξN\_\{\\xi\}, in the joint KLE expansion of the input fields; time for constructing the joint KLE of the input fields \(KLE time\); computer memory required for performing SVD \(KLE memory\); and RMSEs in approximatingyhy\_\{h\},yvy\_\{v\}, andnnfields with KLEs \(KLE RMSE\)\.ModelNt​r​a​i​nN\_\{train\}𝑿\\bm\{X\}sizeNξN\_\{\\xi\}[KLE](https://arxiv.org/html/2606.28519#id5.5.id5)time[KLE](https://arxiv.org/html/2606.28519#id5.5.id5)memory[KLE](https://arxiv.org/html/2606.28519#id5.5.id5)yhy\_\{h\}RMSE[KLE](https://arxiv.org/html/2606.28519#id5.5.id5)yvy\_\{v\}RMSE[KLE](https://arxiv.org/html/2606.28519#id5.5.id5)nnRMSE𝒢p80\\mathcal\{G\}\_\{p\}^\{80\}/𝒢s80\\mathcal\{G\}\_\{s\}^\{80\}72\(5\.2M, 72\)1833 s3\.2 GB1\.7661\.6830\.0319𝒢p20\\mathcal\{G\}\_\{p\}^\{20\}/𝒢s20\\mathcal\{G\}\_\{s\}^\{20\}18\(5\.2M, 18\)911 s1\.3 GB1\.6501\.6500\.0322In the dataset, the CO2pressure and saturation are computed over the entire computational domain, with a sample size ofNx​y​z​tp=126×125×110×50N^\{p\}\_\{xyzt\}=126\\times 125\\times 110\\times 50\. We train𝒢p​\[yh,yv,n\]​\(𝒙,t\)\\mathcal\{G\}\_\{p\}\[y\_\{h\},y\_\{v\},n\]\(\\bm\{x\},t\)to model pressure for the entire domain using the full samples of theppsolution\. On the other hand, when training𝒢S​\[yh,yv,n\]​\(𝒙,t\)\\mathcal\{G\}\_\{S\}\[y\_\{h\},y\_\{v\},n\]\(\\bm\{x\},t\)model, we observe that in all simulations in the dataset, the CO2plume is confined to a relatively small subdomain \(40×44×9440\\times 44\\times 94grid points\) centered on the injection well\. Therefore, we train𝒢S​\[yh,yv,n\]​\(𝒙,t\)\\mathcal\{G\}\_\{S\}\[y\_\{h\},y\_\{v\},n\]\(\\bm\{x\},t\)for the CO2saturation modeling only in this subdomain, reducing the sample size of theSSsolution toNx​y​z​tS=40×44×94×50=165440×50=8\.272​MN^\{S\}\_\{xyzt\}=40\\times 44\\times 94\\times 50=165440\\times 50=8\.272M\. The resulting sizes of the data matrices𝑿\\bm\{X\}used to perform low\-rank SVD for representing pressure and saturation in the𝒢p\\mathcal\{G\}\_\{p\}and𝒢S\\mathcal\{G\}\_\{S\}models, as well as the number of KL terms, time to and memory required for constructing KLEs, the approximation errors, and KL\-DNN model errors are listed in Table[2](https://arxiv.org/html/2606.28519#S3.T2)\.

In the context of𝒢p\\mathcal\{G\}\_\{p\}models, it is essential to utilize a low\-dimensional Karhunen\-Loève expansion \(KLE\) for the pressure field\. This approach is necessary because performing a low\-rank singular value decomposition \(SVD\) on𝑿∈ℝ72×86\.6​M\\bm\{X\}\\in\\mathbb\{R\}^\{72\\times 86\.6M\}for the full\-dimensional KLE of pressure is computationally infeasible\. The full\-dimensional KLE would require storing an𝑼\\bm\{U\}matrix of size 86\.6M × 86\.6M, which demands approximately 29,600 petabytes of memory\. This amount of memory exceeds the capabilities of one of the largest existing supercomputers, El Capitan, by nearly six times\[[1](https://arxiv.org/html/2606.28519#bib.bib3)\]\. The𝒚\\bm\{y\}parameter in the pressure model and both𝒚\\bm\{y\}andSSin the saturation model are modeled with full\-dimensional KLEs, as the sizes of the data matrices for these variables are small enough for performing low\-rank SVDs\.

The DNN models in𝒢p​\[yh,yv,n\]​\(𝒙,t\)\\mathcal\{G\}\_\{p\}\[y\_\{h\},y\_\{v\},n\]\(\\bm\{x\},t\)and𝒢S​\[yh,yv,n\]​\(𝒙,t\)\\mathcal\{G\}\_\{S\}\[y\_\{h\},y\_\{v\},n\]\(\\bm\{x\},t\)have 3750 and 1957 trainable parameters, respectively\. Both DNN models are trained on an NVIDIA RTX A6000 GPU; the total training time of all four models \(𝒢p80\\mathcal\{G\}\_\{p\}^\{80\},𝒢p20\\mathcal\{G\}\_\{p\}^\{20\},𝒢S80\\mathcal\{G\}\_\{S\}^\{80\}, and𝒢S20\\mathcal\{G\}\_\{S\}^\{20\}\) is around 20 minutes, and inference time is less than 1 minute per model\.

First, we study the accuracy of the low\-dimensional KLE approximation relative to the full dimensional approximation\. Since the full\-dimensional KLE ofppcannot be constructed, we compare the KLE approximation RMSEs in the full\- and low\-dimensional KLEs of the saturation field defined on the smaller domain centered around the injection well\. An approximation RMSE is defined according to Eq \([23](https://arxiv.org/html/2606.28519#S2.E23)\) with𝜼\\bm\{\\eta\}in𝒢​\[𝒚\(j\)\]\\mathcal\{G\}\[\\bm\{y\}^\{\(j\)\}\]determined from the inverse KLE operator \([9](https://arxiv.org/html/2606.28519#S2.E9)\)\. We find that the low\-dimensional KLE increases the approximation \([RMSE](https://arxiv.org/html/2606.28519#id10.10.id10)\) by less than 2% as compared to the full\-dimensional KLE\. This comparison demonstrates that a low\-dimensional KLE can be used to construct an accurate low\-dimensional representation of massive datasets without resorting to spatial coarsening or domain subsampling\.

Table 2:Performance and characteristics of the KL\-DNN models used in this study: the training dataset sizeNt​r​a​i​nN\_\{train\}; the testing dataset sizeNt​e​s​tN\_\{test\};𝑿\\bm\{X\}size is the size of the data matrices for constructing a full or nested KLE of the state variables \(pporSS\) input fields; the number of terms \(NηN\_\{\\eta\}andNγN\_\{\\gamma\}\) in the low\-dimensional KLE expansion of pressure and saturation \(NγN\_\{\\gamma\}is not listed for𝒢S\\mathcal\{G\}\_\{S\}models because the full\-dimensional KLEs ofSSare used\); KLE time is the total time for training the operator learning models \(the DNN training in all models takes about 5 minutes\); KLE memory is the computer memory required for performing SVD of theppandSSdata matrices; KLE RMSE is the RMSE in approximatingppandSSfields with KLEs; and KL\-DNN RMSE is the total RMSE in the operator\-learning model predictions\.ModelNt​r​a​i​nN\_\{train\}Nt​e​s​tN\_\{test\}𝑿\\bm\{X\}sizeNηN\_\{\\eta\}NγN\_\{\\gamma\}[KLE](https://arxiv.org/html/2606.28519#id5.5.id5)time[KLE](https://arxiv.org/html/2606.28519#id5.5.id5)memory[KLE](https://arxiv.org/html/2606.28519#id5.5.id5)RMSE[Karhunen\-Loève Deep Neural Network](https://arxiv.org/html/2606.28519#id1.1.id1)\([KL\-DNN](https://arxiv.org/html/2606.28519#id1.1.id1)\)RMSE𝒢p80\\mathcal\{G\}\_\{p\}^\{80\}728\(86\.6M, 72\)431000 s1 GB1\.045 psi1\.139 psi𝒢p20\\mathcal\{G\}\_\{p\}^\{20\}182\(86\.6M, 18\)32150 s0\.4 GB0\.811 psi0\.956 psi𝒢S80\\mathcal\{G\}\_\{S\}^\{80\}728\(8\.3M, 72\)6\-50 s2\.3 GB0\.01400\.0148𝒢S20\\mathcal\{G\}\_\{S\}^\{20\}182\(8\.3M, 18\)6\-15 s1\.5 GB0\.01340\.0138Next, we discuss the performance and accuracy of the KL\-DNN models for the pressure and saturation fields as summarized in Table[2](https://arxiv.org/html/2606.28519#S3.T2)\. The reported training time, memory, approximation errors, and model errors are all averages over test cases\. The number of test samples for each operator learning model is given in the “Nt​e​s​tN\_\{test\}” column\. The training times reported in the “[KLE](https://arxiv.org/html/2606.28519#id5.5.id5)time” column include time spent computing eigenfunctions for both state variables and parameters, as well as for computing the reduced \(latent space\) datasetDlD\_\{l\}\. In the𝒢p80\\mathcal\{G\}\_\{p\}^\{80\}and𝒢p20\\mathcal\{G\}\_\{p\}^\{20\}pressure models, the training times are higher because low\-dimensional SVDs are performed for each of 50 time steps\. The KL\-DNN model RMSEs reported in the “KL\-DNN RMSE” column are a combination of the KL approximation RMSE and the DNN mapping error\. The approximation and operator learning RMSEs in𝒢p80\\mathcal\{G\}\_\{p\}^\{80\}and𝒢S80\\mathcal\{G\}\_\{S\}^\{80\}models are larger than in the𝒢p20\\mathcal\{G\}\_\{p\}^\{20\}and𝒢S20\\mathcal\{G\}\_\{S\}^\{20\}models despite the fact that the training dataset for𝒢p80\\mathcal\{G\}\_\{p\}^\{80\}and𝒢S80\\mathcal\{G\}\_\{S\}^\{80\}models is four times larger\. This is because the pressure and saturation fields in the model without modifiers are smoother and can be represented more accurately with the same or fewer latent variables than those in the model with modifiers as evident by the KLE approximation errors in the “KLE RMSE” column\.

The RMSEs reported in Table[2](https://arxiv.org/html/2606.28519#S3.T2)are obtained by replacing𝑾\\bm\{W\}with the identity matrix as in the standard PCA expansions, because we find that the resulting KLEs have lower approximation errors\.

Figure[4](https://arxiv.org/html/2606.28519#S3.F4)reports[RMSE](https://arxiv.org/html/2606.28519#id10.10.id10)and MAE values in the pressure predictions for the 10 test cases, where the first eight cases are used for testing𝒢p80\\mathcal\{G\}\_\{p\}^\{80\}and the last two cases are for testing𝒢p20\\mathcal\{G\}\_\{p\}^\{20\}\.

RMSEs range between 0\.9 psi and 1\.4 psi across all test cases, with the average RMSE over the 10 test cases of 1\.1 psi\. The average pressure in the simulations is 3150 psi, yielding the relative average RMSE of 0\.04%\.

![Refer to caption](https://arxiv.org/html/2606.28519v1/errors_pressure_test.png)Figure 4:RMSE and MAE of the fluid pressure predictions for 10 test casesFigure[5](https://arxiv.org/html/2606.28519#S3.F5)shows cross\-sections of reference and predicted pressure fields as well as point errors in the predicted pressure field at the end of injection \(36 months\) for one of the test cases\. The pressure field is relatively smooth, except for a small area near the injection well, where we observe the largest point errors\. In some test cases, we also observe elevated point errors close to the boundaries of the spatial domain\.

![Refer to caption](https://arxiv.org/html/2606.28519v1/p_comparison_field_realization40_indt_36.png)Figure 5:Pressure reference value, prediction, and point error slices across the injection well for the test case 40 at the end of injection \(36 months\)\. The model is trained on the dataset with permeability modifiers\.Figure[6](https://arxiv.org/html/2606.28519#S3.F6)depictsℓ∞\\ell\_\{\\infty\}errors, the maximum absolute errors, in pressure predictions as functions of time for the 10 test cases\. The errors reach their maximum values during the first five months after injection begins, when the largest pressure changes occur\. The fact that the errors decrease after the initial injection phase and do not increase further over time indicates that the model provides relatively accurate long\-term pressure predictions\.

![Refer to caption](https://arxiv.org/html/2606.28519v1/errors_max_mae_time_pressure_test.png)Figure 6:Maximum Absolute Error as a function of time of the fluid pressure predictions for 10 test cases\.Figure[7](https://arxiv.org/html/2606.28519#S3.F7)displays RMSEs and MAEs in the saturation predictions for the 10 test cases\. We find that the errors are similar across all test cases, with an average RMSE of 0\.015 and an average MAE of 0\.002\.

![Refer to caption](https://arxiv.org/html/2606.28519v1/errors_sat_test.png)Figure 7:RMSE and MAE of saturation predictions for 10 test cases\.Figure[8](https://arxiv.org/html/2606.28519#S3.F8)shows the reference and predicted CO2saturation fields for one of the test problems\. The largest point errors are observed at the CO2plume boundaries\.

![Refer to caption](https://arxiv.org/html/2606.28519v1/s_comparison_field_realization40_indt_-1.png)Figure 8:C​O2CO\_\{2\}saturation: cross\-sections showing reference values, predictions, and point errors for test case 40\. The model is trained on the dataset with permeability modifiers\.Figure[9](https://arxiv.org/html/2606.28519#S3.F9)displaysℓ∞\\ell\_\{\\infty\}errors in the saturation predictions as functions of time for the 10 test cases\. We find that the errors increase with time\. Nonetheless, the operator learning model preserves the overall plume shape and provides accurate saturation predictions within the plume\.

![Refer to caption](https://arxiv.org/html/2606.28519v1/errors_max_mae_time_sat_test.png)Figure 9:Maximum absolute error in saturation predictions as functions of time for 10 test cases\.![Refer to caption](https://arxiv.org/html/2606.28519v1/saturation_errors.png)Figure 10:KLE RMSE \(KLE approximation error\) and KL\-DNN RMSE \(KL\-DNN model error\) as functions ofNηN\_\{\\eta\}\. RMSEs are averaged over the 10 test cases\.The average RMSE in the KL\-DNNSSpredictions over all 10 test cases is 0\.0146 or 5% relative to the average saturation in the plume, which is larger than the error in the pressure prediction \(0\.04 %\)\. The average RMSE and relative errors in KL\-DNNppandSSmodels are summarized in Table[3](https://arxiv.org/html/2606.28519#S3.T3)\. For comparison, Table[3](https://arxiv.org/html/2606.28519#S3.T3)also lists average RMSEs in the DeepONet model predictions as reported in\[[10](https://arxiv.org/html/2606.28519#bib.bib32)\]\. The DeepONet model was trained on the same training dataset and evaluated on the same testing dataset as the KL\-DNN model in this study\. We note that the KL\-DNN RMSE errors reported in Table[3](https://arxiv.org/html/2606.28519#S3.T3)are different from those in Table[2](https://arxiv.org/html/2606.28519#S3.T2), which were computed separately for the input parameters with and without modifiers\. We find that RMSEs in KL\-DNN are lower than those in DeepONet for both pressure and saturation\. The average saturation RMSE in our model is 0\.015, which is slightly smaller than the RMSE of 0\.017 reported in\[[10](https://arxiv.org/html/2606.28519#bib.bib32)\]\. The average pressure RMSE of 1\.1 psi in our model is approximately 20% smaller than that in\[[10](https://arxiv.org/html/2606.28519#bib.bib32)\]\. Moreover, the training time of the DeepONet model in\[[10](https://arxiv.org/html/2606.28519#bib.bib32)\]\(2700 minutes\) is about 2 orders of magnitude larger than that of our model\. We note that the DeepONet model in\[[10](https://arxiv.org/html/2606.28519#bib.bib32)\]employed subsampling for training on the IBDP dataset\.

Table 3:RMSE and training time of the KL\-DNN and DeepONet pressure and saturation predictions averaged over 10 test cases\. The DeepONet results are reported based on\[[10](https://arxiv.org/html/2606.28519#bib.bib32)\]\.ModelTraining time \(min\)Testing Cases RMSE𝒢p\\mathcal\{G\}\_\{p\}201\.10 psi𝒢S\\mathcal\{G\}\_\{S\}60\.0146Pressure DeepONet\[[10](https://arxiv.org/html/2606.28519#bib.bib32)\]27001\.35 psiSaturation DeepONet\[[10](https://arxiv.org/html/2606.28519#bib.bib32)\]27000\.0157Finally, we comment on the approximation and model RMSEs in pressure and saturation reported in Table[2](https://arxiv.org/html/2606.28519#S3.T2)\. As noted earlier, the model RMSE is a combination of the KLE approximation error and the error due to mapping of𝝃\\bm\{\\xi\}to𝜼\\bm\{\\eta\}\. In Table[2](https://arxiv.org/html/2606.28519#S3.T2), the model RMSEs are dominated by the KLE approximation errors\. The approximation errors can be reduced by including more terms in the KLEs ofppandSS\. In Figure[10](https://arxiv.org/html/2606.28519#S3.F10), we demonstrate that the RMSE in the KLE approximation ofSS, averaged over the 10 test cases, decreases as a function ofNηN\_\{\\eta\}, the number of terms in the KLE expansion of the saturation field\. Figure[10](https://arxiv.org/html/2606.28519#S3.F10)also shows the average RMSE of the KL\-DNN predictions ofSS, which reaches a minimum forNη=6N\_\{\\eta\}=6\. As we can see, forNη\>6N\_\{\\eta\}\>6, the approximation error decreases, but the model error increases with increasingNηN\_\{\\eta\}\. This is because forNη\>6N\_\{\\eta\}\>6, there are not enough samples in the training dataset to train the DNN, whose size increases withNηN\_\{\\eta\}\.

## 4 Conclusions

We proposed a scalable operator learning model and applied it to the large\-scale IBDP[GCS](https://arxiv.org/html/2606.28519#id2.2.id2)problem\. Our model offers faster training and higher accuracy than the DeepONet model trained on the same dataset\.

Like the DeepONet operator\-learning model, the proposed model performs operator learning in the latent spaces of the system’s states and parameters\. Unlike the DeepONet, our model trains the parameter encoder \(equivalent to the leading part of the DeepONet branch\), the DNN map \(equivalent to the back end of the DeepONet branch\), and the decoder \(equivalent to the trunk of the DeepONet branch\) separately\. This trainable\-by\-part feature of our model, in combination with a low\-dimensional KL representation of high\-dimensional states and low\-rank SVD, enables it to scale to large\-dimensional problems\. DeepONet and other operator\-learning models often use subsampling for large\-scale problems, which can introduce additional errors\.

The training time for our model, which includes both SVD and DNN training, is approximately 20 minutes\. This is two orders of magnitude less than the training time required for the DeepONet model\. Our model has average pressure and saturation RMSEs of 1\.1 psi and 0\.0146, respectively, which are lower than the DeepONet model’s errors \(1\.35 psi and 0\.0157, respectively\)\. Point errors in the pressure predictions are higher near the injection well and the domain boundaries\. Saturation point errors are highest near the edge of the plume\.

Our results demonstrate the utility of the operator learning models for practical large\-scale applications such as GCS\. The inference time of less than 1 minute per simulation makes it well\-suited for uncertainty quantification, history matching, and other tasks that require multiple model evaluations\. The training time of less than 20 minutes on a regular workstation makes the model accessible to most users\.

The proposed trainable\-by\-part approach and low\-dimensional KL representation can be used for scaling other operator learning models, including PCA\-Net and DeepONet, to large\-scale problems\.

## Conflict of Interest disclosure

The authors declare there are no conflicts of interest for this manuscript\.

## Acknowledgments

This research was partially supported by the Strategic Research Initiative \(SRI\) Program at the Grainger College of Engineering of the University of Illinois Urbana \- Champaign\.

## References

- \[1\]\(2024\-11\)S&TR december 2024 el capitan issue\.Technical reportLawrence Livermore National Laboratory \(LLNL\), Livermore, CA \(United States\)\.Note:At Lawrence Livermore National Laboratory, we focus on science and technology research to ensure our nation’s security\. We also apply that expertise to solve other important national problems in energy, bioscience, and the environment\. Science amp; Technology Review is published eight times a year to communicate, to a broad audience, the Laboratory’s scientific and technological accomplishments in fulfilling its primary missions\. The publication’s goal is to help readers understand these accomplishments and appreciate their value to the individual citizen, the nation, and the world\.External Links:[Document](https://dx.doi.org/10.2172/2536826),[Link](https://www.osti.gov/biblio/2536826)Cited by:[§3](https://arxiv.org/html/2606.28519#S3.p6.6)\.
- \[2\]A\. Alwosheel, S\. Van Cranenburgh, and C\. G\. Chorus\(2018\)Is your dataset big enough? sample size requirements when using artificial neural networks for discrete choice analysis\.Journal of choice modelling28,pp\. 167–182\.Cited by:[§2](https://arxiv.org/html/2606.28519#S2.p16.15)\.
- \[3\]K\. Bhattacharya, B\. Hosseini, N\. B\. Kovachki, and A\. M\. Stuart\(2021\)Model reduction and neural networks for parametric pdes\.The SMAI journal of computational mathematics7,pp\. 121–157\.Cited by:[§1](https://arxiv.org/html/2606.28519#S1.p4.1)\.
- \[4\]M\. Brand\(2006\)Fast low\-rank modifications of the thin singular value decomposition\.Linear algebra and its applications415\(1\),pp\. 20–30\.Cited by:[§2](https://arxiv.org/html/2606.28519#S2.p10.6)\.
- \[5\]H\. Cho, D\. Venturi, and G\.E\. Karniadakis\(2013\)Karhunen–loève expansion for multi\-correlated stochastic processes\.34,pp\. 157–167\.External Links:ISSN 0266\-8920,[Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.probengmech.2013.09.004),[Link](https://www.sciencedirect.com/science/article/pii/S0266892013000702)Cited by:[§2](https://arxiv.org/html/2606.28519#S2.p21.3)\.
- \[6\]M\. Dhingra, R\. Maulik, A\. Rasheed, and O\. San\(2025\)Localized pca\-net neural operators for scalable solution reconstruction of elliptic pdes\.arXiv preprint arXiv:2509\.18110\.Cited by:[§1](https://arxiv.org/html/2606.28519#S1.p7.1)\.
- \[7\]R\. J\. Finley\(2014\)An overview of the illinois basin – decatur project\.4\(5\),pp\. 571–579\.External Links:[Document](https://dx.doi.org/https%3A//doi.org/10.1002/ghg.1433),[Link](https://scijournals.onlinelibrary.wiley.com/doi/abs/10.1002/ghg.1433),https://scijournals\.onlinelibrary\.wiley\.com/doi/pdf/10\.1002/ghg\.1433Cited by:[Figure 2](https://arxiv.org/html/2606.28519#S3.F2),[§3](https://arxiv.org/html/2606.28519#S3.p1.3),[§3](https://arxiv.org/html/2606.28519#S3.p2.5)\.
- \[8\]J\. J\. Gerbrands\(1981\)On the relationships between svd, klt and pca\.14\(1\-6\),pp\. 375–381\.Cited by:[§1](https://arxiv.org/html/2606.28519#S1.p4.1)\.
- \[9\]T\. Hermans, P\. Goderniaux, D\. Jougnot, J\. H\. Fleckenstein, P\. Brunner, F\. Nguyen, N\. Linde, J\. A\. Huisman, O\. Bour, J\. Lopez Alvis, R\. Hoffmann, A\. Palacios, A\.\-K\. Cooke, Á\. Pardo\-Álvarez, L\. Blazevic, B\. Pouladi, P\. Haruzi, A\. Fernandez Visentini, G\. E\. H\. Nogueira, J\. Tirado\-Conde, M\. C\. Looms, M\. Kenshilikova, P\. Davy, and T\. Le Borgne\(2023\)Advancing measurements and representations of subsurface heterogeneity and dynamic processes: towards 4d hydrogeology\.27\(1\),pp\. 255–287\.External Links:[Link](https://hess.copernicus.org/articles/27/255/2023/),[Document](https://dx.doi.org/10.5194/hess-27-255-2023)Cited by:[§1](https://arxiv.org/html/2606.28519#S1.p2.1)\.
- \[10\]T\. Kadeethum, S\.J\. Verzi, and H\. Yoon\(2024\)An improved neural operator framework for large\-scale co2 storage operations\.240,pp\. 213007\.External Links:ISSN 2949\-8910,[Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.geoen.2024.213007),[Link](https://www.sciencedirect.com/science/article/pii/S2949891024003774)Cited by:[§1](https://arxiv.org/html/2606.28519#S1.p10.1),[§1](https://arxiv.org/html/2606.28519#S1.p3.2),[§1](https://arxiv.org/html/2606.28519#S1.p6.1),[Table 3](https://arxiv.org/html/2606.28519#S3.T3),[Table 3](https://arxiv.org/html/2606.28519#S3.T3.2.2.4.1),[Table 3](https://arxiv.org/html/2606.28519#S3.T3.2.2.5.1),[§3](https://arxiv.org/html/2606.28519#S3.p18.3),[§3](https://arxiv.org/html/2606.28519#S3.p2.5)\.
- \[11\]J\. Mercer\(1909\-01\)XVI\. functions of positive and negative type, and their connection the theory of integral equations\.Philosophical Transactions of the Royal Society of London, Series A: Containing Papers of a Mathematical or Physical Character209\(441\-458\),pp\. 415–446\.External Links:ISSN 0264\-3952,[Document](https://dx.doi.org/10.1098/rsta.1909.0016),[Link](https://doi.org/10.1098/rsta.1909.0016),https://royalsocietypublishing\.org/rsta/article\-pdf/209/441\-458/415/239856/rsta\.1909\.0016\.pdfCited by:[§2](https://arxiv.org/html/2606.28519#S2.p4.27)\.
- \[12\]S\. Mo, Y\. Zhu, N\. Zabaras, X\. Shi, and J\. Wu\(2019\)Deep convolutional encoder\-decoder networks for uncertainty quantification of dynamic multiphase flow in heterogeneous media\.55\(1\),pp\. 703–728\.External Links:[Document](https://dx.doi.org/https%3A//doi.org/10.1029/2018WR023528),[Link](https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/2018WR023528),https://agupubs\.onlinelibrary\.wiley\.com/doi/pdf/10\.1029/2018WR023528Cited by:[§1](https://arxiv.org/html/2606.28519#S1.p3.2)\.
- \[13\]D\. Nayak and S\. Goswami\(2025\)TI\-deeponet: learnable time integration for stable long\-term extrapolation\.arXiv preprint arXiv:2505\.17341\.Cited by:[§1](https://arxiv.org/html/2606.28519#S1.p6.1)\.
- \[14\]Ø\. Pettersen\(2006\)Basics of reservoir simulation with the eclipse reservoir simulator\.114\.Cited by:[§3](https://arxiv.org/html/2606.28519#S3.p2.5)\.
- \[15\]M\. M\. Reyna and A\. M\. Tartakovsky\(2025\)Parameter estimation in river transport models with immobile phase exchange using dimensional analysis and reduced\-order models\.External Links:2510\.19664,[Link](https://arxiv.org/abs/2510.19664)Cited by:[§2](https://arxiv.org/html/2606.28519#S2.p6.12)\.
- \[16\]C\. Safta and H\. N\. Najm\(2026\)Numerical considerations for the construction of karhunen\-lo\\\\backslash\{\\\{e\}\\\}ve expansions\.arXiv preprint arXiv:2603\.19108\.Cited by:[§2](https://arxiv.org/html/2606.28519#S2.p8.10)\.
- \[17\]R\. Sharma and V\. Shankar\(2025\)Ensemble and mixture\-of\-experts deeponets for operator learning\.External Links:2405\.11907,[Link](https://arxiv.org/abs/2405.11907)Cited by:[§1](https://arxiv.org/html/2606.28519#S1.p6.1)\.
- \[18\]E\. A\. Sudicky, W\. A\. Illman, I\. K\. Goltz, J\. J\. Adams, and R\. G\. McLaren\(2010\)Heterogeneity in hydraulic conductivity and its role on the macroscale transport of a solute plume: from measurements to a practical application of stochastic flow and transport theory\.46\(1\),pp\.\.External Links:[Document](https://dx.doi.org/https%3A//doi.org/10.1029/2008WR007558),[Link](https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/2008WR007558),https://agupubs\.onlinelibrary\.wiley\.com/doi/pdf/10\.1029/2008WR007558Cited by:[§1](https://arxiv.org/html/2606.28519#S1.p2.1)\.
- \[19\]H\. Tang, Q\. Kong, and J\. P\. Morris\(2024\)Multi\-fidelity fourier neural operator for fast modeling of large\-scale geological carbon storage\.629,pp\. 130641\.External Links:ISSN 0022\-1694,[Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.jhydrol.2024.130641),[Link](https://www.sciencedirect.com/science/article/pii/S0022169424000350)Cited by:[§1](https://arxiv.org/html/2606.28519#S1.p3.2)\.
- \[20\]A\.M\. Tartakovsky, D\.A\. Barajas\-Solano, and Q\. He\(2021\)Physics\-informed machine learning with conditional Karhunen\-Loève expansions\.Journal of Computational Physics426,pp\. 109904\.Cited by:[§2](https://arxiv.org/html/2606.28519#S2.p7.9)\.
- \[21\]A\. M\. Tartakovsky and Y\. Zong\(2024\)Physics\-informed machine learning method with space\-time karhunen\-loève expansions for forward and inverse partial differential equations\.499,pp\. 112723\.External Links:ISSN 0021\-9991,[Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.jcp.2023.112723),[Link](https://www.sciencedirect.com/science/article/pii/S0021999123008185)Cited by:[§2](https://arxiv.org/html/2606.28519#S2.p4.13)\.
- \[22\]D\. M\. Tartakovsky, Z\. Lu, A\. Guadagnini, and A\. M\. Tartakovsky\(2003\)Unsaturated flow in heterogeneous soils with spatially distributed uncertain hydraulic parameters\.Journal of Hydrology275\(3\),pp\. 182–193\.Note:Studies on Water Movement and Solute Transport in Arid RegionsExternal Links:ISSN 0022\-1694,[Document](https://dx.doi.org/https%3A//doi.org/10.1016/S0022-1694%2803%2900042-8),[Link](https://www.sciencedirect.com/science/article/pii/S0022169403000428)Cited by:[§2](https://arxiv.org/html/2606.28519#S2.p7.9)\.
- \[23\]Y\. Wang, Y\. Zong, J\. L\. McCreight, J\. D\. Hughes, M\. Fienen, and A\. M\. Tartakovsky\(2025\)Karhunen–loève deep learning method for surrogate modeling and approximate bayesian parameter estimation\.203,pp\. 105024\.External Links:ISSN 0309\-1708,[Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.advwatres.2025.105024),[Link](https://www.sciencedirect.com/science/article/pii/S0309170825001381)Cited by:[§1](https://arxiv.org/html/2606.28519#S1.p4.1)\.
- \[24\]Y\. Wang, Y\. Zong, J\. L\. McCreight, J\. D\. Hughes, and A\. M\. Tartakovsky\(2024\)Bayesian reduced\-order deep learning surrogate model for dynamic systems described by partial differential equations\.429,pp\. 117147\.External Links:ISSN 0045\-7825,[Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.cma.2024.117147),[Link](https://www.sciencedirect.com/science/article/pii/S0045782524004031)Cited by:[§1](https://arxiv.org/html/2606.28519#S1.p4.1)\.
- \[25\]A\. D\. Woodbury and T\. J\. Ulrych\(2000\)A full\-bayesian approach to the groundwater inverse problem for steady state flow\.36\(8\),pp\. 2081–2093\.External Links:[Document](https://dx.doi.org/https%3A//doi.org/10.1029/2000WR900086),[Link](https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/2000WR900086),https://agupubs\.onlinelibrary\.wiley\.com/doi/pdf/10\.1029/2000WR900086Cited by:[§1](https://arxiv.org/html/2606.28519#S1.p2.1)\.
- \[26\]D\. Xiu and G\. E\. Karniadakis\(2003\)Modeling uncertainty in flow simulations via generalized polynomial chaos\.Journal of computational physics187\(1\),pp\. 137–167\.Cited by:[§2](https://arxiv.org/html/2606.28519#S2.p7.9)\.
- \[27\]B\. Yang, X\. Li, J\. Zhao, and Y\. Jiang\(2025\)DD\-deeponet: domain decomposition and deeponet for solving partial differential equations in three application scenarios\.arXiv preprint arXiv:2508\.02717\.Cited by:[§1](https://arxiv.org/html/2606.28519#S1.p6.1)\.
- \[28\]X\. Yu, S\. Hooten, Z\. Liu, Y\. Zhao, M\. Fiorentino, T\. Van Vaerenbergh, and Z\. Zhang\(2024\)Separable operator networks\.arXiv preprint arXiv:2407\.11253\.Cited by:[§1](https://arxiv.org/html/2606.28519#S1.p6.1)\.
- \[29\]Z\. Zheng and H\. Dai\(2017\)Simulation of multi\-dimensional random fields by karhunen–loève expansion\.324,pp\. 221–247\.External Links:ISSN 0045\-7825,[Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.cma.2017.05.022),[Link](https://www.sciencedirect.com/science/article/pii/S0045782516318692)Cited by:[§2](https://arxiv.org/html/2606.28519#S2.p11.2)\.

Similar Articles

Operator Boosting Produces Pareto-Efficient PDE Surrogates

arXiv cs.LG

Operator Boosting is a stagewise residual-learning framework that constructs compact neural operator surrogates for PDEs by training tiny models on residual fields. It achieves accuracy comparable to or better than full-size models while reducing parameters by up to 95%, demonstrating Pareto improvements on several benchmarks.