Operator Learning for Cubic Nonlinear Schrödinger Equation on Periodic Domains
Summary
This paper presents a geometry-conditioned Fourier Neural Operator (FNO) to learn the solution operator for the cubic nonlinear Schrödinger equation on periodic domains with varying aspect ratios. Numerical experiments show the model captures distinct Sobolev norm behaviors on rational and irrational tori, supporting the use of geometry-aware neural operators for dispersive PDEs.
Cached at: 06/29/26, 05:23 AM
# Operator Learning for Cubic Nonlinear Schrödinger Equation on Periodic Domains

Source: [https://arxiv.org/html/2606.27459](https://arxiv.org/html/2606.27459)

Emmanuel E. Oguadimma, Victory C. Obieke, and Xueying Yu

Emmanuel E. Oguadimma, Department of Mathematics, Oregon State University, Kidder Hall 368, Corvallis, OR 97331, oguadime@oregonstate.edu

Victory C. Obieke, Department of Mathematics, Oregon State University, Kidder Hall 368, Corvallis, OR 97331, obiekev@oregonstate.edu

Xueying Yu, Department of Mathematics, Oregon State University, Kidder Hall 368, Corvallis, OR 97331, xueying.yu@oregonstate.edu

###### Abstract.

We consider the cubic nonlinear Schrödinger (NLS) equation on two-dimensional flat tori with varying aspect ratios. In this formulation, the choice of aspect ratio governs the Fourier resonance structure, so rational and irrational geometries can exhibit different high-frequency cascade behaviors. We present a geometry-conditioned Fourier neural operator (FNO) for the cubic defocusing NLS equation, where the input consists of the real and imaginary parts of the solution together with the aspect-ratio parameter $\omega^{2}$. The model is trained to approximate the one-step solution operator and is evaluated on unseen trajectories generated from random-phase initial data using a Fourier pseudospectral method. Our numerical experiments show that the learned operator captures the main solution dynamics on both tori and reproduces the distinct Sobolev norm behavior of the two geometries, with stronger $H^{2}$-growth on the rational torus and more constrained behavior on the irrational torus, consistent with the findings of [hrabski2021energy](https://arxiv.org/html/2606.27459#bib.bib13).
We perform ablation studies to examine the roles of retained Fourier modes, activation functions, Fourier-layer depth, and explicit geometry conditioning. The results indicate that including $\omega^{2}$ improves long-time predictive accuracy, especially for the rational geometry, and they support the use of geometry-aware neural operators for learning spectral-transfer phenomena in nonlinear dispersive partial differential equations.

###### Key words and phrases:

Weak Turbulence, NLS, Operator Learning

###### Contents

1. [Introduction](https://arxiv.org/html/2606.27459#S1)
2. [Preliminaries](https://arxiv.org/html/2606.27459#S2)
3. [Operator Learning](https://arxiv.org/html/2606.27459#S3)
4. [Numerical Experiments](https://arxiv.org/html/2606.27459#S4)
5. [Conclusion](https://arxiv.org/html/2606.27459#S5)
6. [References](https://arxiv.org/html/2606.27459#bib)

## 1. Introduction

The nonlinear Schrödinger (NLS) equation is a central dispersive partial differential equation (PDE) in mathematical physics. It arises in nonlinear optics and plasma physics, including the self-focusing and collapse of intense beams in Kerr media ([NewellMoloney1992](https://arxiv.org/html/2606.27459#bib.bib22); [SulemSulem1999](https://arxiv.org/html/2606.27459#bib.bib32)), in the modulation of deep-water waves ([Zakharov1968](https://arxiv.org/html/2606.27459#bib.bib39)), and in the Gross–Pitaevskii description of Bose–Einstein condensates at ultra-low temperatures ([bao2007nonlinear](https://arxiv.org/html/2606.27459#bib.bib1)). Beyond its role as a canonical nonlinear wave equation, the cubic NLS also serves as a fundamental model for studying long-time energy redistribution across Fourier modes in various geometries.
In recent years, this problem has been investigated from the broader perspective of weak turbulence theory, where one seeks to understand how energy migrates from low to high frequencies [Bourgain1996](https://arxiv.org/html/2606.27459#bib.bib3); [hrabski2021energy](https://arxiv.org/html/2606.27459#bib.bib13). In this formulation, the geometry of the underlying domain directly shapes the resonance structure and therefore the efficiency of the cascade. Recent work on energy transfer for the cubic defocusing NLS on irrational tori [hrabski2021energy](https://arxiv.org/html/2606.27459#bib.bib13) has shown analytically that irrational aspect ratios substantially weaken the transfer of energy to high frequencies compared with the rational case. The analysis proves improved upper bounds on Sobolev norm growth for generic irrational tori, establishes barriers to the propagation of Fourier support for initially frequency-localized data, and studies the associated quasi-resonant system. Just as importantly for applications, the accompanying numerical study shows that the energy cascade is consistently slower on irrational tori than on rational tori, and explains this difference through the geometry of quasi-resonant sets. This has a concrete implication beyond pure analysis: it places a serious caveat on periodic-domain simulations used in wave turbulence, because two simulations with different domain aspect ratios may not represent the same small-scale cascade physics, even at very high resolution. In other words, the interaction between the dispersion relation and the domain geometry must be accounted for when periodic-box computations are used as surrogates for homogeneous turbulence on large or effectively unbounded domains.

Classical numerical methods are standard approaches for solving the NLS equation and its variants.
Split-step Fourier, time-splitting spectral, and pseudo-spectral techniques can provide high accuracy and are especially effective for dispersive wave propagation [weideman1986splitstep](https://arxiv.org/html/2606.27459#bib.bib33); [bao2003timesplitting](https://arxiv.org/html/2606.27459#bib.bib2). However, repeated simulations over many initial conditions, parameter values, or geometries can become computationally expensive, and their accuracy can depend on the structure of the time-integration method and its treatment of symmetries and invariants [oguadimma2026foundational](https://arxiv.org/html/2606.27459#bib.bib25). This limitation has motivated growing interest in scientific machine learning methods that learn the behavior of the solution directly from data. Broadly, deep-learning approaches to PDEs fall into two classes. One class includes instance-based solvers such as physics-informed neural networks (PINNs) [Raissi2019PINNs](https://arxiv.org/html/2606.27459#bib.bib28) and related methods, including the Deep Ritz method [Yu2018DeepRitz](https://arxiv.org/html/2606.27459#bib.bib38), the Deep Galerkin Method [SirignanoSpiliopoulos2018](https://arxiv.org/html/2606.27459#bib.bib30), weak adversarial networks [ZangBaoYeZhou2020](https://arxiv.org/html/2606.27459#bib.bib40), and domain-decomposition extensions such as cPINNs [JagtapKharazmiKarniadakis2020](https://arxiv.org/html/2606.27459#bib.bib15) and XPINNs [JagtapKarniadakis2020](https://arxiv.org/html/2606.27459#bib.bib14), which are often accurate but typically require retraining when conditions such as parameters change.
Structure-preserving learning approaches have also shown that incorporating geometric and physical constraints can improve long-time fidelity and reduce drift in learned dynamical systems [zhong2020symoden](https://arxiv.org/html/2606.27459#bib.bib42); [obieke2025structure](https://arxiv.org/html/2606.27459#bib.bib23); [jin2020sympnets](https://arxiv.org/html/2606.27459#bib.bib16); [cranmer2020lagrangian](https://arxiv.org/html/2606.27459#bib.bib5); [obieke2026structure](https://arxiv.org/html/2606.27459#bib.bib24); [greydanus2019hamiltonian](https://arxiv.org/html/2606.27459#bib.bib9); [hernandez2021structure](https://arxiv.org/html/2606.27459#bib.bib11). The other class includes operator-learning approaches, which aim to learn mappings between infinite-dimensional function spaces and can therefore generalize over families of PDE instances [Lu2021DeepONet](https://arxiv.org/html/2606.27459#bib.bib21); [Li2021FNO](https://arxiv.org/html/2606.27459#bib.bib20); [Kovachki2021FNO](https://arxiv.org/html/2606.27459#bib.bib17). Among these approaches, the Fourier neural operator (FNO) has emerged as a particularly influential architecture because it parameterizes integral kernels in Fourier space, efficiently captures multiscale structure, and is well suited for PDEs whose dynamics are naturally organized by frequency interactions [Li2021FNO](https://arxiv.org/html/2606.27459#bib.bib20); [Kovachki2021FNO](https://arxiv.org/html/2606.27459#bib.bib17). Its frequency-space formulation has supported applications in fluid–structure interaction [xiao2024fourier](https://arxiv.org/html/2606.27459#bib.bib36), multiphase flow [wen2022u](https://arxiv.org/html/2606.27459#bib.bib34), and heterogeneous material modeling [You2022IFNO](https://arxiv.org/html/2606.27459#bib.bib37). Existing work has shown that the FNO can effectively capture the dynamics of nonlinear waves governed by Schrödinger equations.
In particular, it has been used to investigate parametric soliton-state transitions in the NLS, Hirota, and $\mathcal{PT}$-symmetric NLS equations, showing that a single trained network can learn families of complex nonlinear waves across parameter ranges [ZhongYanTian2023](https://arxiv.org/html/2606.27459#bib.bib41). It has also been extended to coupled NLS systems, where the interaction between multiple components, multiple physical effects, and parameter-dependent localized waves makes the dynamics even more intricate [RenTian2026](https://arxiv.org/html/2606.27459#bib.bib29). The present work is based on two observations. On the one hand, the irrational-torus NLS problem is scientifically rich, since recent mathematical analysis has revealed clear and practically significant differences between rational and irrational geometries [hrabski2021energy](https://arxiv.org/html/2606.27459#bib.bib13); [deng2019growth](https://arxiv.org/html/2606.27459#bib.bib6); [planchon2017growth](https://arxiv.org/html/2606.27459#bib.bib27). On the other hand, several works suggest that operator-learning architectures are especially effective when the essential dynamics are encoded in frequency-space interactions and one seeks to learn an entire family of solutions [Li2021FNO](https://arxiv.org/html/2606.27459#bib.bib20); [ZhongYanTian2023](https://arxiv.org/html/2606.27459#bib.bib41); [RenTian2026](https://arxiv.org/html/2606.27459#bib.bib29). In this paper, we present a geometry-conditioned Fourier neural operator for the two-dimensional cubic defocusing NLS on rational and irrational tori.
In contrast to previous studies that focus primarily on parametric families of explicit soliton or rogue-wave solutions [ZhongYanTian2023](https://arxiv.org/html/2606.27459#bib.bib41); [RenTian2026](https://arxiv.org/html/2606.27459#bib.bib29), our focus is on how the torus aspect ratio changes the resonance structure of the Fourier lattice and thereby alters the observed energy-transfer dynamics. Our formulation follows the time-marching perspective of prior FNO work [Li2021FNO](https://arxiv.org/html/2606.27459#bib.bib20), but incorporates the aspect-ratio parameter of the torus as an additional input channel so that a single learned operator can distinguish between the rational and irrational geometries. The model learns a one-step solution operator from the current complex-valued state to the next state, and long-time predictions are obtained autoregressively by feeding each prediction back into the network. This setup is natural for the present problem because it allows the learned operator to propagate geometry-dependent dynamics over long time intervals. Our numerical setup follows the scientific framework of [hrabski2021energy](https://arxiv.org/html/2606.27459#bib.bib13). We consider random-phase initial data with compact Fourier support, compare the dynamics on the rational and irrational tori, and generate reference solutions using a Fourier pseudo-spectral method with fourth-order Runge–Kutta time stepping. The remainder of the paper is organized as follows. Section [2](https://arxiv.org/html/2606.27459#S2) presents the theoretical background and introduces the NLS model on rational and irrational tori. Section [3](https://arxiv.org/html/2606.27459#S3) describes the operator-learning framework considered in this work, including the Fourier neural operator.
Section [4](https://arxiv.org/html/2606.27459#S4) presents numerical experiments, including data generation, computational setup, training, evaluation, and ablation studies.

### Acknowledgment

We would like to thank Alexander Hrabski for helpful discussions. X.Y. is partially supported by NSF DMS-2306429.

## 2. Preliminaries

In this section, we introduce the two-dimensional cubic nonlinear Schrödinger equation on flat tori, define the rational and irrational geometries considered here, and establish the notation used throughout. We also briefly recall the analytical results most relevant to our numerical study. For further details, including rigorous statements and proofs, we refer the reader to [hrabski2021energy](https://arxiv.org/html/2606.27459#bib.bib13); [staffilani2020stability](https://arxiv.org/html/2606.27459#bib.bib31); [colliander2010transfer](https://arxiv.org/html/2606.27459#bib.bib4); [giuliani2022sobolev](https://arxiv.org/html/2606.27459#bib.bib7).

### 2.1. Fourier Transforms on Tori

In [hrabski2021energy](https://arxiv.org/html/2606.27459#bib.bib13), the authors consider the scaled torus

$$
\mathbb{T}^{2}_{\underline{\omega}}=\mathbb{R}/\omega_{1}\mathbb{Z}\times\mathbb{R}/\omega_{2}\mathbb{Z},\qquad \underline{\omega}=(\omega_{1},\omega_{2})\in\mathbb{R}_{+}^{2}, \tag{2.1}
$$

where $\omega_{1}$ and $\omega_{2}$ denote the periods in the two spatial directions and hence determine the geometry. The domain is called *rational* if $\omega_{1}^{2}/\omega_{2}^{2}\in\mathbb{Q}$ and *irrational* otherwise; the two cases correspond to commensurate and incommensurate squared side lengths.
Without loss of generality one reduces to $\underline{\omega}=(1,\omega)$, so that

$$
\mathbb{T}^{2}_{\underline{\omega}}:=\mathbb{R}/\mathbb{Z}\times\mathbb{R}/\omega\mathbb{Z}, \tag{2.2}
$$

and $\mathbb{T}^{2}_{\underline{\omega}}$ is *irrational* if $\omega^{2}\notin\mathbb{Q}$ and *rational* otherwise. For computational convenience, we work with an equivalent $2\pi$-periodic formulation in which the geometry enters through an anisotropic Laplacian rather than through the domain. We pose the problem on the square torus

$$
\mathbb{T}^{2}=\mathbb{R}/2\pi\mathbb{Z}\times\mathbb{R}/2\pi\mathbb{Z}, \tag{2.3}
$$

and replace the standard Laplacian by the anisotropic operator

$$
\Delta_{\omega}:=\partial_{x}^{2}+\omega^{2}\partial_{y}^{2}, \tag{2.4}
$$

writing $\Delta$ for $\Delta_{\omega}$ when the context is clear. For every $f\in L^{2}(\mathbb{T}^{2})$, we write its Fourier expansion

$$
f(x,y)=\sum_{k=(m,\ell)\in\mathbb{Z}^{2}}\widehat{f}_{k}\,e^{i(mx+\ell y)}, \tag{2.5}
$$

with Fourier coefficients

$$
\widehat{f}_{k}:=\frac{1}{(2\pi)^{2}}\int_{\mathbb{T}^{2}}f(x,y)\,e^{-i(mx+\ell y)}\,dx\,dy,\qquad k=(m,\ell)\in\mathbb{Z}^{2}. \tag{2.6}
$$

Parseval's identity reads

$$
\|f\|_{L^{2}(\mathbb{T}^{2})}^{2}=(2\pi)^{2}\sum_{k\in\mathbb{Z}^{2}}|\widehat{f}_{k}|^{2}. \tag{2.7}
$$
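In computations, the coefficients in ([2.6](https://arxiv.org/html/2606.27459#S2.E6)) are approximated on a uniform $n\times n$ grid of $[0,2\pi)^2$ by the discrete Fourier transform divided by $n^2$; for trigonometric polynomials whose wavenumbers satisfy $|m|,|\ell|<n/2$ this is exact. The NumPy sketch below illustrates this convention and checks Parseval's identity ([2.7](https://arxiv.org/html/2606.27459#S2.E7)) with its $(2\pi)^2$ factor. It is a minimal example under these assumptions, not the authors' code; the helper names `fourier_coefficients` and `l2_norm_squared` are hypothetical.

```python
import numpy as np

def fourier_coefficients(f):
    """Approximate f_hat_k from (2.6) for samples f[j, l] = f(x_j, y_l) on the
    uniform grid x_j = 2*pi*j/n, y_l = 2*pi*l/n of [0, 2*pi)^2.

    Returns the coefficients and the integer wavenumbers (m, l) in NumPy's FFT
    ordering (0, 1, ..., n/2 - 1, -n/2, ..., -1).
    """
    n = f.shape[0]
    f_hat = np.fft.fft2(f) / n**2              # (1/(2pi)^2) * integral ~ grid mean
    k = np.fft.fftfreq(n, d=1.0 / n)           # integer wavenumbers (as floats)
    m, l = np.meshgrid(k, k, indexing="ij")    # m pairs with x (axis 0), l with y (axis 1)
    return f_hat, m, l

def l2_norm_squared(f):
    """Riemann-sum approximation of ||f||_{L^2(T^2)}^2 on the uniform grid."""
    n = f.shape[0]
    return (2 * np.pi / n) ** 2 * np.sum(np.abs(f) ** 2)

# Example: f = 3 e^{i(2x - y)} + 0.5 e^{i(x + 4y)} sampled on a 32 x 32 grid.
n = 32
x = 2 * np.pi * np.arange(n) / n
X, Y = np.meshgrid(x, x, indexing="ij")
f = 3 * np.exp(1j * (2 * X - Y)) + 0.5 * np.exp(1j * (X + 4 * Y))

f_hat, m, l = fourier_coefficients(f)
lhs = l2_norm_squared(f)
rhs = (2 * np.pi) ** 2 * np.sum(np.abs(f_hat) ** 2)  # right-hand side of (2.7)
print(np.isclose(lhs, rhs))  # True
```

The mode $(m,\ell)=(2,-1)$ sits at array index `[2, n - 1]`, where `f_hat` recovers the amplitude $3$.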
With this basis the anisotropic Laplacian acts diagonally,

$$
\widehat{(\Delta_{\omega}f)}_{k}=-\lambda_{k}\,\widehat{f}_{k},\qquad \lambda_{k}:=m^{2}+\omega^{2}\ell^{2},\quad k=(m,\ell)\in\mathbb{Z}^{2}, \tag{2.8}
$$

so that all geometric information is carried by the eigenvalues $\lambda_{k}$. After the rescaling $y=\omega y'$, this formulation becomes the cubic NLS with the standard Laplacian on $\mathbb{R}/2\pi\mathbb{Z}\times\mathbb{R}/(2\pi/\omega)\mathbb{Z}$, which, by the scaling symmetry of the equation, is equivalent to ([2.2](https://arxiv.org/html/2606.27459#S2.E2)) with $\omega$ replaced by $1/\omega$. Since $\omega^{2}\in\mathbb{Q}$ if and only if $\omega^{-2}\in\mathbb{Q}$, the rational/irrational classification is unchanged, and the two cases share the same $2\pi$-periodic grid, differing only through the value of $\omega$ in $\lambda_{k}$.

### 2.2. Function Spaces

###### Definition 2.1.

For $s\in[0,\infty)$, define the Sobolev space $H^{s}(\mathbb{T}^{2})$ as the closure of smooth functions $f:\mathbb{T}^{2}\to\mathbb{C}$ with Fourier expansion

$$
f(x,y)=\sum_{k=(m,\ell)\in\mathbb{Z}^{2}}\widehat{f}_{k}\,e^{i(mx+\ell y)} \tag{2.9}
$$

under the norm

$$
\|f\|_{H^{s}(\mathbb{T}^{2})}:=\Bigl(\sum_{k=(m,\ell)\in\mathbb{Z}^{2}}\langle k\rangle^{2s}|\widehat{f}_{k}|^{2}\Bigr)^{1/2}, \tag{2.10}
$$

where $\langle k\rangle:=(1+|m|^{2}+|\ell|^{2})^{1/2}$.

###### Definition 2.2.
For $x=\{x_{n}\}_{n\in\mathbb{Z}^{2}}$ and $s\in[0,\infty)$, define the *Sobolev norm*

$$
\|x\|_{h^{s}}:=\sqrt{\sum_{n\in\mathbb{Z}^{2}}|x_{n}|^{2}\langle n\rangle^{2s}},\qquad \langle n\rangle:=\sqrt{|n|^{2}+1}, \tag{2.11}
$$

and the *sequence Sobolev space*

$$
h^{s}:=h^{s}(\mathbb{Z}^{2}):=\bigl\{x=\{x_{n}\}_{n\in\mathbb{Z}^{2}}:\ \|x\|_{h^{s}}<\infty\bigr\}. \tag{2.12}
$$

In particular, $h^{0}(\mathbb{Z}^{2})=\ell^{2}(\mathbb{Z}^{2})$. By the Fourier expansion ([2.5](https://arxiv.org/html/2606.27459#S2.E5)) and Definition 2.1, the function space $H^{s}(\mathbb{T}^{2})$ is identified with $h^{s}(\mathbb{Z}^{2})$ through the isometric isomorphism $f\mapsto\{\widehat{f}_{k}\}_{k\in\mathbb{Z}^{2}}$, and we write $\|f\|_{H^{s}}:=\|\{\widehat{f}_{k}\}\|_{h^{s}}$. By Parseval's identity ([2.7](https://arxiv.org/html/2606.27459#S2.E7)), the $H^{0}$ norm defined this way equals the $L^{2}$ norm up to the factor $2\pi$.

### 2.3. Littlewood–Paley Projections

Next we define the Littlewood–Paley projectors.
Fix an even function $\eta\in C_{c}^{\infty}(\mathbb{R};[0,1])$ that is non-increasing on $[0,\infty)$ and satisfies

$$
\eta(t)=1\ \text{ for }|t|\le 1,\qquad \eta(t)=0\ \text{ for }|t|\ge 2. \tag{2.13}
$$

For a dyadic integer $N\in 2^{\mathbb{N}_{0}}=\{1,2,4,8,\dots\}$, define the symbols $\eta_{N}:\mathbb{Z}^{2}\to[0,1]$ by

$$
\eta_{1}(\xi):=\eta(|\xi|),\qquad \eta_{N}(\xi):=\eta\Bigl(\frac{|\xi|}{N}\Bigr)-\eta\Bigl(\frac{2|\xi|}{N}\Bigr)\quad\text{for }N\ge 2. \tag{2.14}
$$

By construction these form a partition of unity on $\mathbb{Z}^{2}$,

$$
\sum_{N\in 2^{\mathbb{N}_{0}}}\eta_{N}(\xi)=1\qquad\text{for every }\xi\in\mathbb{Z}^{2}, \tag{2.15}
$$

since the sum over $N\le M$ telescopes to $\eta(|\xi|/M)$, which equals $1$ once $M\ge|\xi|$. Here $\eta_{N}$ is supported in the annulus $\{N/2\le|\xi|\le 2N\}$ for $N\ge 2$ and in the ball $\{|\xi|\le 2\}$ for $N=1$. The *Littlewood–Paley projector* $P_{N}$ is the Fourier multiplier with symbol $\eta_{N}$, i.e.
for $f$ with Fourier coefficients $\{\widehat{f}_{k}\}_{k\in\mathbb{Z}^{2}}$,

$$
\widehat{(P_{N}f)}_{k}:=\eta_{N}(k)\,\widehat{f}_{k},\qquad k\in\mathbb{Z}^{2}. \tag{2.16}
$$

For any threshold $N\in(0,\infty)$ we define the low- and high-frequency projections

$$
P_{\le N}:=\sum_{\substack{M\in 2^{\mathbb{N}_{0}}\\ M\le N}}P_{M},\qquad P_{>N}:=\sum_{\substack{M\in 2^{\mathbb{N}_{0}}\\ M>N}}P_{M}=I-P_{\le N}. \tag{2.17}
$$

Each of $P_{N}$, $P_{\le N}$, and $P_{>N}$ is a bounded linear operator on $H^{s}(\mathbb{T}^{2})$ for every $s\ge 0$, with operator norm at most $1$, since its symbol takes values in $[0,1]$; in particular, the projections do not increase the $H^{s}$ norm.

### 2.4. Useful Estimates

We record two properties of these spaces that are used repeatedly in the sequel. In this subsection, with a slight abuse of notation, $P_{\le K}$ denotes, for $K>0$, the *sharp* Fourier projection onto the frequency square $\Lambda_{K}:=\{k=(m,\ell)\in\mathbb{Z}^{2}:|m|,|\ell|\le K\}$, rather than the smooth projection in ([2.17](https://arxiv.org/html/2606.27459#S2.E17)); that is, $\widehat{(P_{\le K}f)}_{k}=\widehat{f}_{k}$ for $k\in\Lambda_{K}$ and $0$ otherwise.

###### Lemma 2.3 (Bernstein inequality).

Let $0\le s\le s'$.
For every $f\in H^{s'}(\mathbb{T}^{2})$ and every $K>0$,

$$
\|P_{\le K}f\|_{H^{s'}}\le(1+2K^{2})^{(s'-s)/2}\,\|P_{\le K}f\|_{H^{s}}, \tag{2.18}
$$

and dually, for every $f\in H^{s'}(\mathbb{T}^{2})$ frequency-supported outside $\Lambda_{K}$ (i.e. $\widehat{f}_{k}=0$ for $k\in\Lambda_{K}$),

$$
\|f\|_{H^{s}}\le\langle K\rangle^{-(s'-s)}\,\|f\|_{H^{s'}},\qquad \langle K\rangle:=(1+K^{2})^{1/2}. \tag{2.19}
$$

Both bounds follow directly from ([2.10](https://arxiv.org/html/2606.27459#S2.E10)), since $\langle k\rangle^{2}\le 1+2K^{2}$ for $k\in\Lambda_{K}$ and $\langle k\rangle>\langle K\rangle$ for $k\notin\Lambda_{K}$.

###### Lemma 2.4 (Algebra property).

Let $s>1$. Then $H^{s}(\mathbb{T}^{2})$ is a Banach algebra under pointwise multiplication; that is, there exists a constant $C_{s}>0$, depending only on $s$, such that for all $f,g\in H^{s}(\mathbb{T}^{2})$,

$$
\|fg\|_{H^{s}}\le C_{s}\,\|f\|_{H^{s}}\,\|g\|_{H^{s}}. \tag{2.20}
$$

In particular, applying ([2.20](https://arxiv.org/html/2606.27459#S2.E20)) twice, for any three functions $f,g,h\in H^{s}(\mathbb{T}^{2})$ the cubic expression obeys

$$
\|fgh\|_{H^{s}}\le C_{s}^{2}\,\|f\|_{H^{s}}\,\|g\|_{H^{s}}\,\|h\|_{H^{s}}. \tag{2.21}
$$
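The $H^{s}$ norm of Definition 2.1, the Littlewood–Paley partition of unity ([2.15](https://arxiv.org/html/2606.27459#S2.E15)), and the Bernstein-type behavior of the sharp truncation onto $\Lambda_{K}$ can all be checked numerically on a finite box of frequencies. The NumPy sketch below assumes one particular admissible bump $\eta$, built from $e^{-1/u}$ (the text does not fix a choice), and uses hypothetical helper names. The corner-mode check shows that, for the square $\Lambda_{K}$, the ratio $\|P_{\le K}f\|_{H^{s'}}/\|P_{\le K}f\|_{H^{s}}$ can reach $(1+2K^{2})^{(s'-s)/2}$, attained at the corner $(K,K)$.

```python
import numpy as np

def bracket(m, l):
    """<k> = (1 + m^2 + l^2)^{1/2} for k = (m, l), as in Definition 2.1."""
    return np.sqrt(1.0 + m**2 + l**2)

def sobolev_norm(f_hat, m, l, s):
    """||f||_{H^s} computed from the Fourier coefficients via (2.10)."""
    return np.sqrt(np.sum(bracket(m, l) ** (2 * s) * np.abs(f_hat) ** 2))

def sharp_truncation(f_hat, m, l, K):
    """Sharp projection onto Lambda_K = {(m, l) : |m|, |l| <= K}."""
    return np.where((np.abs(m) <= K) & (np.abs(l) <= K), f_hat, 0.0)

def bump(t):
    """One admissible eta for (2.13) (an assumed choice): smooth, even,
    equal to 1 on |t| <= 1 and to 0 on |t| >= 2, non-increasing in |t|."""
    t = np.abs(np.atleast_1d(np.asarray(t, dtype=float)))

    def g(u):
        # g(u) = exp(-1/u) for u > 0 and 0 otherwise; g is C^infinity.
        out = np.zeros_like(u)
        pos = u > 0
        out[pos] = np.exp(-1.0 / u[pos])
        return out

    # (2 - t) + (t - 1) = 1, so the denominator never vanishes.
    return g(2.0 - t) / (g(2.0 - t) + g(t - 1.0))

def lp_symbol(N, r):
    """eta_N from (2.14), evaluated at r = |xi|."""
    if N == 1:
        return bump(r)
    return bump(r / N) - bump(2.0 * r / N)

# Integer lattice box |m|, |l| <= 20.
box = np.arange(-20, 21)
m, l = np.meshgrid(box, box, indexing="ij")
r = np.hypot(m, l)

# Partition of unity (2.15): N = 1, 2, ..., 32 suffices because max |xi| < 32.
partition = sum(lp_symbol(2**j, r) for j in range(6))
print(np.allclose(partition, 1.0))  # True

# Bernstein-type bound for the sharp truncation onto Lambda_K.
rng = np.random.default_rng(0)
K, s, s_prime = 4, 1.0, 2.0
f_hat = rng.standard_normal(m.shape) + 1j * rng.standard_normal(m.shape)
g_hat = sharp_truncation(f_hat, m, l, K)
lhs = sobolev_norm(g_hat, m, l, s_prime)
rhs = (1 + 2 * K**2) ** ((s_prime - s) / 2) * sobolev_norm(g_hat, m, l, s)
print(lhs <= rhs)  # True

# The corner mode (K, K) attains the factor (1 + 2K^2)^{(s' - s)/2}.
corner = np.where((m == K) & (l == K), 1.0, 0.0)
ratio = sobolev_norm(corner, m, l, s_prime) / sobolev_norm(corner, m, l, s)
print(np.isclose(ratio, (1 + 2 * K**2) ** ((s_prime - s) / 2)))  # True
```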
### 2.5. The Model

We consider the cubic defocusing nonlinear Schrödinger equation on $\mathbb{T}^{2}$,

$$
\begin{cases}
i\partial_{t}\psi+\Delta_{\omega}\psi=|\psi|^{2}\psi, & (t,x,y)\in\mathbb{R}\times\mathbb{T}^{2},\\
\psi(0,\cdot)=\psi_{0}\in H^{s}(\mathbb{T}^{2}).
\end{cases} \tag{2.22}
$$

Writing the solution in terms of its Fourier coefficients,

$$
\psi(t,x,y)=\sum_{k=(m,\ell)\in\mathbb{Z}^{2}}\widehat{\psi}_{k}(t)\,e^{i(mx+\ell y)}, \tag{2.23}
$$

substituting into ([2.22](https://arxiv.org/html/2606.27459#S2.E22)), and using ([2.8](https://arxiv.org/html/2606.27459#S2.E8)) together with

$$
\widehat{(|\psi|^{2}\psi)}_{k}=\sum_{\substack{k_{1}-k_{2}+k_{3}=k\\ k_{j}\in\mathbb{Z}^{2}}}\widehat{\psi}_{k_{1}}\,\overline{\widehat{\psi}_{k_{2}}}\,\widehat{\psi}_{k_{3}}, \tag{2.24}
$$

we see that ([2.22](https://arxiv.org/html/2606.27459#S2.E22)) is equivalent to the infinite system of ODEs for the Fourier coefficients

$$
i\partial_{t}\widehat{\psi}_{k}-\lambda_{k}\widehat{\psi}_{k}=\sum_{\substack{k_{1}-k_{2}+k_{3}=k\\ k_{j}\in\mathbb{Z}^{2}}}\widehat{\psi}_{k_{1}}\overline{\widehat{\psi}_{k_{2}}}\widehat{\psi}_{k_{3}}, \tag{2.25}
$$

where $\lambda_{k}=m^{2}+\omega^{2}\ell^{2}$ for $k=(m,\ell)\in\mathbb{Z}^{2}$.
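The Fourier form ([2.25](https://arxiv.org/html/2606.27459#S2.E25)) suggests a Fourier pseudo-spectral discretization of the kind used for the reference data: apply $\Delta_{\omega}$ diagonally through ([2.8](https://arxiv.org/html/2606.27459#S2.E8)) and evaluate the cubic term pointwise on the grid. The NumPy sketch below advances ([2.22](https://arxiv.org/html/2606.27459#S2.E22)) with the classical fourth-order Runge–Kutta method. It is a minimal illustration, not the authors' solver: the grid size, time step, the absence of dealiasing, and the helper names are assumptions. As a check it uses the exact plane-wave solution $\psi=Ae^{i(mx+\ell y)}e^{-i(\lambda_{k}+|A|^{2})t}$, for which $|\psi|^{2}\psi=|A|^{2}\psi$.

```python
import numpy as np

def eigenvalues(n, omega2):
    """lambda_k = m^2 + omega2 * l^2 from (2.8) on an n x n grid, FFT ordering."""
    k = np.fft.fftfreq(n, d=1.0 / n)           # integer wavenumbers
    m, l = np.meshgrid(k, k, indexing="ij")    # m pairs with x (axis 0), l with y (axis 1)
    return m**2 + omega2 * l**2

def nls_rhs(psi, lam):
    """d(psi)/dt = i (Delta_omega psi - |psi|^2 psi), i.e. (2.22) solved for the
    time derivative; Delta_omega is applied diagonally in Fourier space."""
    lap = np.fft.ifft2(-lam * np.fft.fft2(psi))
    return 1j * (lap - np.abs(psi) ** 2 * psi)

def rk4_step(psi, lam, dt):
    """One classical fourth-order Runge-Kutta step (fixed dt, no dealiasing)."""
    k1 = nls_rhs(psi, lam)
    k2 = nls_rhs(psi + 0.5 * dt * k1, lam)
    k3 = nls_rhs(psi + 0.5 * dt * k2, lam)
    k4 = nls_rhs(psi + dt * k3, lam)
    return psi + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

# Plane-wave test on the irrational torus omega^2 = sqrt(2).
n, omega2, amp, m0, l0 = 32, np.sqrt(2.0), 0.7, 2, -3
x = 2 * np.pi * np.arange(n) / n
X, Y = np.meshgrid(x, x, indexing="ij")
lam = eigenvalues(n, omega2)
psi = amp * np.exp(1j * (m0 * X + l0 * Y))

dt, steps = 1e-3, 1000
for _ in range(steps):
    psi = rk4_step(psi, lam, dt)

t = dt * steps
freq = m0**2 + omega2 * l0**2 + amp**2         # lambda_k + |A|^2
exact = amp * np.exp(1j * (m0 * X + l0 * Y - freq * t))
print(np.max(np.abs(psi - exact)) < 1e-6)      # True
```

For general data with many active modes, the product $|\psi|^{2}\psi$ generates wavenumbers beyond the grid, so a practical solver would typically add a dealiasing filter; the single-mode test above is unaffected.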
The cubic nonlinearity in ([2.25](https://arxiv.org/html/2606.27459#S2.E25)) couples quartets of Fourier modes $(k_{1},k_{2},k_{3},k_{4})$ satisfying the momentum relation $k_{1}+k_{2}=k_{3}+k_{4}$. To see why a distinguished subclass of these quartets governs the long-time dynamics, it is convenient to pass to the interaction representation by removing the linear flow. Writing $\widehat{\psi}_{k}(t)=e^{-i\lambda_{k}t}a_{k}(t)$, the linear part of ([2.25](https://arxiv.org/html/2606.27459#S2.E25)) is absorbed into the exponential, and the equation for the slowly varying amplitudes $a_{k}$ becomes

$$
i\partial_{t}a_{k}=\sum_{\substack{k_{1}-k_{2}+k_{3}=k\\ k_{j}\in\mathbb{Z}^{2}}}e^{-i\Omega(k_{1},k_{2},k_{3},k)\,t}\,a_{k_{1}}\overline{a_{k_{2}}}a_{k_{3}},\qquad \Omega(k_{1},k_{2},k_{3},k):=\lambda_{k_{1}}-\lambda_{k_{2}}+\lambda_{k_{3}}-\lambda_{k}. \tag{2.26}
$$

Setting $k_{4}:=k$, the constraint $k_{1}-k_{2}+k_{3}=k_{4}$ reads $k_{1}+k_{3}=k_{2}+k_{4}$; after exchanging the labels $k_{2}\leftrightarrow k_{3}$, it takes the symmetric form $k_{1}+k_{2}=k_{3}+k_{4}$, and the phase becomes the frequency mismatch

$$
\Omega(k_{1},k_{2},k_{3},k_{4}):=\lambda_{k_{1}}+\lambda_{k_{2}}-\lambda_{k_{3}}-\lambda_{k_{4}}. \tag{2.27}
$$

Each term in ([2.26](https://arxiv.org/html/2606.27459#S2.E26)) thus carries an oscillatory factor $e^{-i\Omega t}$. When $\Omega\neq 0$, this phase rotates at rate $|\Omega|$, and over times long compared with $1/|\Omega|$ the contribution of that quartet largely averages out (a non-stationary-phase effect): successive contributions cancel, so the interaction transfers little net energy between the modes.
When $\Omega=0$, the phase is stationary, $e^{-i\Omega t}\equiv 1$, and the corresponding term contributes coherently and cumulatively for all time. These persistent interactions are the ones that drive the transfer of energy across frequencies. Accordingly, a quartet $(k_{1},k_{2},k_{3},k_{4})$ with $k_{1}+k_{2}=k_{3}+k_{4}$ is called *resonant* precisely when its frequency mismatch vanishes,

$$
\lambda_{k_{1}}+\lambda_{k_{2}}=\lambda_{k_{3}}+\lambda_{k_{4}}\quad\Longleftrightarrow\quad\Omega(k_{1},k_{2},k_{3},k_{4})=0. \tag{2.28}
$$

Using $\lambda_{k}=m^{2}+\omega^{2}\ell^{2}$ and writing $k_{j}=(m_{j},\ell_{j})$, the mismatch in ([2.28](https://arxiv.org/html/2606.27459#S2.E28)) separates into its $x$- and $y$-components,

$$
\Omega(k_{1},k_{2},k_{3},k_{4})=\underbrace{(m_{1}^{2}+m_{2}^{2}-m_{3}^{2}-m_{4}^{2})}_{=:p\,\in\,\mathbb{Z}}+\omega^{2}\underbrace{(\ell_{1}^{2}+\ell_{2}^{2}-\ell_{3}^{2}-\ell_{4}^{2})}_{=:q\,\in\,\mathbb{Z}}. \tag{2.29}
$$

When $\omega^{2}\notin\mathbb{Q}$, the condition $p+\omega^{2}q=0$ with $p,q\in\mathbb{Z}$ forces $p=q=0$ separately (if $q\neq 0$, then $\omega^{2}=-p/q$ would be rational), decoupling the resonances into two independent one-dimensional conditions and greatly restricting the resonant set; when $\omega^{2}\in\mathbb{Q}$, genuinely two-dimensional resonant interactions occur. Thus the rational and irrational cases are distinguished entirely through the arithmetic of $\lambda_{k}$, and the size of the nonzero mismatches $|\Omega|=|p+\omega^{2}q|$ is governed by how well $\omega^{2}$ is approximated by rationals.
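To make this dichotomy concrete, the NumPy sketch below counts ordered resonant quartets in the box $|m|,|\ell|\le K$ for the two aspect ratios considered in this paper, $\omega^{2}=1$ and $\omega^{2}=\sqrt{2}$. For irrational $\omega^{2}$, the conditions $p=q=0$ together with momentum conservation force $\{m_{3},m_{4}\}=\{m_{1},m_{2}\}$ and $\{\ell_{3},\ell_{4}\}=\{\ell_{1},\ell_{2}\}$, so for $K=3$ exactly $91^{2}=8281$ ordered quartets survive (in each coordinate, the $7$ pairs with $m_{1}=m_{2}$ leave one option for $m_{3}$ and the $42$ pairs with $m_{1}\neq m_{2}$ leave two). The rational torus admits more, for example $(1,2)+(2,-1)=(3,1)+(0,0)$ with $5+5=10+0$. The box size, the floating-point tolerance, and the function name are illustrative assumptions.

```python
import numpy as np

def count_resonant_quartets(omega2, K=3, tol=1e-9):
    """Count ordered quartets (k1, k2, k3, k4) of modes in the box |m|, |l| <= K
    with k1 + k2 = k3 + k4 and |Omega| < tol, where
    Omega = lambda_{k1} + lambda_{k2} - lambda_{k3} - lambda_{k4} as in (2.27)
    and lambda_k = m^2 + omega2 * l^2."""
    r = np.arange(-K, K + 1)
    modes = np.array([(m, l) for m in r for l in r])        # shape (M, 2)
    lam = modes[:, 0] ** 2 + omega2 * modes[:, 1] ** 2       # shape (M,)

    # Broadcast over all triples (k1, k2, k3); k4 is fixed by momentum.
    k1 = modes[:, None, None, :]
    k2 = modes[None, :, None, :]
    k3 = modes[None, None, :, :]
    k4 = k1 + k2 - k3                                         # shape (M, M, M, 2)
    in_box = np.all(np.abs(k4) <= K, axis=-1)

    lam4 = k4[..., 0] ** 2 + omega2 * k4[..., 1] ** 2
    mismatch = lam[:, None, None] + lam[None, :, None] - lam[None, None, :] - lam4
    return int(np.count_nonzero(in_box & (np.abs(mismatch) < tol)))

n_rational = count_resonant_quartets(1.0)
n_irrational = count_resonant_quartets(np.sqrt(2.0))
print(n_irrational)               # 8281
print(n_rational > n_irrational)  # True
```

A fixed tolerance is safe here because, for the small integers $p,q$ arising in this box, $|p+\sqrt{2}\,q|$ is either exactly zero or far larger than $10^{-9}$.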
Consequently, on irrational tori the resonant interactions that drive efficient energy transfer to high frequencies are fewer, so both the transfer of energy and the growth of Sobolev norms are generally weaker than on rational tori [hrabski2021energy](https://arxiv.org/html/2606.27459#bib.bib13). If $\omega^{2}$ is very close to a rational number, quartets with small but nonzero $|\Omega|$ act almost like resonant ones over long time intervals, so some of the growth mechanisms of the rational setting may persist. This sensitivity to the geometry of the domain is one of the main reasons rational and irrational tori provide a meaningful benchmark for the operator-learning study carried out in this paper.

### 2.6. Known Theoretical Results

In this subsection, we collect a few results on the long-time behavior and the growth of Sobolev norms of solutions on rational tori and on generic irrational tori; together they indicate that growth on irrational tori is significantly more constrained than in the rational case.

On square tori, we have the following result.

###### Theorem 2.5 ([HK26](https://arxiv.org/html/2606.27459#bib.bib12), Theorem 1.5).

For $N\in 2^{\mathbb{N}}$, denote by $\phi^{N}$ the function

$$
\phi^{N}:=\mathcal{F}^{-1}\bigl(\chi_{N^{10}\mathbb{Z}^{2}}\cdot e^{-|\xi/N^{11}|^{2}}\bigr). \tag{2.30}
$$

Let $T>0$ and let $\lambda>0$ be a small number (here $\lambda$ is an amplitude parameter, not an eigenvalue $\lambda_{k}$). Let $\psi^{N}\in C_{loc}^{\infty}(\mathbb{R}\times\mathbb{T}^{2})$ be the solution to ([2.22](https://arxiv.org/html/2606.27459#S2.E22)) with the initial datum $\psi_{0}^{N}:=\lambda N^{-1}\phi^{N}$.
We have

$$
\|\psi_{0}^{N}\|_{L^{2}}\sim\lambda,\qquad \log\|\psi_{0}^{N}\|_{H^{1}}\le C\log N \tag{2.31}
$$

and

$$
\limsup_{N\to\infty}\bigl\|\psi^{N}(t)-e^{-3it\lambda^{2}\ln N}e^{it\Delta}\psi_{0}^{N}\bigr\|_{C^{0}L^{2}\cap L_{t,x}^{4}([0,\frac{T}{\log N})\times\mathbb{T}^{2})}=0. \tag{2.32}
$$

On generic irrational tori, the following holds.

###### Theorem 2.6 ([hrabski2021energy](https://arxiv.org/html/2606.27459#bib.bib13), Theorem 1.1).

Assume that the solution of a periodic NLS equation as in ([2.22](https://arxiv.org/html/2606.27459#S2.E22)), not necessarily on an irrational torus, satisfies, for $s\gg 1$, the asymptotic estimate $\|\psi(t)\|_{H^{s}}\le C|t|^{s\alpha}R^{2\alpha+1}$ for some $\alpha>0$, where $R=\|\psi(0)\|_{H^{s}}$. Then on a torus $\mathbb{T}^{2}_{\underline{\omega}}$, where $\underline{\omega}\in\mathbb{R}^{2}_{+}$ is a generic irrational ordered pair, if the initial data have bounded frequency support, the estimate above can be improved for $|t|\gg 1$ to

$$
\|\psi(t)\|_{H^{s}}\le C|t|^{s\left(\frac{\alpha(s+2+2\tau)+1}{3s+2\tau-2}\right)}R^{\frac{s(2\alpha+1)}{3s+2\tau-2}}L^{-\frac{3s}{3s+2\tau-2}},
$$

where $\tau>1$, the constant $C$ depends on $\tau$, $\underline{\omega}$, and possibly on the size of the support, and $L$ is the $L^{2}$ norm of the initial data.

###### Corollary 2.7 ([hrabski2021energy](https://arxiv.org/html/2606.27459#bib.bib13), Corollary 1.2).

Let $\varepsilon>0$ and let $s>\frac{5}{2}+(2\varepsilon)^{-1}$.
Then the solution $\psi$ of the NLS equation ([2.22](https://arxiv.org/html/2606.27459#S2.E22)) with initial data of bounded frequency support satisfies

$$\|\psi(t)\|_{H^{s}}\leq C|t|^{\varepsilon s}R^{2\varepsilon+1} \tag{2.33}$$

for $t$ large enough, where the constant $C$ depends only on $\omega$, the size of the support of the initial data, and its $L^{2}$ norm.

The following result, used in [hrabski2021energy](https://arxiv.org/html/2606.27459#bib.bib13), distinguishes generic irrational aspect ratios from rational ones.

###### Theorem 2.9 (Diophantine approximation).

There exists a full-measure subset $E\subset\mathbb{R}$ such that for every $\alpha\in E$ and all $C>0$, $\tau>0$, the inequality

$$\left|\frac{p}{q}-\alpha\right|\leq\frac{C}{q^{2+\tau}} \tag{2.35}$$

has only finitely many solutions $(p,q)\in\mathbb{Z}\times\mathbb{Z}^{+}$.

This motivates the comparison between rational and irrational geometries in our experiments: following [hrabski2021energy](https://arxiv.org/html/2606.27459#bib.bib13), we use the rational torus $\omega^{2}=1$ as a benchmark and the irrational torus $\omega^{2}=\sqrt{2}$ as its irrational counterpart.

## 3. Operator Learning

Let $\Omega\subset\mathbb{R}^{d}$ be a bounded spatial domain. We denote by $\mathcal{A}=\mathcal{A}(\Omega;\mathbb{R}^{d_{a}})$ the space of admissible input functions, taking values in $\mathbb{R}^{d_{a}}$, and by $\mathcal{U}=\mathcal{U}(\Omega;\mathbb{R}^{d_{u}})$ the space of output solution functions, taking values in $\mathbb{R}^{d_{u}}$.
In this formulation, the object of interest is a map between function spaces, $\mathcal{J}^{\dagger}:\mathcal{A}\to\mathcal{U}$, which is typically nonlinear [Kovachki2023NeuralOperator](https://arxiv.org/html/2606.27459#bib.bib18); [Lu2021DeepONet](https://arxiv.org/html/2606.27459#bib.bib21). Here $a\in\mathcal{A}$ may represent an initial condition, a coefficient field, a forcing term, or another functional parameter, while $u=\mathcal{J}^{\dagger}(a)$ denotes the corresponding solution. We are given observations $\{(a_{j},u_{j})\}_{j=1}^{N}$ with $u_{j}=\mathcal{J}^{\dagger}(a_{j})$ and $a_{j}$ sampled from a probability measure $\nu$ on $\mathcal{A}$. From these, we aim to construct a parameterized approximation $\mathcal{J}_{\theta}:\mathcal{A}\to\mathcal{U}$, $\theta\in\Theta$, for some finite-dimensional parameter space $\Theta$, such that $\mathcal{J}_{\theta}(a)\approx\mathcal{J}^{\dagger}(a)$ for inputs drawn from the same distribution. Formally, one seeks parameters minimizing an expected discrepancy,

$$\min_{\theta\in\Theta}\mathbb{E}_{a\sim\nu}\bigl[\mathcal{C}\bigl(\mathcal{J}_{\theta}(a),\mathcal{J}^{\dagger}(a)\bigr)\bigr],$$

where $\mathcal{C}:\mathcal{U}\times\mathcal{U}\to\mathbb{R}$ is a chosen cost functional. In practice, this expected loss is replaced by an empirical loss over the available training data, for example

$$\min_{\theta\in\Theta}\frac{1}{N}\sum_{j=1}^{N}\mathcal{C}\bigl(\mathcal{J}_{\theta}(a_{j}),u_{j}\bigr).$$

Although the mathematical formulation is posed over function spaces, the training data are observed on finite discretizations.
If $\Omega_{j}=\{x_{1},\dots,x_{n}\}\subset\Omega$ is a set of sampling points, then one has access to the arrays $a_{j}|_{\Omega_{j}}\in\mathbb{R}^{n\times d_{a}}$ and $u_{j}|_{\Omega_{j}}\in\mathbb{R}^{n\times d_{u}}$. The purpose of a neural operator is to learn a representation of $\mathcal{J}^{\dagger}$ that is not tied to one particular discretization. Thus, the same learned parameters should define a mapping between functions, while its finite-dimensional realization may be evaluated on different grids, provided the relevant features of the functions are resolved [Li2020GraphKernel](https://arxiv.org/html/2606.27459#bib.bib19); [Li2021FNO](https://arxiv.org/html/2606.27459#bib.bib20); [Kovachki2023NeuralOperator](https://arxiv.org/html/2606.27459#bib.bib18).

### 3.1. Fourier Neural Operator

The Fourier neural operator constructs $\mathcal{J}_{\theta}$ by combining local pointwise transformations with nonlocal transformations in Fourier space. The input function is first lifted to a higher-dimensional channel space through a shallow fully connected neural network $P:\mathbb{R}^{d_{a}}\to\mathbb{R}^{d_{v}}$, applied pointwise as $v_{0}(x)=P(a(x))$. Here $d_{v}$ is the width of the hidden representation.
The lifted function is then passed through a sequence of operator layers $v_{0}\mapsto v_{1}\mapsto\cdots\mapsto v_{T}$, and the final representation is projected back to the target dimension by $u(x)=Q(v_{T}(x))$, with $Q:\mathbb{R}^{d_{v}}\to\mathbb{R}^{d_{u}}$. Equivalently, the full architecture may be written as

$$\mathcal{J}_{\theta}=Q\circ\mathcal{L}_{T}\circ\cdots\circ\mathcal{L}_{1}\circ P, \tag{3.1}$$

where $\mathcal{L}_{1},\ldots,\mathcal{L}_{T}$ denote the Fourier layers. Each layer updates the hidden function by

$$v_{t+1}(x)=\sigma\bigl(Wv_{t}(x)+(\mathcal{K}(a;\phi)v_{t})(x)\bigr),\qquad t=0,\dots,T-1, \tag{3.2}$$

where $\sigma:\mathbb{R}\to\mathbb{R}$ is a nonlinear activation function applied componentwise, $W:\mathbb{R}^{d_{v}}\to\mathbb{R}^{d_{v}}$ is a pointwise linear map, and $\mathcal{K}(a;\phi)$ is a kernel integral operator parameterized by $\phi$,

$$(\mathcal{K}(a;\phi)v_{t})(x)=\int_{\Omega}\kappa\bigl(x,y,a(x),a(y);\phi\bigr)v_{t}(y)\,dy. \tag{3.3}$$

Here $\kappa(\cdot;\phi)$ is a learned kernel function. If the kernel depends only on the relative displacement, so that $\kappa(x,y,a(x),a(y);\phi)=\kappa(x-y;\phi)$, then the kernel integral operator ([3.3](https://arxiv.org/html/2606.27459#S3.E3)) becomes a convolution.
By the convolution theorem,

$$(\mathcal{K}(\phi)v_{t})(x)=\mathcal{F}^{-1}\bigl(\mathcal{F}(\kappa_{\phi})\cdot\mathcal{F}(v_{t})\bigr)(x),$$

or equivalently,

$$(\mathcal{K}(\phi)v_{t})(x)=\mathcal{F}^{-1}\bigl(R_{\phi}\cdot\mathcal{F}(v_{t})\bigr)(x), \tag{3.4}$$

where $R_{\phi}=\mathcal{F}(\kappa_{\phi})$ is the Fourier transform of the periodic kernel $\kappa_{\phi}$. In practice, $R_{\phi}$ is parameterized directly in Fourier space: for each retained frequency it is a complex $d_{v}\times d_{v}$ matrix, and all frequencies above a chosen cutoff are set to zero. Thus, ([3.2](https://arxiv.org/html/2606.27459#S3.E2)) becomes

$$v_{t+1}(x)=\sigma\Bigl(Wv_{t}(x)+\mathcal{F}^{-1}\bigl(R_{\phi}\cdot\mathcal{F}(v_{t})\bigr)(x)\Bigr),$$

and the final output is obtained by applying the shallow fully connected network $Q$ pointwise to $v_{T}$, giving $u(x)=Q(v_{T}(x))$.

In the FNO architecture, the lifting layer $P$ embeds the input field into a higher-dimensional latent representation, while the projection layer $Q$ maps this representation back to the desired output dimension. On a grid, a batch of inputs is stored as a tensor in $\mathbb{R}^{B\times H\times W\times C_{1}}$, and the network $\mathcal{J}_{\theta}:\mathcal{A}\to\mathcal{U}$ returns a tensor in $\mathbb{R}^{B\times H\times W\times C_{2}}$. Here $B$ is the batch size, $H$ and $W$ are the numbers of grid points in the height and width directions, and $C_{1}$ and $C_{2}$ denote the input and output channel dimensions. In this context, $W$ is a grid size and not the linear map in (3.2). The main expressive component of the FNO is the Fourier layer, which updates the latent representation in spectral space and allows the model to capture global spatial interactions and multiscale features efficiently.
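To make (3.2) and (3.4) concrete, the following NumPy sketch implements a forward pass of a small FNO with random, untrained weights. It is a minimal illustration under stated assumptions, not the authors' implementation. The names `spectral_conv` and `TinyFNO` are hypothetical, and biases are omitted. Channels are stored last, matching the $B\times H\times W\times C$ layout above. GELU uses its common tanh approximation. $R_\phi$ is stored as two complex blocks covering the nonnegative and negative $x$-frequencies, a layout commonly used in FNO implementations, because the real FFT only stores nonnegative $y$-frequencies.

```python
import numpy as np


def gelu(x):
    # tanh approximation of GELU (an assumption; exact GELU uses the normal CDF)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def spectral_conv(v, R_pos, R_neg, K):
    """Compute F^{-1}(R_phi . F v) with R_phi supported on the K lowest modes.

    v: real array (B, H, W, C_in); R_pos, R_neg: complex arrays (K, K, C_in, C_out)
    acting on x-frequencies 0..K-1 and -K..-1 (y-frequencies 0..K-1).
    """
    B, H, W, _ = v.shape
    v_hat = np.fft.rfft2(v, axes=(1, 2))                  # (B, H, W//2 + 1, C_in)
    out_hat = np.zeros(v_hat.shape[:3] + (R_pos.shape[-1],), dtype=complex)
    # Mode-wise matrix multiplication; every other mode is set to zero.
    out_hat[:, :K, :K] = np.einsum("bxyi,xyio->bxyo", v_hat[:, :K, :K], R_pos)
    out_hat[:, -K:, :K] = np.einsum("bxyi,xyio->bxyo", v_hat[:, -K:, :K], R_neg)
    return np.fft.irfft2(out_hat, s=(H, W), axes=(1, 2))


class TinyFNO:
    """J_theta = Q o L_T o ... o L_1 o P with random weights (illustration only)."""

    def __init__(self, d_a=3, d_v=64, d_u=2, K=12, n_layers=4, d_q=128, seed=0):
        rng = np.random.default_rng(seed)
        self.K = K
        self.P = rng.normal(scale=d_a**-0.5, size=(d_a, d_v))
        self.layers = []
        for _ in range(n_layers):
            shape = (K, K, d_v, d_v)
            R_pos = (rng.normal(size=shape) + 1j * rng.normal(size=shape)) / d_v**2
            R_neg = (rng.normal(size=shape) + 1j * rng.normal(size=shape)) / d_v**2
            W_lin = rng.normal(scale=d_v**-0.5, size=(d_v, d_v))
            self.layers.append((R_pos, R_neg, W_lin))
        self.Q1 = rng.normal(scale=d_v**-0.5, size=(d_v, d_q))
        self.Q2 = rng.normal(scale=d_q**-0.5, size=(d_q, d_u))

    def __call__(self, a):
        v = a @ self.P                                     # lifting P, pointwise
        for R_pos, R_neg, W_lin in self.layers:            # Fourier layers, eq. (3.2)
            v = gelu(v @ W_lin + spectral_conv(v, R_pos, R_neg, self.K))
        return gelu(v @ self.Q1) @ self.Q2                 # projection Q, pointwise


model = TinyFNO(d_v=16, K=6, d_q=32)
rng = np.random.default_rng(1)
for n in (32, 64):
    a = rng.normal(size=(2, n, n, 3))
    print(model(a).shape)   # (2, 32, 32, 2), then (2, 64, 64, 2)
```

Because the spectral weights are attached to frequencies rather than to grid points, the same `model` runs on both grids. This is the discretization independence discussed above. The defaults (`d_v=64`, `K=12`, four layers, `d_q=128`) mirror the configuration reported in Section 4.1.2. A smaller model is instantiated here only to keep the example fast.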
Figure 3.1. Schematic of the FNO architecture. The input function $a(x)$ is lifted by the shallow fully connected network $P$, propagated through a sequence of Fourier layers $F_{1},\ldots,F_{n}$, and projected by $Q$ to produce the output $u(x)$. The bottom panel shows the structure of a Fourier layer, where $v(x)$ is updated by combining the spectral convolution $\mathcal{F}^{-1}(R_{\theta}\cdot\mathcal{F}v)$ with the pointwise linear map $Wv$, followed by the activation function $\sigma$.

We now state an a posteriori error estimate for the learned solution. The estimate is formulated at the level of the Fourier coefficients used by the reference solver: the finite set $\Lambda_{K}$ specifies the retained Fourier modes, and the nonlinear interaction is restricted to those modes. The residual ([3.7](https://arxiv.org/html/2606.27459#S3.E7)) measures how well the learned Fourier coefficients satisfy this truncated modal system. A small residual, together with a small initial mismatch and a sufficiently large cutoff $K$, implies that the learned solution remains close to the reference solution on the time interval $[0,T]$.

###### Theorem 3.1 (Error estimate).

Suppose $\omega,T>0$ and $s>1$, and let $\Lambda_{K}=\{k=(m,\ell)\in\mathbb{Z}^{2}:\ |m|,|\ell|\leq K\}$. For $k=(m,\ell)\in\mathbb{Z}^{2}$, define $\lambda_{k}=m^{2}+\omega^{2}\ell^{2}$.
Let $\psi$ be a solution to ([2.22](https://arxiv.org/html/2606.27459#S2.E22)) on $\mathbb{T}^{2}$ with Fourier coefficients $\widehat{\psi}_{k}$ satisfying the full mode equation

$$i\partial_{t}\widehat{\psi}_{k}-\lambda_{k}\widehat{\psi}_{k}=\sum_{\substack{k_{1}-k_{2}+k_{3}=k\\ k_{j}\in\mathbb{Z}^{2}}}\widehat{\psi}_{k_{1}}\overline{\widehat{\psi}_{k_{2}}}\widehat{\psi}_{k_{3}},\qquad k\in\mathbb{Z}^{2}. \tag{3.5}$$

Let $\psi^{\theta}$ be the learned approximation of the solution to the frequency-restricted equation

$$\begin{cases}i\partial_{t}\psi^{\theta}+\Delta_{\omega}\psi^{\theta}=P_{\leq K}\bigl(|\psi^{\theta}|^{2}\psi^{\theta}\bigr), & (t,x,y)\in[0,T]\times\mathbb{T}^{2},\\ \psi^{\theta}(0,\cdot)=\psi^{\theta}_{0}\in P_{\leq K}H^{s}(\mathbb{T}^{2}),\end{cases} \tag{3.6}$$

with time-differentiable Fourier coefficients $\widehat{\psi}^{\,\theta}_{k}$ supported in $\Lambda_{K}$. Define the residual $r_{k}^{\theta}(t)$ by

$$r_{k}^{\theta}(t)=i\partial_{t}\widehat{\psi}^{\,\theta}_{k}(t)-\lambda_{k}\widehat{\psi}^{\,\theta}_{k}(t)-\sum_{\substack{k_{1}-k_{2}+k_{3}=k\\ k_{j}\in\Lambda_{K}}}\widehat{\psi}^{\,\theta}_{k_{1}}\overline{\widehat{\psi}^{\,\theta}_{k_{2}}}\widehat{\psi}^{\,\theta}_{k_{3}},\qquad k\in\Lambda_{K}. \tag{3.7}$$

Assume the uniform bound

$$\sup_{0\leq t\leq T}\Bigl(\|\psi(t)\|_{H^{s+1}}+\|\psi^{\theta}(t)\|_{H^{s}}\Bigr)\leq M. \tag{3.8}$$

Then there exists a constant $C_{s}>0$, depending only on $s$, such that

$$\sup_{0\leq t\leq T}\|\psi(t)-\psi^{\theta}(t)\|_{H^{s}}\leq\exp\bigl(C_{s}M^{2}T\bigr)\left[\|\psi(0)-\psi^{\theta}(0)\|_{H^{s}}+\int_{0}^{T}\|r^{\theta}(\tau)\|_{h^{s}}\,d\tau+\frac{C_{s}M^{3}T}{K}\right]+\frac{C_{s}M}{K}. \tag{3.9}$$

###### Proof.

Define the truncated cubic convolution operator on the Fourier side,

$$\mathcal{N}(z)_{k}=\sum_{\substack{k_{1}-k_{2}+k_{3}=k\\ k_{j}\in\Lambda_{K}}}z_{k_{1}}\overline{z_{k_{2}}}z_{k_{3}},\qquad k\in\Lambda_{K}, \tag{3.10}$$

which restricts *all* frequencies $k,k_{1},k_{2},k_{3}$ to $\Lambda_{K}$. For a sequence $z$ indexed by $\mathbb{Z}^{2}$, let $u$ denote the function with $\widehat{u}=z$. Then $\mathcal{N}(z)$ is the projection onto $\Lambda_{K}$ of the Fourier coefficients of the cubic term built from the low modes of $u$, that is,

$$\mathcal{N}(z)=P_{\leq K}\bigl(|P_{\leq K}u|^{2}\,P_{\leq K}u\bigr); \tag{3.11}$$

if $z$ is already supported in $\Lambda_{K}$, the input restriction is automatic. In particular, $\mathcal{N}(\widehat{\psi}^{\,\theta})=P_{\leq K}(|\psi^{\theta}|^{2}\psi^{\theta})$, since $\psi^{\theta}$ has frequencies in $\Lambda_{K}$. For the full solution $\psi$, by contrast, ([3.11](https://arxiv.org/html/2606.27459#S3.E11)) gives $\mathcal{N}(\widehat{\psi})=P_{\leq K}(|P_{\leq K}\psi|^{2}P_{\leq K}\psi)$, which uses only the low modes of $\psi$.

*Error equation.* Let $\widehat{e}_{k}(t)=\widehat{\psi}_{k}(t)-\widehat{\psi}^{\,\theta}_{k}(t)$ for $k\in\Lambda_{K}$. The exact coefficients satisfy the full mode equation ([3.5](https://arxiv.org/html/2606.27459#S3.E5)) with $k_{j}\in\mathbb{Z}^{2}$. Separating the triples with all indices in $\Lambda_{K}$ from those with at least one index outside $\Lambda_{K}$, we obtain for $k\in\Lambda_{K}$

$$i\partial_{t}\widehat{\psi}_{k}-\lambda_{k}\widehat{\psi}_{k}=\mathcal{N}(\widehat{\psi})_{k}+a_{k},\qquad a_{k}:=\sum_{\substack{k_{1}-k_{2}+k_{3}=k\\ \text{some }k_{j}\notin\Lambda_{K}}}\widehat{\psi}_{k_{1}}\overline{\widehat{\psi}_{k_{2}}}\widehat{\psi}_{k_{3}}, \tag{3.12}$$

where $a=\{a_{k}\}_{k\in\Lambda_{K}}$ is the truncation term. By the definition ([3.7](https://arxiv.org/html/2606.27459#S3.E7)) of the residual, the learned coefficients satisfy

$$i\partial_{t}\widehat{\psi}^{\,\theta}_{k}-\lambda_{k}\widehat{\psi}^{\,\theta}_{k}=\mathcal{N}(\widehat{\psi}^{\,\theta})_{k}+r_{k}^{\theta}. \tag{3.13}$$

Subtracting (3.13) from (3.12) and multiplying by $-i$,

$$\partial_{t}\widehat{e}_{k}=-i\lambda_{k}\widehat{e}_{k}-i\bigl(\mathcal{N}(\widehat{\psi})_{k}-\mathcal{N}(\widehat{\psi}^{\,\theta})_{k}\bigr)-i\,a_{k}+i\,r_{k}^{\theta},\qquad k\in\Lambda_{K}. \tag{3.14}$$

*Energy estimate.* Differentiating the squared norm and substituting ([3.14](https://arxiv.org/html/2606.27459#S3.E14)),

$$\begin{aligned}\frac{1}{2}\frac{d}{dt}\|\widehat{e}(t)\|_{h^{s}}^{2}&=\operatorname{Re}\sum_{k\in\Lambda_{K}}\langle k\rangle^{2s}\,\partial_{t}\widehat{e}_{k}\,\overline{\widehat{e}_{k}}\\&=\underbrace{\operatorname{Re}\sum_{k\in\Lambda_{K}}-i\lambda_{k}\langle k\rangle^{2s}|\widehat{e}_{k}|^{2}}_{=\,0}-\operatorname{Re}\sum_{k\in\Lambda_{K}}i\langle k\rangle^{2s}\bigl(\mathcal{N}(\widehat{\psi})_{k}-\mathcal{N}(\widehat{\psi}^{\,\theta})_{k}\bigr)\overline{\widehat{e}_{k}}\\&\quad-\operatorname{Re}\sum_{k\in\Lambda_{K}}i\langle k\rangle^{2s}\,a_{k}\,\overline{\widehat{e}_{k}}+\operatorname{Re}\sum_{k\in\Lambda_{K}}i\langle k\rangle^{2s}\,r_{k}^{\theta}\,\overline{\widehat{e}_{k}}.\end{aligned} \tag{3.15}$$

The first sum vanishes because $\lambda_{k}\in\mathbb{R}$ makes each summand purely imaginary; this reflects the conservation of the $h^{s}$ norm under the linear flow. Splitting the weight as $\langle k\rangle^{2s}=\langle k\rangle^{s}\cdot\langle k\rangle^{s}$ and applying the Cauchy–Schwarz inequality to each remaining sum,

$$\frac{1}{2}\frac{d}{dt}\|\widehat{e}(t)\|_{h^{s}}^{2}\leq\Bigl(\bigl\|\mathcal{N}(\widehat{\psi})-\mathcal{N}(\widehat{\psi}^{\,\theta})\bigr\|_{h^{s}}+\|a(t)\|_{h^{s}}+\|r^{\theta}(t)\|_{h^{s}}\Bigr)\|\widehat{e}(t)\|_{h^{s}}. \tag{3.16}$$

*Nonlinear difference.* Using ([3.11](https://arxiv.org/html/2606.27459#S3.E11)) and $\mathcal{N}(\widehat{\psi}^{\,\theta})=P_{\leq K}(|\psi^{\theta}|^{2}\psi^{\theta})$,

$$\begin{aligned}\bigl\|\mathcal{N}(\widehat{\psi})-\mathcal{N}(\widehat{\psi}^{\,\theta})\bigr\|_{h^{s}}&=\bigl\|P_{\leq K}\bigl(|P_{\leq K}\psi|^{2}P_{\leq K}\psi-|\psi^{\theta}|^{2}\psi^{\theta}\bigr)\bigr\|_{H^{s}}\\&\leq\bigl\||P_{\leq K}\psi|^{2}P_{\leq K}\psi-|\psi^{\theta}|^{2}\psi^{\theta}\bigr\|_{H^{s}},\end{aligned} \tag{3.17}$$

since $P_{\leq K}$ does not increase the $H^{s}$ norm. Writing $\tilde{e}:=P_{\leq K}\psi-\psi^{\theta}$ and telescoping the cubic difference into three terms, each linear in $\tilde{e}$,

$$|P_{\leq K}\psi|^{2}P_{\leq K}\psi-|\psi^{\theta}|^{2}\psi^{\theta}=\tilde{e}\,\overline{P_{\leq K}\psi}\,P_{\leq K}\psi+\psi^{\theta}\,\overline{\tilde{e}}\,P_{\leq K}\psi+\psi^{\theta}\,\overline{\psi^{\theta}}\,\tilde{e}. \tag{3.18}$$

Applying the algebra property (Lemma [2.4](https://arxiv.org/html/2606.27459#S2.Thmtheorem4)) to each term and combining with ([3.17](https://arxiv.org/html/2606.27459#S3.Ex7)),

$$\begin{aligned}\bigl\|\mathcal{N}(\widehat{\psi})-\mathcal{N}(\widehat{\psi}^{\,\theta})\bigr\|_{h^{s}}&\leq\|\tilde{e}\,\overline{P_{\leq K}\psi}\,P_{\leq K}\psi\|_{H^{s}}+\|\psi^{\theta}\,\overline{\tilde{e}}\,P_{\leq K}\psi\|_{H^{s}}+\|\psi^{\theta}\,\overline{\psi^{\theta}}\,\tilde{e}\|_{H^{s}}\\&\leq C_{s}\Bigl(\|P_{\leq K}\psi\|_{H^{s}}^{2}+\|P_{\leq K}\psi\|_{H^{s}}\|\psi^{\theta}\|_{H^{s}}+\|\psi^{\theta}\|_{H^{s}}^{2}\Bigr)\|\tilde{e}\|_{H^{s}}.\end{aligned} \tag{3.19}$$

Since $P_{\leq K}$ is a contraction on $H^{s}$, $\|P_{\leq K}\psi\|_{H^{s}}\leq\|\psi\|_{H^{s}}$. Moreover,

$$\tilde{e}=P_{\leq K}\psi-\psi^{\theta}=P_{\leq K}(\psi-\psi^{\theta}) \tag{3.20}$$

has Fourier support in $\Lambda_{K}$ and Fourier coefficients $\widehat{e}_{k}$, so $\|\tilde{e}\|_{H^{s}}=\|\widehat{e}\|_{h^{s}}$. We use $\|P_{\leq K}\psi\|_{H^{s}}\|\psi^{\theta}\|_{H^{s}}\leq\tfrac{1}{2}(\|\psi\|_{H^{s}}^{2}+\|\psi^{\theta}\|_{H^{s}}^{2})$, absorb constants into a new $C_{s}>0$, and invoke ([3.8](https://arxiv.org/html/2606.27459#S3.E8)). This gives

$$\bigl\|\mathcal{N}(\widehat{\psi})-\mathcal{N}(\widehat{\psi}^{\,\theta})\bigr\|_{h^{s}}\leq C_{s}\bigl(\|\psi(t)\|_{H^{s}}+\|\psi^{\theta}(t)\|_{H^{s}}\bigr)^{2}\|\widehat{e}(t)\|_{h^{s}}\leq C_{s}M^{2}\|\widehat{e}(t)\|_{h^{s}}. \tag{3.21}$$

*Truncation term.* Each triple defining $a_{k}$ in ([3.12](https://arxiv.org/html/2606.27459#S3.E12)) has at least one factor indexed outside $\Lambda_{K}$, that is, a factor of $P_{>K}\psi$. By the algebra property (Lemma [2.4](https://arxiv.org/html/2606.27459#S2.Thmtheorem4)) and then the Bernstein inequality (Lemma [2.3](https://arxiv.org/html/2606.27459#S2.Thmtheorem3)),

$$\|a(t)\|_{h^{s}}\leq C_{s}\,\|P_{>K}\psi(t)\|_{H^{s}}\,\|\psi(t)\|_{H^{s}}^{2}\leq\frac{C_{s}}{K}\,\|\psi(t)\|_{H^{s+1}}\,\|\psi(t)\|_{H^{s}}^{2}\leq\frac{C_{s}M^{3}}{K}. \tag{3.22}$$

*Differential inequality and Grönwall.* Inserting ([3.21](https://arxiv.org/html/2606.27459#S3.E21)) and ([3.22](https://arxiv.org/html/2606.27459#S3.E22)) into ([3.16](https://arxiv.org/html/2606.27459#S3.E16)) and dividing by $\|\widehat{e}(t)\|_{h^{s}}$ where it is positive,

$$\frac{d}{dt}\|\widehat{e}(t)\|_{h^{s}}\leq C_{s}M^{2}\|\widehat{e}(t)\|_{h^{s}}+\frac{C_{s}M^{3}}{K}+\|r^{\theta}(t)\|_{h^{s}}. \tag{3.23}$$

Grönwall's inequality gives, for every $t\in[0,T]$,

$$\|\widehat{e}(t)\|_{h^{s}}\leq\exp(C_{s}M^{2}t)\left[\|\widehat{e}(0)\|_{h^{s}}+\int_{0}^{t}\Bigl(\|r^{\theta}(\tau)\|_{h^{s}}+\frac{C_{s}M^{3}}{K}\Bigr)d\tau\right]. \tag{3.24}$$

It remains to pass from $\widehat{e}$ to $\psi-\psi^{\theta}$. By (3.20), $\|\widehat{e}(t)\|_{h^{s}}=\|P_{\leq K}\psi(t)-\psi^{\theta}(t)\|_{H^{s}}$. Since $\psi^{\theta}(0)\in P_{\leq K}H^{s}$, also $\|\widehat{e}(0)\|_{h^{s}}=\|P_{\leq K}(\psi(0)-\psi^{\theta}(0))\|_{H^{s}}\leq\|\psi(0)-\psi^{\theta}(0)\|_{H^{s}}$. The high-frequency part of $\psi$ is not controlled by (3.24). By the Bernstein inequality and ([3.8](https://arxiv.org/html/2606.27459#S3.E8)), however, it satisfies $\|P_{>K}\psi(t)\|_{H^{s}}\leq C_{s}K^{-1}\|\psi(t)\|_{H^{s+1}}\leq C_{s}M/K$. Writing $\psi-\psi^{\theta}=(P_{\leq K}\psi-\psi^{\theta})+P_{>K}\psi$, applying the triangle inequality, and taking the supremum over $0\leq t\leq T$ yields

$$\sup_{0\leq t\leq T}\|\psi(t)-\psi^{\theta}(t)\|_{H^{s}}\leq\exp(C_{s}M^{2}T)\left[\|\psi(0)-\psi^{\theta}(0)\|_{H^{s}}+\int_{0}^{T}\|r^{\theta}(\tau)\|_{h^{s}}\,d\tau+\frac{C_{s}M^{3}T}{K}\right]+\frac{C_{s}M}{K}, \tag{3.25}$$

which proves the result. ∎

## 4. Numerical Experiments

In this section, we present numerical experiments for the two-dimensional cubic defocusing NLS equation on rational and irrational tori. We first describe the data generation procedure, computational setup, model training, and evaluation metrics.
We then compare the learned and ground truth solutions, with particular attention to the growth of Sobolev norms, which provides the main numerical evidence supporting the theoretical discussion in Section [2](https://arxiv.org/html/2606.27459#S2). Finally, we include ablation studies to examine how architectural choices affect the accuracy of the learned dynamics.

Figure 4.1. Training and validation relative $L^{2}$ error as a function of epoch.

### 4.1. Data Generation and Computational Setup

The reference data are generated on the computational domain $[0,2\pi]^{2}$, with the torus geometry entering through the anisotropic Fourier multiplier $\lambda_{m,\ell}=m^{2}+\omega^{2}\ell^{2}$. We consider the rational case $\omega^{2}=1$ and the irrational case $\omega^{2}=\sqrt{2}$. For each realization, the initial condition is constructed as a random-phase Fourier series supported on the low-frequency set

$$K_{0}=\{(m,\ell)\in\mathbb{Z}^{2}:-2\leq m,\ell\leq 2,\ (m,\ell)\neq(0,0)\}.$$

Specifically,

$$\psi_{0}(x,y)=\sum_{(m,\ell)\in K_{0}}C\,e^{i\phi_{m,\ell}}e^{i(mx+\ell y)},$$

where the phases $\phi_{m,\ell}$ are sampled independently from the uniform distribution on $[0,2\pi]$. The constant $C$ is chosen so that the initial data have the prescribed anisotropic Sobolev size $\|\psi_{0}\|_{H^{s}}=R$. As in [hrabski2021energy](https://arxiv.org/html/2606.27459#bib.bib13), we take $s=2$ and $R=1.8263$.

The reference solutions are computed using a Fourier pseudospectral method in space and an integrating-factor fourth-order Runge–Kutta method in time. The linear part is advanced exactly in Fourier space, whereas the cubic nonlinear term $|\psi|^{2}\psi$ is evaluated pseudospectrally in physical space.
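The data-generation steps above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' solver, and it relies on assumptions the text does not fix:

- The anisotropic $H^s$ norm is taken as $\|\psi\|_{H^s}^2=\sum_k(1+\lambda_k)^s|\widehat\psi_k|^2$ with $\psi=\sum_k\widehat\psi_k e^{i(mx+\ell y)}$.
- The equation is $i\partial_t\psi+\Delta_\omega\psi=|\psi|^2\psi$, following the sign convention of (3.5), without the projection $P_{\leq K}$.
- The two-thirds rule is implemented as a mask on the Fourier coefficients of the nonlinear term.

All function names are hypothetical.

```python
import numpy as np


def grid_wavenumbers(N, omega2):
    # integer indices (m, ell) in FFT ordering and the multiplier lambda = m^2 + omega^2 ell^2
    k = np.fft.fftfreq(N, d=1.0 / N)
    m, ell = np.meshgrid(k, k, indexing="ij")
    return m, ell, m**2 + omega2 * ell**2


def sobolev_norm(psi_hat, lam, s):
    # assumed anisotropic H^s norm with weight (1 + lambda_k)^s
    return np.sqrt(np.sum((1.0 + lam) ** s * np.abs(psi_hat) ** 2))


def random_phase_data(N, omega2, s=2.0, R=1.8263, seed=0):
    m, ell, lam = grid_wavenumbers(N, omega2)
    support = (np.abs(m) <= 2) & (np.abs(ell) <= 2) & ((m != 0) | (ell != 0))  # K_0
    phases = np.random.default_rng(seed).uniform(0.0, 2.0 * np.pi, size=(N, N))
    psi_hat = np.where(support, np.exp(1j * phases), 0.0)
    psi_hat *= R / sobolev_norm(psi_hat, lam, s)        # this factor plays the role of C
    return psi_hat, lam


def nonlinear_rhs(psi_hat, mask):
    # -i * (dealiased Fourier coefficients of |psi|^2 psi), computed pseudospectrally
    N = psi_hat.shape[0]
    psi = np.fft.ifft2(psi_hat) * N**2                   # coefficients -> grid values
    return -1j * mask * np.fft.fft2(np.abs(psi) ** 2 * psi) / N**2


def if_rk4_step(psi_hat, lam, dt, mask):
    """One integrating-factor RK4 step for d/dt psi_hat = -i lam psi_hat - i N(psi_hat)."""
    E = np.exp(-1j * lam * dt / 2)                       # exact linear flow over dt/2
    E2 = E * E
    k1 = nonlinear_rhs(psi_hat, mask)
    k2 = nonlinear_rhs(E * (psi_hat + 0.5 * dt * k1), mask)
    k3 = nonlinear_rhs(E * psi_hat + 0.5 * dt * k2, mask)
    k4 = nonlinear_rhs(E2 * psi_hat + dt * E * k3, mask)
    return E2 * psi_hat + dt / 6.0 * (E2 * k1 + 2.0 * E * (k2 + k3) + k4)


N, omega2, dt = 64, np.sqrt(2.0), 2e-3
psi_hat, lam = random_phase_data(N, omega2)
m, ell, _ = grid_wavenumbers(N, omega2)
mask = (np.abs(m) < N / 3) & (np.abs(ell) < N / 3)      # two-thirds dealiasing mask
mass0 = np.sum(np.abs(psi_hat) ** 2)
for _ in range(100):
    psi_hat = if_rk4_step(psi_hat, lam, dt, mask)
print(sobolev_norm(psi_hat, lam, 2.0), np.sum(np.abs(psi_hat) ** 2) / mass0)
```

The integrating factor $e^{-i\lambda_k t}$ handles the stiff linear part exactly, so the RK4 stages only need to resolve the time scale of the nonlinearity. The printed mass ratio should remain close to one, since the $L^2$ norm is conserved by the flow.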
A two-thirds dealiasing rule [Orszag1971Dealiasing](https://arxiv.org/html/2606.27459#bib.bib26) is applied to the nonlinear term. Unless stated otherwise, the reference solutions are generated on a $256\times 256$ spatial grid with time step $\Delta t_{\mathrm{ref}}=2\times 10^{-3}$. These solutions are then downsampled to a $64\times 64$ learning grid, on which all training, validation, testing, and error computations are performed. The solution is stored at time intervals $\Delta t_{\mathrm{data}}=T_{f}/40$ with $T_{f}=2\pi$, and the final time is $T_{\max}=10T_{f}$. Hence, each trajectory contains $400$ one-step intervals.

The training data are formed from consecutive pairs of snapshots. For each time level $n$, the input consists of the real part, the imaginary part, and a constant channel containing the value of $\omega^{2}$,

$$a(x,y)=\bigl(\operatorname{Re}\psi^{n}(x,y),\operatorname{Im}\psi^{n}(x,y),\omega^{2}\bigr),$$

and the target is

$$u(x,y)=\bigl(\operatorname{Re}\psi^{n+1}(x,y),\operatorname{Im}\psi^{n+1}(x,y)\bigr).$$

Therefore, the learned map is a geometry-conditioned one-step solution operator $\mathcal{J}_{\theta}:\mathcal{A}\to\mathcal{U}$. Long-time predictions are then obtained by autoregressively feeding the predicted state back into the model.

#### 4.1.1. Dataset split

The data are split according to the random initial conditions determined by the sampled phases $\{\phi_{m,\ell}\}$. Thus, all one-step pairs from the trajectory generated by a given set of phases are assigned to exactly one of the training, validation, or testing sets.
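The next sketch assembles one-step pairs with the constant $\omega^2$ channel from a single stored trajectory. It then splits the data at the level of whole trajectories, so that no phase draw contributes pairs to more than one subset. The array layouts and function names are assumptions made for illustration. A trajectory with $400$ one-step intervals has $401$ stored snapshots.

```python
import numpy as np


def one_step_pairs(traj, omega2):
    """traj: complex array (n_snapshots, H, W) for one phase draw.

    Returns inputs a of shape (n_snapshots - 1, H, W, 3) with channels
    (Re psi^n, Im psi^n, omega^2) and targets u of shape (n_snapshots - 1, H, W, 2)
    with channels (Re psi^{n+1}, Im psi^{n+1}).
    """
    cur, nxt = traj[:-1], traj[1:]
    geom = np.full(cur.shape, omega2)                  # constant geometry channel
    a = np.stack([cur.real, cur.imag, geom], axis=-1)
    u = np.stack([nxt.real, nxt.imag], axis=-1)
    return a, u


def split_by_trajectory(n_traj, n_train, n_val, seed=0):
    """Shuffle trajectory indices and assign each whole trajectory to one subset."""
    perm = np.random.default_rng(seed).permutation(n_traj)
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]


train_ids, val_ids, test_ids = split_by_trajectory(1200, 800, 200)
traj = np.random.default_rng(0).normal(size=(401, 64, 64)) * (1 + 1j)  # dummy data
a, u = one_step_pairs(traj, np.sqrt(2.0))
print(a.shape, u.shape, len(train_ids), len(val_ids), len(test_ids))
# (400, 64, 64, 3) (400, 64, 64, 2) 800 200 200
```

Splitting by trajectory index, rather than by individual pairs, prevents nearly identical consecutive snapshots of one trajectory from appearing in both the training and test sets.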
In our experiment, we use $1200$ such trajectories for each geometry ($\omega^{2}\in\{1,\sqrt{2}\}$), partitioned into $800$ for training, $200$ for validation, and $200$ for testing. The model is then trained on the resulting one-step pairs from both geometries. The real and imaginary components are standardized using statistics computed only from the training set. The $\omega^{2}$ channel is kept unnormalized so that the model receives the physical value of the torus parameter directly.

#### 4.1.2. Training and Evaluation

Our architecture uses $4$ Fourier layers, width $64$, and $12$ retained Fourier modes in each spatial direction. The lifting map $P$ sends $a(x)$ to a $64$-channel latent representation. Each Fourier layer consists of a truncated spectral convolution and a pointwise $1\times 1$ convolution in physical space, followed by a GELU activation. The projection map $Q$ sends the final latent representation through a hidden layer of $128$ channels to the two-component output $u(x)$. Training uses Adam with initial learning rate $10^{-3}$, mini-batch size $16$, and the mean squared error loss on the normalized real and imaginary components. We train for $100$ epochs and halve the learning rate every $25$ epochs. The checkpoint with the smallest validation loss is used for testing. Test performance is reported from full autoregressive rollouts on unseen realizations. We use four measures: the relative $L^{2}$ error for $\psi$, comparisons of $|\psi|$, solution slices, and the Sobolev norm growth $\|\psi(t)\|_{H^{2}}$. All experiments are run on a single NVIDIA A40 GPU with $48$ GB of memory, using $8$ CPU cores and $128$ GB of RAM.

### 4.2. Numerical Results

In this subsection, we present the numerical results for the learned one-step solution operator on the rational and irrational tori. These include training and validation diagnostics, relative error comparisons, solution-level visualizations, and Sobolev-norm plots.
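The normalization and autoregressive rollout conventions described in Sections 4.1.1 and 4.1.2 can be sketched as follows. This is a minimal illustration under several assumptions:

- The trained network is represented by a hypothetical callable `step_fn` that acts on normalized inputs and returns normalized outputs.
- Inputs and targets share one mean and one standard deviation per real and imaginary channel.
- The step schedule is written out explicitly; it matches what PyTorch's `StepLR` with `step_size=25` and `gamma=0.5` would produce.

```python
import numpy as np


def fit_stats(train_a):
    """Per-channel mean/std of the Re and Im channels over the training inputs only."""
    re_im = train_a[..., :2]
    return re_im.mean(axis=(0, 1, 2)), re_im.std(axis=(0, 1, 2))


def rollout(step_fn, psi0, omega2, n_steps, mu, sd):
    """Autoregressive rollout of a one-step model.

    psi0: complex array (B, H, W). step_fn (hypothetical) maps normalized inputs
    (B, H, W, 3) to normalized outputs (B, H, W, 2). The omega^2 channel is
    appended unnormalized at every step.
    """
    psi, states = psi0, [psi0]
    geom = np.full(psi0.shape + (1,), omega2)
    for _ in range(n_steps):
        re_im = (np.stack([psi.real, psi.imag], axis=-1) - mu) / sd
        out = step_fn(np.concatenate([re_im, geom], axis=-1)) * sd + mu
        psi = out[..., 0] + 1j * out[..., 1]           # prediction becomes next input
        states.append(psi)
    return np.stack(states, axis=1)                      # (B, n_steps + 1, H, W)


def learning_rate(epoch, lr0=1e-3, gamma=0.5, step=25):
    # halve the learning rate every 25 epochs
    return lr0 * gamma ** (epoch // step)


# Usage with a trivial stand-in model that returns its normalized Re/Im input unchanged.
psi0 = np.exp(1j * np.linspace(0, 1, 8 * 8)).reshape(1, 8, 8)
train_a = np.random.default_rng(0).normal(size=(10, 8, 8, 3))
mu, sd = fit_stats(train_a)
traj = rollout(lambda a: a[..., :2], psi0, np.sqrt(2.0), n_steps=5, mu=mu, sd=sd)
print(traj.shape, [learning_rate(e) for e in (0, 25, 50, 99)])
```

Because each prediction is fed back as the next input, one-step errors can accumulate over the $400$ steps of a rollout. For this reason, test errors are reported on full rollouts rather than on single steps.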
Figure 4.2. Network testing. (a) Relative $L^{2}$ error on the rational torus for each test sample; (b) relative $L^{2}$ error on the irrational torus for each test sample; (c) polar representation of the test-set relative $L^{2}$ errors.

Figure 4.3. Snapshots of $|\psi|$ on the rational and irrational tori. In both (a) and (b), the top row shows the ground truth solution and the bottom row shows the prediction at $t/T_{f}=1,4,7,10$.

In Figure [4.1](https://arxiv.org/html/2606.27459#S4.F1), the training and validation errors decrease steadily over the epochs, with the validation curve closely following the training curve. This suggests that the model learns the one-step solution operator without a clear sign of overfitting. The sample-wise relative $L^{2}$ errors in Figures [4.2](https://arxiv.org/html/2606.27459#S4.F2)(a) and [4.2](https://arxiv.org/html/2606.27459#S4.F2)(b) remain on the order of $10^{-2}$ across the test set for both the rational and irrational tori. This shows that the learned operator gives consistent accuracy over unseen initial conditions. Figure [4.2](https://arxiv.org/html/2606.27459#S4.F2)(c) gives a compact polar representation of the test-set errors. For each test sample $j=1,\ldots,200$, we plot $z_{j}=r_{j}e^{i\vartheta_{j}}$, where the radius $r_{j}$ is the relative $L^{2}$ error and the angle $\vartheta_{j}=2\pi(j-1)/200$ indexes the test samples uniformly around the circle. Thus, points closer to the center correspond to smaller errors, and points farther from the center correspond to larger errors. The rational case shows a wider radial spread, whereas the irrational case is more concentrated toward smaller errors.
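The sample-wise errors and the polar layout of Figure 4.2(c) can be computed as in the following sketch. For illustration, we assume that predictions and references are stored as complex arrays of shape (samples, time levels, $H$, $W$). We also assume that each sample's error aggregates all predicted time levels and grid points, $r_j=\bigl(\sum_{n,x}|\psi_\theta-\psi|^2/\sum_{n,x}|\psi|^2\bigr)^{1/2}$, with the shared initial state excluded. The function names are hypothetical, and the data below are synthetic.

```python
import numpy as np


def relative_l2(pred, true):
    """One relative L^2 error per sample, aggregated over time levels and grid points.

    pred, true: complex arrays (n_samples, n_times, H, W) holding the predicted
    levels n = 1..M (the common initial state is excluded).
    """
    num = np.sum(np.abs(pred - true) ** 2, axis=(1, 2, 3))
    den = np.sum(np.abs(true) ** 2, axis=(1, 2, 3))
    return np.sqrt(num / den)


def polar_points(errors):
    """z_j = r_j exp(i theta_j) with theta_j = 2 pi (j - 1) / n, j = 1..n."""
    n = len(errors)
    return errors * np.exp(2j * np.pi * np.arange(n) / n)


rng = np.random.default_rng(0)
true = rng.normal(size=(200, 10, 16, 16)) + 1j * rng.normal(size=(200, 10, 16, 16))
scale = 0.02 * rng.uniform(size=200)                    # synthetic per-sample error level
pred = true * (1.0 + scale[:, None, None, None])
err = relative_l2(pred, true)
z = polar_points(err)
print(err.max(), err.mean(), err.min())                  # worst, mean, best
```

In this construction each prediction is a rescaled copy of the reference, so the computed error of sample $j$ equals its scale factor exactly. This makes the sketch easy to check. The Worst, Mean, and Best summaries used in the ablation study correspond to `err.max()`, `err.mean()`, and `err.min()`.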
Figure 4.4. Comparison of the ground truth and predicted $|\psi|$ at $t/T_{f}=2$ on (a) the rational torus and (b) the irrational torus.

Figure [4.3](https://arxiv.org/html/2606.27459#S4.F3) shows that the predicted solution agrees closely with the ground truth on both the rational and irrational tori. The learned solution operator captures the main spatial patterns and their evolution without noticeable distortion over the plotted time window. In Figure [4.4](https://arxiv.org/html/2606.27459#S4.F4), we plot one-dimensional slice profiles of $|\psi|$ at $t/T_{f}=2$ against the grid index on the $64\times 64$ data grid. The predicted and ground truth curves are nearly indistinguishable for both geometries, which further confirms the agreement.

Figure 4.5. Growth of $\|\psi\|_{H^{2}}$ on the rational and irrational tori.

In Figure [4.5](https://arxiv.org/html/2606.27459#S4.F5), we plot the Sobolev norm growth computed from the learned operator for three test realizations from each torus geometry. The rational case $(\omega^{2}=1)$, shown with dashed curves, exhibits stronger growth of $\|\psi(t)\|_{H^{2}}$ over the time interval. The irrational case $(\omega^{2}=\sqrt{2})$, shown with solid curves, shows more constrained growth and remains lower overall. This behavior is consistent with the theoretical discussion in Section [2](https://arxiv.org/html/2606.27459#S2), where the irrational geometry restricts the resonance structure and leads to weaker transfer of energy to higher Fourier modes. In particular, the numerical result is qualitatively consistent with Theorem [2.6](https://arxiv.org/html/2606.27459#S2.Thmtheorem6) and Corollary [2.7](https://arxiv.org/html/2606.27459#S2.Thmtheorem7), which indicate that the growth of Sobolev norms is more constrained on sufficiently irrational tori. Because these are large-time asymptotic bounds, the finite-time comparison is qualitative.
Thus, the model captures the geometry-dependent spectral behavior predicted by the analysis.

### 4.3. Ablation Study of Critical Factors in the FNO Scheme

To assess the sensitivity of our operator to key architectural choices, we perform ablation studies on the number of retained Fourier modes, the nonlinear activation function, the Fourier-layer depth, and the use of explicit geometry conditioning. For a test sample indexed by $i$, let $\psi_i^n$ denote the ground-truth solution at time level $n$, and let $\psi_{i,\theta}^n$ denote the corresponding prediction. The relative $L^2$ error is defined by
$$
\mathcal{E}_i = \left( \frac{\displaystyle\sum_{n=1}^{M} \sum_{j,k} \left| \psi_{i,\theta}^{n}(x_j, y_k) - \psi_i^{n}(x_j, y_k) \right|^2}{\displaystyle\sum_{n=1}^{M} \sum_{j,k} \left| \psi_i^{n}(x_j, y_k) \right|^2} \right)^{1/2},
$$
where $M$ is the number of time levels in the prediction interval and $(x_j, y_k)$ ranges over the spatial grid. For each ablation setting, we compute $\mathcal{E}_i$ over all test realizations and report
$$
\mathrm{Worst} = \max_{1 \le i \le N} \mathcal{E}_i, \qquad \mathrm{Mean} = \frac{1}{N} \sum_{i=1}^{N} \mathcal{E}_i, \qquad \mathrm{Best} = \min_{1 \le i \le N} \mathcal{E}_i,
$$
where $N$ is the total number of test samples. These values summarize the least accurate, average, and most accurate test predictions over the full prediction interval for each model configuration.

#### 4.3.1. The influence of the number of Fourier modes

In this section, we investigate how the number of retained Fourier modes, $K$, affects the accuracy of the learned model. In the FNO architecture, the spectral convolution acts only on a truncated set of low Fourier modes, while the remaining high-frequency modes are not directly parameterized in Fourier space.
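The truncation can be illustrated with a minimal NumPy sketch of one spectral convolution that keeps $K$ modes per direction. The weight layout (one block for non-negative $x$-frequencies, one for negative ones, with `rfft2` storing only non-negative $y$-frequencies) is a common choice in public FNO implementations and is assumed here, not taken from the paper; `spectral_conv2d`, `spectral_param_count`, and the width used below are hypothetical. With identity weights the layer reduces to a low-pass projection, which makes the effect of $K$ easy to see.

```python
import numpy as np


def spectral_conv2d(v, w_pos, w_neg, K):
    """FNO-style spectral convolution restricted to the lowest K modes.

    v:      real array, shape (C_in, Nx, Ny)
    w_pos:  complex weights, shape (C_in, C_out, K, K), for x-modes 0..K-1
    w_neg:  complex weights, shape (C_in, C_out, K, K), for x-modes -K..-1
    y-modes 0..K-1 are kept (rfft2 stores only non-negative y-modes).
    Every other Fourier mode of the output is zero.
    """
    _, nx, ny = v.shape
    c_out = w_pos.shape[1]
    v_hat = np.fft.rfft2(v)  # shape (C_in, Nx, Ny // 2 + 1)
    out_hat = np.zeros((c_out, nx, ny // 2 + 1), dtype=complex)
    # Channel mixing mode by mode: out[o, m, l] = sum_i v_hat[i, m, l] * w[i, o, m, l]
    out_hat[:, :K, :K] = np.einsum("iml,ioml->oml", v_hat[:, :K, :K], w_pos)
    out_hat[:, -K:, :K] = np.einsum("iml,ioml->oml", v_hat[:, -K:, :K], w_neg)
    return np.fft.irfft2(out_hat, s=(nx, ny))


def spectral_param_count(K, width):
    """Real trainable parameters in the two complex weight tensors of one layer."""
    return 2 * 2 * width * width * K * K  # two mode blocks x (real, imaginary)


# Usage: with identity weights the layer projects onto the retained low modes.
n, K = 16, 4
x = 2.0 * np.pi * np.arange(n) / n
X, Y = np.meshgrid(x, x, indexing="ij")
eye = np.ones((1, 1, K, K), dtype=complex)
low = np.cos(X) + np.sin(2 * Y)   # only modes inside the retained block
high = np.cos(6 * X)              # x-mode 6 lies outside the retained block
out_low = spectral_conv2d(low[None], eye, eye, K)[0]
out_high = spectral_conv2d(high[None], eye, eye, K)[0]
```

In this layout the spectral parameter count grows like $K^2$ per layer (for a hypothetical width of 32, $K=12$ gives 589,824 real parameters and $K=16$ gives 16/9 times as many), which is one way to read the trade-off discussed for Table 4.1.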
The value of $K$ therefore controls how much spectral information is retained in each Fourier layer. In this experiment, we keep the network width, number of Fourier layers, activation function, training data, and optimizer fixed, and vary only the number of retained Fourier modes.

Table 4.1. The relative $L^2$ error for different numbers of retained Fourier modes $K$.

Table [4.1](https://arxiv.org/html/2606.27459#S4.T1) shows that increasing the number of retained Fourier modes does not lead to a monotone improvement in accuracy. The best mean error is obtained with $K = 12$ for both geometries, with mean errors of $0.0366$ for the irrational torus and $0.0625$ for the rational torus. The case $K = 8$ is competitive for the irrational geometry but gives a noticeably larger error for the rational case. Increasing to $K = 16$ worsens the performance for both geometries, while $K = 20$ improves over $K = 16$ but still does not outperform $K = 12$. This suggests that retaining too few modes may limit the model's ability to represent the solution dynamics, while retaining too many modes increases the number of trainable spectral parameters and can make optimization less effective for the fixed training setup. Altogether, $K = 12$ gives the best balance between spectral resolution and stable learning in this experiment.

#### 4.3.2. The influence of nonlinear activation functions

We examine the effect of the nonlinear activation function, with the number of retained Fourier modes fixed at $K = 12$. The activation function determines how the network combines local and spectral features across layers, and it allows the model to represent nonlinear solution behavior beyond a purely linear transformation of the input modes. Its derivative also influences gradient propagation during training, which can affect both convergence and long-time stability.
For this reason, the choice of activation function can have a noticeable effect on the accuracy of the learned solution operator. Here, we compare the Sigmoid, GELU, Tanh, Swish, and ReLU activations using the same training and testing setup.

Table 4.2. The relative $L^2$ error for different nonlinear activation functions.

From Table [4.2](https://arxiv.org/html/2606.27459#S4.T2), we observe that Sigmoid gives the smallest mean error for both geometries, with mean errors of $0.0225$ for $\omega^2 = \sqrt{2}$ and $0.0294$ for $\omega^2 = 1$. GELU gives the next best performance, with mean errors of $0.0366$ and $0.0625$ for the irrational and rational geometries, respectively. Tanh performs slightly worse than GELU but remains more accurate than Swish and ReLU. ReLU is notably less accurate on the rational torus, where the mean error increases to $0.2153$. Taken together, Sigmoid is the most accurate activation choice in this experiment.

#### 4.3.3. The influence of Fourier-layer depth

The Fourier layers are the main spectral components of the FNO architecture. Each layer maps its input representation into Fourier space, applies trainable weights to a fixed number of retained modes, and returns the result to physical space. The depth therefore determines how many times the network processes and recombines spectral information before producing the next predicted state. To examine how this affects model performance, we vary the number of Fourier layers while keeping the Fourier modes, activation function, training data, and optimization settings fixed. The depths considered are $2$, $4$, $6$, $8$, and $10$.

Table 4.3. The relative $L^2$ error for different Fourier-layer depths.

The prediction accuracy for different Fourier-layer depths is shown in Table [4.3](https://arxiv.org/html/2606.27459#S4.T3).
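To show where depth and activation enter the architecture, the following sketch stacks a configurable number of Fourier layers, each computing $v \mapsto \sigma(Wv + \mathcal{K}v)$ with $\mathcal{K}$ a spectral convolution on the lowest $K$ modes. It is a hedged, untrained NumPy illustration, not the authors' network: all names, the random initialization, and the tanh approximation of GELU are assumptions. For simplicity it applies the activation after every Fourier layer and uses a single linear projection; published FNO implementations differ in such details.

```python
import numpy as np

# Activations compared in Section 4.3.2. GELU uses the common tanh approximation
# so that the sketch needs only NumPy.
ACTIVATIONS = {
    "sigmoid": lambda z: 1.0 / (1.0 + np.exp(-z)),
    "gelu": lambda z: 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z**3))),
    "tanh": np.tanh,
    "swish": lambda z: z / (1.0 + np.exp(-z)),
    "relu": lambda z: np.maximum(z, 0.0),
}


def init_fno(depth, width, K, c_in=3, c_out=2, seed=0):
    """Random, untrained parameters for a sketch FNO with `depth` Fourier layers."""
    rng = np.random.default_rng(seed)

    def spectral_weights():
        shape = (width, width, K, K)
        return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / width**2

    layers = [{"W": rng.standard_normal((width, width)) / np.sqrt(width),
               "R_pos": spectral_weights(), "R_neg": spectral_weights()}
              for _ in range(depth)]
    return {"P": rng.standard_normal((width, c_in)) / np.sqrt(c_in),   # lifting
            "Q": rng.standard_normal((c_out, width)) / np.sqrt(width),  # projection
            "layers": layers, "K": K}


def fourier_layer(v, layer, K, act):
    """v -> act(W v + spectral convolution of v on the lowest K modes)."""
    _, nx, ny = v.shape
    v_hat = np.fft.rfft2(v)
    out_hat = np.zeros_like(v_hat)
    out_hat[:, :K, :K] = np.einsum("iml,ioml->oml", v_hat[:, :K, :K], layer["R_pos"])
    out_hat[:, -K:, :K] = np.einsum("iml,ioml->oml", v_hat[:, -K:, :K], layer["R_neg"])
    spectral = np.fft.irfft2(out_hat, s=(nx, ny))
    local = np.einsum("oi,ixy->oxy", layer["W"], v)  # pointwise channel mixing
    return act(spectral + local)


def fno_forward(x, params, activation="gelu"):
    """Lift (C_in -> width), apply the Fourier layers, project (width -> C_out)."""
    act = ACTIVATIONS[activation]
    v = np.einsum("oi,ixy->oxy", params["P"], x)
    for layer in params["layers"]:
        v = fourier_layer(v, layer, params["K"], act)
    return np.einsum("oi,ixy->oxy", params["Q"], v)


# Usage: only the depth changes between the two configurations.
rng = np.random.default_rng(1)
x_in = np.concatenate([rng.standard_normal((2, 16, 16)),
                       np.full((1, 16, 16), np.sqrt(2.0))])  # Re, Im, omega^2
shallow = fno_forward(x_in, init_fno(depth=2, width=8, K=4), activation="sigmoid")
deep = fno_forward(x_in, init_fno(depth=6, width=8, K=4), activation="sigmoid")
```

Each extra layer adds another spectral weight pair and another activation to the composition, so deeper stacks are more expressive but also longer chains for gradients to pass through.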
The results in Table [4.3](https://arxiv.org/html/2606.27459#S4.T3) show that Fourier-layer depth has a clear effect on performance, but the trend is not monotone. The two-layer model gives the smallest mean error for both geometries, with mean errors of $0.0118$ for $\omega^2 = \sqrt{2}$ and $0.0372$ for $\omega^2 = 1$. The four-layer model gives the next best performance, with mean errors of $0.0366$ and $0.0625$ for the irrational and rational geometries, respectively. Increasing the depth further does not improve the accuracy. In particular, the six-layer model gives the largest mean error for both geometries, while the eight- and ten-layer models improve on the six-layer case but still do not outperform the two- or four-layer architectures. Thus, although additional Fourier layers increase the spectral processing capacity of the network, deeper architectures do not necessarily yield better long-time predictions under the present training setup. These results indicate that Fourier-layer depth must be selected carefully, since excessive depth may increase optimization difficulty and error accumulation.

#### 4.3.4. The influence of geometry conditioning

In this section, we compare the geometry-conditioned model
$$
\mathcal{J}_\theta : \left( \operatorname{Re}\psi^n(x,y),\ \operatorname{Im}\psi^n(x,y),\ \omega^2 \right) \longmapsto \left( \operatorname{Re}\psi^{n+1}(x,y),\ \operatorname{Im}\psi^{n+1}(x,y) \right)
$$
with an unconditioned model
$$
\mathcal{J}_\theta : \left( \operatorname{Re}\psi^n(x,y),\ \operatorname{Im}\psi^n(x,y) \right) \longmapsto \left( \operatorname{Re}\psi^{n+1}(x,y),\ \operatorname{Im}\psi^{n+1}(x,y) \right).
$$
Table 4.4. The relative $L^2$ error for the geometry-conditioned and unconditioned models.

We investigate whether the learned solution operator can infer the underlying geometry from solution snapshots alone, or whether providing the torus parameter $\omega^2$ improves the accuracy of the model. In the conditioned case, the input includes an additional constant channel containing the value of $\omega^2$, so that a single network can explicitly distinguish between the rational and irrational geometries. In the unconditioned case, this channel is removed, and the network receives only the real and imaginary parts of the solution.

Figure 4.6. Training and validation relative errors for the geometry-conditioned and unconditioned models with $K = 12$, GELU activation, and four Fourier layers: (a) training relative error; (b) validation relative error.

Figures [4.6(a)](https://arxiv.org/html/2606.27459#S4.F6.sf1) and [4.6(b)](https://arxiv.org/html/2606.27459#S4.F6.sf2) show that both the conditioned and unconditioned models train stably. The training curves are nearly indistinguishable, whereas the validation curves are noisier but decay to similar magnitudes by the end of training. The test errors in Table [4.4](https://arxiv.org/html/2606.27459#S4.T4), however, show a clearer benefit from geometry conditioning. For the irrational torus, the conditioned model reduces the mean error from $0.0467$ to $0.0366$. For the rational torus, the improvement is larger, with the mean error decreasing from $0.1034$ to $0.0625$. The worst and best errors also decrease under conditioning for both geometries. Thus, adding the $\omega^2$ channel helps the model distinguish the two geometric formulations and yields a more accurate learned solution operator. The improvement is most pronounced on the rational torus, where the unconditioned model has a noticeably larger error.
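The difference between the two inputs, and the way a one-step map is iterated over a prediction interval, can be sketched as follows. This is a minimal illustration under stated assumptions: `make_input`, `rollout`, and `linear_step` are hypothetical names, evaluating the one-step operator by autoregressive rollout is assumed, and `linear_step` is not a trained network but a stand-in that applies the exact linear propagator of $i\partial_t\psi + (\partial_x^2 + \omega^2\partial_y^2)\psi = 0$ (symbol $m^2 + \omega^2\ell^2$), reading $\omega^2$ from the constant third channel.

```python
import numpy as np


def make_input(psi, omega2=None):
    """Build the network input from a complex snapshot psi of shape (Nx, Ny).

    Conditioned model:   channels (Re psi, Im psi, omega2) -> shape (3, Nx, Ny)
    Unconditioned model: channels (Re psi, Im psi)         -> shape (2, Nx, Ny)
    """
    channels = [psi.real, psi.imag]
    if omega2 is not None:
        channels.append(np.full(psi.shape, omega2, dtype=float))  # constant channel
    return np.stack(channels, axis=0)


def rollout(step, psi0, n_steps, omega2=None):
    """Apply a one-step map repeatedly; returns psi^0, ..., psi^{n_steps}.

    `step` takes a (C, Nx, Ny) input and returns (Re psi^{n+1}, Im psi^{n+1})
    as a (2, Nx, Ny) array.
    """
    psi, traj = psi0, [psi0]
    for _ in range(n_steps):
        out = step(make_input(psi, omega2))
        psi = out[0] + 1j * out[1]
        traj.append(psi)
    return np.stack(traj)


def linear_step(x, dt=0.01):
    """Stand-in for a trained conditioned network (NOT the learned operator):
    one exact step of i psi_t + (d_xx + omega2 d_yy) psi = 0, reading omega2
    from the constant third channel."""
    psi = x[0] + 1j * x[1]
    omega2 = x[2, 0, 0]
    nx, ny = psi.shape
    m = np.fft.fftfreq(nx, d=1.0 / nx)
    ell = np.fft.fftfreq(ny, d=1.0 / ny)
    lam = m[:, None] ** 2 + omega2 * ell[None, :] ** 2
    nxt = np.fft.ifft2(np.exp(-1j * lam * dt) * np.fft.fft2(psi))
    return np.stack([nxt.real, nxt.imag])


# Usage: a short trajectory on the irrational torus from a random snapshot.
rng = np.random.default_rng(0)
psi0 = rng.standard_normal((16, 16)) + 1j * rng.standard_normal((16, 16))
traj = rollout(linear_step, psi0, n_steps=5, omega2=np.sqrt(2.0))
```

The stand-in makes the role of the extra channel explicit: without it, a single map would have no direct way to tell which symbol $m^2 + \omega^2\ell^2$ to use, which is the information the unconditioned model must infer from the snapshots.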
## 5. Conclusion

In this work, we investigated the use of Fourier neural operators for learning the geometry-dependent dynamics of the two-dimensional cubic defocusing nonlinear Schrödinger equation on rational and irrational tori. The main motivation was that the torus aspect-ratio parameter $\omega^2$ changes the dispersion relation
$$
\lambda_{m,\ell} = m^2 + \omega^2 \ell^2,
$$
and therefore changes the resonance and quasi-resonance structure of the Fourier lattice. As a result, the rational and irrational geometries can exhibit different rates of energy transfer to high Fourier modes. Our numerical experiments show that the learned operator captures the solution dynamics on unseen random-phase initial data and reproduces the different Sobolev-norm behavior observed on the two geometries. The ablation studies further clarify how architectural choices affect the accuracy of the learned dynamics.

The scope of the present study is shaped by the fact that the model is trained entirely on data generated by a numerical solver, so the quality of the learned operator depends on the accuracy, resolution, and time horizon of the numerical data. In addition, the experiments are performed on a fixed learning grid and over a finite prediction interval, so resolution transfer and substantially longer-time generalization are left for future work. The computational experiments were also limited by the available GPU resources and wall-time limits on the computing cluster, which affected the number of training runs, ablation configurations, grid resolutions, and long-time prediction experiments that could be included.

The results also leave open the development of resonance-aware neural operators. In the present FNO architecture, the retained Fourier modes are learned through generic spectral weights.
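The effect of the dispersion relation on the resonance structure can be checked directly by counting exactly resonant four-wave interactions, i.e. quartets with $k_1 - k_2 + k_3 - k_4 = 0$ and $\lambda(k_1) - \lambda(k_2) + \lambda(k_3) - \lambda(k_4) = 0$. For $\omega^2 = \sqrt{2}$, the frequency mismatch has the form $a + \sqrt{2}\,b$ with integers $a, b$, which vanishes only if $a = b = 0$, so the $m$- and $\ell$-parts must cancel separately. A quartet such as $(1,1), (0,0), (1,-1), (2,0)$ is therefore resonant on the square torus but not on the irrational one. The sketch below is illustrative only; the function name, the box size, and the floating-point tolerance are assumptions, and raising `tol` would count quasi-resonances instead.

```python
import itertools

import numpy as np


def count_resonant_quartets(omega2, N, tol=1e-9):
    """Count ordered quartets (k1, k2, k3, k4) of modes in the box [-N, N]^2 with
    k1 - k2 + k3 - k4 = 0 and
    lambda(k1) - lambda(k2) + lambda(k3) - lambda(k4) = 0,
    where lambda(m, l) = m**2 + omega2 * l**2."""
    r = np.arange(-N, N + 1)
    M, L = np.meshgrid(r, r, indexing="ij")
    modes = np.stack([M.ravel(), L.ravel()], axis=1)   # all (m, l) in the box
    lam = modes[:, 0] ** 2 + omega2 * modes[:, 1] ** 2
    count = 0
    for i1, i2 in itertools.product(range(len(modes)), repeat=2):
        k4 = modes[i1] - modes[i2] + modes              # one k4 per choice of k3
        inside = np.all(np.abs(k4) <= N, axis=1)
        lam4 = k4[:, 0] ** 2 + omega2 * k4[:, 1] ** 2
        mismatch = lam[i1] - lam[i2] + lam - lam4
        count += int(np.sum(inside & (np.abs(mismatch) < tol)))
    return count


# Usage: the same box of modes on the rational and irrational tori of the paper.
rational = count_resonant_quartets(omega2=1.0, N=2)
irrational = count_resonant_quartets(omega2=np.sqrt(2.0), N=2)
```

Every quartet that is resonant for $\omega^2 = \sqrt{2}$ is also resonant for $\omega^2 = 1$, so the irrational count is always the smaller one; the gap consists of the "tilted" interactions that the rational lattice allows.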
A resonance-aware architecture would instead use the dispersion relation of the underlying PDE to identify resonant or nearly resonant Fourier-mode interactions and incorporate this information into the model architecture. By aligning the architecture more closely with the mechanism that drives spectral energy transfer, such models may improve predictive accuracy and long-time stability. The present study also leaves the more realistic waveguide setting $\mathbb{R} \times \mathbb{T}^2$ for future investigation. The cubic NLS on $\mathbb{R} \times \mathbb{T}^d$ has been investigated mathematically through modified scattering [WilsonYu2022](https://arxiv.org/html/2606.27459#bib.bib35); [HaniPausaderTzvetkovVisciglia2015](https://arxiv.org/html/2606.27459#bib.bib10), resonant dynamics, and Sobolev-norm growth [planchon2017growth](https://arxiv.org/html/2606.27459#bib.bib27), but the corresponding geometry-aware neural-operator problem appears to remain largely unexplored.

## References

- [1] W. Bao. The nonlinear Schrödinger equation and applications in Bose-Einstein condensation and plasma physics. In Dynamics in models of coarsening, coagulation, condensation and quantization, pages 141–239. World Scientific, 2007.
- [2] W. Bao, S. Jin, and P. A. Markowich. Numerical Study of Time-Splitting Spectral Discretizations of Nonlinear Schrödinger Equations in the Semiclassical Regimes. SIAM Journal on Scientific Computing, 25(1):27–64, 2003.
- [3] J. Bourgain. On the growth in time of higher Sobolev norms of smooth solutions of Hamiltonian PDE. International Mathematics Research Notices, (6):277–304, 1996.
- [4] J. Colliander, M. Keel, G. Staffilani, H. Takaoka, and T. Tao. Transfer of energy to high frequencies in the cubic defocusing nonlinear Schrödinger equation. Inventiones Mathematicae, 181(1):39–113, 2010.
- [5] M. Cranmer, S. Greydanus, S. Hoyer, P. Battaglia, D. Spergel, and S. Ho. Lagrangian Neural Networks. arXiv preprint arXiv:2003.04630, 2020.
- [6] Y. Deng and P. Germain. Growth of Solutions to NLS on Irrational Tori. International Mathematics Research Notices, 2019(9):2919–2950, 2019.
- [7] F. Giuliani and M. Guardia. Sobolev norms explosion for the cubic NLS on irrational tori. Nonlinear Analysis, 220:112865, 2022.
- [8] Y. Gong, Q. Wang, Y. Wang, and J. Cai. A conservative Fourier pseudo-spectral method for the nonlinear Schrödinger equation. Journal of Computational Physics, 328:354–370, 2017.
- [9] S. Greydanus, M. Dzamba, and J. Yosinski. Hamiltonian Neural Networks. In Advances in Neural Information Processing Systems, volume 32, 2019.
- [10] Z. Hani, B. Pausader, N. Tzvetkov, and N. Visciglia. Modified scattering for the cubic Schrödinger equation on product spaces and applications. Forum of Mathematics, Pi, 3:e4, 2015.
- [11] Q. Hernández, A. Badias, D. González, F. Chinesta, and E. Cueto. Structure-preserving neural networks. Journal of Computational Physics, 426:109950, 2021.
- [12] S. Herr and B. Kwak. Global well-posedness of the cubic nonlinear Schrödinger equation on $\mathbb{T}^2$. Inventiones Mathematicae, 2026.
- [13] A. Hrabski, Y. Pan, G. Staffilani, and B. Wilson. Energy transfer for solutions to the nonlinear Schrödinger equation on irrational tori. arXiv preprint arXiv:2107.01459, 2021.
- [14] A. D. Jagtap and G. E. Karniadakis. Extended physics-informed neural networks (XPINNs): A generalized space-time domain decomposition based deep learning framework for nonlinear partial differential equations. Communications in Computational Physics, 28(5):2002–2041, 2020.
- [15] A. D. Jagtap, E. Kharazmi, and G. E. Karniadakis. Conservative physics-informed neural networks on discrete domains for conservation laws: Applications to forward and inverse problems. Computer Methods in Applied Mechanics and Engineering, 365:113028, 2020.
- [16] P. Jin, Z. Zhang, A. Zhu, Y. Tang, and G. E. Karniadakis. SympNets: Intrinsic structure-preserving symplectic networks for identifying Hamiltonian systems. Neural Networks, 132:166–179, 2020.
- [17] N. Kovachki, S. Lanthaler, and S. Mishra. On universal approximation and error bounds for Fourier neural operators. Journal of Machine Learning Research, 22(290):1–76, 2021.
- [18] N. Kovachki, Z. Li, B. Liu, K. Azizzadenesheli, K. Bhattacharya, A. M. Stuart, and A. Anandkumar. Neural Operator: Learning Maps Between Function Spaces with Applications to PDEs. Journal of Machine Learning Research, 24(89):1–97, 2023.
- [19] Z. Li, N. B. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. M. Stuart, and A. Anandkumar. Neural Operator: Graph Kernel Network for Partial Differential Equations. arXiv preprint arXiv:2003.03485, 2020.
- [20] Z. Li, N. B. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. M. Stuart, and A. Anandkumar. Fourier Neural Operator for Parametric Partial Differential Equations. In International Conference on Learning Representations, 2021.
- [21] L. Lu, P. Jin, G. Pang, Z. Zhang, and G. E. Karniadakis. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3(3):218–229, 2021.
- [22] A. C. Newell and J. V. Moloney. Nonlinear Optics. Addison-Wesley, 1992.
- [23] V. Obieke and E. Oguadimma. Structure-preserving physics-informed neural network for the Korteweg–de Vries (KdV) equation. arXiv preprint arXiv:2511.00418, 2025.
- [24] V. C. Obieke, C. Chukwuemeka, and E. E. Oguadimma. Structure-informed neural operators for long-time prediction of parametric Hamiltonian PDEs. arXiv preprint arXiv:2606.14913, 2026.
- [25] E. E. Oguadimma, M. A. F. Elbarkawy, D. O. Oranugo, H. E. Salem, M. Bayram, and O. J. Obulezi. A Foundational Review of Ordinary Differential Equation Solution Methods and Their Inherent Symmetries. Boletim da Sociedade Paranaense de Matemática, 44(8):1–27, 2026.
- [26] S. A. Orszag. On the Elimination of Aliasing in Finite-Difference Schemes by Filtering High-Wavenumber Components. Journal of the Atmospheric Sciences, 28(6):1074–1074, 1971.
- [27] F. Planchon, N. Tzvetkov, and N. Visciglia. On the Growth of Sobolev norms for NLS on 2- and 3-Dimensional Manifolds. Analysis & PDE, 10(5):1123–1147, 2017.
- [28] M. Raissi, P. Perdikaris, and G. E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.
- [29] H.-M. Ren and S.-F. Tian. The evolution process of the solutions for the coupled nonlinear Schrödinger equations via Fourier neural operator approach. Mathematics and Computers in Simulation, 239:1082–1096, 2026.
- [30] J. Sirignano and K. Spiliopoulos. DGM: A deep learning algorithm for solving partial differential equations. Journal of Computational Physics, 375:1339–1364, 2018.
- [31] G. Staffilani and B. Wilson. Stability of the cubic nonlinear Schrödinger equation on an irrational torus. SIAM Journal on Mathematical Analysis, 52(2):1318–1342, 2020.
- [32] C. Sulem and P.-L. Sulem. The Nonlinear Schrödinger Equation: Self-Focusing and Wave Collapse, volume 139 of Applied Mathematical Sciences. Springer, 1999.
- [33] J. A. C. Weideman and B. M. Herbst. Split-Step Methods for the Solution of the Nonlinear Schrödinger Equation. SIAM Journal on Numerical Analysis, 23(3):485–507, 1986.
- [34] G. Wen, Z. Li, K. Azizzadenesheli, A. Anandkumar, and S. M. Benson. U-FNO—An enhanced Fourier neural operator-based deep-learning model for multiphase flow. Advances in Water Resources, 163:104180, 2022.
- [35] B. Wilson and X. Yu. Modified Scattering of Cubic Nonlinear Schrödinger Equation on Rescaled Waveguide Manifolds. arXiv preprint arXiv:2207.07248, 2022.
- [36] W. Xiao, T. Gao, K. Liu, J. Duan, and M. Zhao. Fourier neural operator based fluid–structure interaction for predicting the vesicle dynamics. Physica D: Nonlinear Phenomena, 463:134145, 2024.
- [37] H. You, Q. Zhang, C. J. Ross, C.-H. Lee, and Y. Yu. Learning deep implicit Fourier neural operators (IFNOs) with applications to heterogeneous material modeling. Computer Methods in Applied Mechanics and Engineering, 398:115296, 2022.
- [38] B. Yu. The deep Ritz method: A deep learning-based numerical algorithm for solving variational problems. Communications in Mathematics and Statistics, 6(1):1–12, 2018.
- [39] V. E. Zakharov. Stability of Periodic Waves of Finite Amplitude on the Surface of a Deep Fluid. Journal of Applied Mechanics and Technical Physics, 9(2):190–194, 1968.
- [40] Y. Zang, G. Bao, X. Ye, and H. Zhou. Weak adversarial networks for high-dimensional partial differential equations. Journal of Computational Physics, 411:109409, 2020.
- [41] M. Zhong, Z. Yan, and S.-F. Tian. Data-driven parametric soliton-rogon state transitions for nonlinear wave equations using deep learning with Fourier neural operator. Communications in Theoretical Physics, 75:025001, 2023.
- [42] Y. D. Zhong, B. Dey, and A. Chakraborty. Symplectic ODE-Net: Learning Hamiltonian Dynamics with Control. In International Conference on Learning Representations, 2020.