Curriculum Learning of Physics-Informed Neural Networks based on Spatial Correlation
Summary
This paper proposes a spatially correlated curriculum learning framework for Physics-Informed Neural Networks (PINNs) that improves training stability and solution accuracy by leveraging spatial correlations among subregions, addressing issues like high-dimensional non-convex loss landscapes and imbalanced multi-objective constraints.
View Cached Full Text
Cached at: 05/18/26, 06:39 AM
# Curriculum Learning of Physics-Informed Neural Networks based on Spatial Correlation
Source: [https://arxiv.org/html/2605.15254](https://arxiv.org/html/2605.15254)
\\emails
chenxj20@mails\.tsinghua\.edu\.cn\(X\. Chen\)
Xinyue HuLetian ChenDaming Shi and Wenhui FanDepartment of Automation, Tsinghua University, Beijing 100084, P\.R\. China
###### Abstract
Physics\-Informed Neural Networks \(PINNs\) combine deep learning with physical constraints for solving partial differential equations \(PDEs\), and are widely applied in fluid mechanics, heat transfer, and solid mechanics\. However, PINN training still suffers from high\-dimensional non\-convex loss landscapes, imbalanced multi\-objective constraints, and ineffective information propagation\. Existing curriculum learning and causality\-guided strategies improve training stability, but mainly focus on temporal or parametric progression, lacking explicit treatment of spatial information propagation and inter\-region consistency\. Moreover, they are not directly applicable to boundary value problems \(BVPs\) with strong spatial coupling\. To address this issue, we propose a spatially correlated curriculum learning framework for PINNs\. To the best of our knowledge, this is the first work to address PINN training difficulties from the perspective of spatial coupling among subregions\. First, spatial causal weights guide information from near\-boundary regions inward, reducing optimization failures and spurious convergence\. Second, a low\-frequency information bridge enforces pseudo\-label\-based consistency across spatially separated regions, suppressing global low\-frequency drift\. Third, a region\-adaptive reweighting strategy adjusts subregion losses to reduce local residuals and recover high\-frequency details\. Experiments on PDE benchmarks show that, under comparable computational cost, the proposed method alleviates training failures and improves solution accuracy\. The code is available athttps://github\.com/pigofmomo/CurriculumLearningPINN\.
###### keywords:
physics\-informed neural networks \(PINNs\); curriculum learning; partial differential equations \(PDEs\); spatial correlation\.
\\ams
35Q68, 68T07, 65M99, 65N99
## 1Introduction
The evolution of many physical systems in nature is typically described by partial differential equations \(PDEs\)\. However, in practical engineering and scientific computations, PDEs often exhibit characteristics such as strong nonlinearity, multi\-scale phenomena, and complex geometries, making the efficient and accurate solution of these equations one of the central challenges in simulations\. Traditional numerical methods \(such as finite differences, finite elements, and finite volumes\) typically require fine grid discretization and complex preprocessing steps when solving PDEs\[[1](https://arxiv.org/html/2605.15254#bib.bib1)\]\. This becomes especially problematic when dealing with high\-dimensional problems, performing real\-time predictions, or solving inverse problems, where traditional methods often encounter severe computational and storage bottlenecks, leading to inefficiency\.
In recent years, AI\-based PDE solving research has developed rapidly\[[2](https://arxiv.org/html/2605.15254#bib.bib2)\]\. Physics\-Informed Neural Networks \(PINNs\)\[[28](https://arxiv.org/html/2605.15254#bib.bib3)\], as a PDE solving method that integrates deep learning with physical constraints, enable end\-to\-end fitting of the solution function\. This approach avoids complex procedures such as explicit grid generation, and has the potential to simultaneously address forward and inverse problems, perform parameter identification, and facilitate scientific discovery\[[37](https://arxiv.org/html/2605.15254#bib.bib4)\]\. As a result, PINNs have gained widespread attention and application in engineering fields such as fluid mechanics, heat transfer, solid mechanics, and electromagnetics\[[13](https://arxiv.org/html/2605.15254#bib.bib5)\]\.
The basic idea of PINNs is to use neural networks as a parametric representation of continuous functions\. Specifically, PINNs typically use architectures such as multi\-layer perceptrons \(MLP\) as the backbone network, mapping spatial\-temporal coordinates and physical parameters to state variables\. During the training phase, the PDE residual term is explicitly incorporated into the loss function, and the solution’s partial derivatives with respect to the inputs are computed using automatic differentiation, thus transforming the constraints imposed by the differential operators into an optimizable cost function\. Subsequently, first\- or second\-order optimizers \(such as Adam, L\-BFGS, etc\.\) are employed to perform gradient descent on the network weights, enabling the network output to simultaneously satisfy the equation constraints and initial/boundary conditions\.
Despite the simplicity of this framework, its training process often faces significant difficulties\[[20](https://arxiv.org/html/2605.15254#bib.bib7),[29](https://arxiv.org/html/2605.15254#bib.bib6),[33](https://arxiv.org/html/2605.15254#bib.bib8)\]: in the absence of sufficient supervised data, the PDE operator tends to dominate the optimization direction of the solution within the domain\. The PDE operator terms often exhibit ”stiffness” characteristics, such as strong nonlinearity, multi\-scale variations, and potential instability, leading to slow loss function convergence\. At the same time, conflicts can arise between the PDE residual, boundary/initial value constraints, and possibly data terms, making PINN training inherently a multi\-objective optimization problem, further exacerbating optimization difficulties\. Additionally, the overall loss landscape is highly non\-convex, causing the model to easily get trapped in poor local minima or plateau regions, resulting in a long\-term failure of error convergence\.
To address the aforementioned training failure issues, various improvement strategies have been proposed in recent years, with the core goal of reducing optimization difficulties and improving the efficiency and accuracy of PINN solutions\. One such strategy is curriculum learning\[[34](https://arxiv.org/html/2605.15254#bib.bib9)\], which first trains the model on relatively simple tasks, such as smooth solutions, weak nonlinearity, or localized regions, and then gradually increases the difficulty by introducing high\-frequency components, stronger advection/reaction effects, more complex boundary conditions, or denser sampling\. This progressive training process provides a clearer optimization path and reduces the likelihood of convergence to poor local minima\. Another important approach is causality\-guided training\[[32](https://arxiv.org/html/2605.15254#bib.bib10)\]: for PDEs with explicit temporal causality \(such as transient problems where information propagates over time\), sequential constraints or staged progression are applied in the time dimension, making the model prioritize fitting the solution at earlier time steps and gradually extend to later time steps, avoiding ”incorrect propagation” caused by violating causality\.
In addition, existing studies have improved PINNs from multiple perspectives\. Weight adjustment dynamically balances different loss terms to mitigate multi\-objective conflicts\[[33](https://arxiv.org/html/2605.15254#bib.bib8)\], adaptive sampling refines collocation points in high\-residual regions to better capture local details\[[8](https://arxiv.org/html/2605.15254#bib.bib11)\], and improved optimizers and training strategies enhance optimization stability and convergence\[[22](https://arxiv.org/html/2605.15254#bib.bib12)\]\. Structural priors further strengthen the representation of multi\-scale features\[[18](https://arxiv.org/html/2605.15254#bib.bib13)\]\. Moreover, domain decomposition splits a complex PDE into local subproblems for collaborative learning\[[10](https://arxiv.org/html/2605.15254#bib.bib14)\]; related efforts also combine temporal decomposition with numerical solvers to accelerate subdomain convergence\[[24](https://arxiv.org/html/2605.15254#bib.bib15)\], incorporate finite\-volume\-style local conservation residuals to improve fine\-scale accuracy\[[36](https://arxiv.org/html/2605.15254#bib.bib16)\], and introduce extra correlation or penalty terms to better coordinate competing objectives\[[5](https://arxiv.org/html/2605.15254#bib.bib17)\]\.
However, existing PINN improvement strategies, including curriculum learning and causality\-guided training, still suffer from notable limitations\. First, time\-causality\-guided methods rely on a unidirectional propagation structure along the temporal dimension\. For many coupled PDE boundary value problems \(BVPs\), however, no such one\-way causality exists; instead, the solution is primarily governed by spatial coupling and boundary constraints\. As a result, time\-stepping\-based causal training paradigms are not directly applicable\. Second, existing curriculum learning methods usually organize training according to task difficulty, such as from low to high frequency, from coarse to fine, or from local to global\. Yet they generally do not explicitly account for spatial heterogeneity of the solution, inter\-region dependency, or information propagation pathways across the domain\. Consequently, under strongly coupled spatial settings, they may still yield locally converged solutions that remain globally inconsistent\.
To address these limitations, this paper proposes a spatially correlated curriculum framework for PINNs, which guides the training process by explicitly considering spatial information propagation and inter\-region consistency\. Specifically:
- •We propose a spatially correlated curriculum learning framework that uses spatial causal weights to guide solution information from near\-boundary regions toward the interior, promoting effective spatial information propagation while suppressing erroneous propagation\.
- •We design an information bridge mechanism to enhance communication between mutually coupled or long\-range correlated regions, encouraging consistent solution patterns in intermediate areas and reducing cross\-region low\-frequency drift\.
- •We introduce a region\-adaptive reweighting strategy based on regional PDE residuals and gradient contributions, which reduces local residuals and improves the recovery of high\-frequency details\.
Extensive experiments demonstrate that the proposed framework improves both global consistency and local detail recovery, providing an effective new perspective for training PINNs on strongly coupled spatial PDE problems\.
## 2Related Works
### 2\.1Information Propagation Hypothesis
Beyond non\-convex loss landscapes and loss\-term imbalance, a growing line of work interprets PINN training through the*information propagation hypothesis*: correct solutions must effectively transmit constraint information from initial/boundary conditions into the interior to form a globally consistent field, yet this transmission can break down even when the overall loss decreases\. Penwarden et al\. systematically analyze this phenomenon and categorize propagation failure into three representative modes:*zero solution*, where large interior regions collapse to near\-trivial states and boundary information is not injected;*no propagation*, where inadequate constraints or poor sampling prevent boundary information from reaching the interior; and*incorrect propagation*, where local structures, strong coupling, or multi\-scale stiffness block transmission and induce wrong solution forms in subregions, ultimately destroying global consistency\.\[[27](https://arxiv.org/html/2605.15254#bib.bib18)\]Building on this view, Daw et al\. show that propagation failure may manifest as interior residual barriers and can be diagnosed via highly imbalanced residual distributions \(e\.g\., skewness and kurtosis\), and they propose a 3R \(retain–resample–release\) strategy to mitigate it\.\[[8](https://arxiv.org/html/2605.15254#bib.bib11)\]Relatedly, Wang et al\. frame time\-domain extrapolation as sequential information propagation and introduce a gated parameter\-space correction term to extend a pretrained solution from a short interval to a larger one\.\[[35](https://arxiv.org/html/2605.15254#bib.bib19)\]
### 2\.2Curriculum Learning
Curriculum learning trains PINNs from easy to hard, letting the network fit simpler structures first and then progressively handle more complex ones, which stabilizes early optimization and reduces poor local minima\. Some studies analyze failure mechanisms and show that PINNs are vulnerable to blocked information propagation and multi\-objective conflicts, motivating curriculum or progressive training to improve convergence stability\[[20](https://arxiv.org/html/2605.15254#bib.bib7)\]\. In terms of concrete designs, curricula have been constructed from a spatial\-subdomain perspective by partitioning the domain into subregions, grouping PDE losses, and adaptively reweighting them based on residual magnitudes to ease simultaneous global optimization\[[3](https://arxiv.org/html/2605.15254#bib.bib20)\]\. Other works adopt a constraint\-first and refinement\-later strategy: the network is initialized with partial boundary conditions or easier PDE constraints, and then stricter constraints are gradually introduced to refine the solution\[[19](https://arxiv.org/html/2605.15254#bib.bib21)\]\. Curricula can also be designed via data or sampling resolution \(e\.g\., switching among multi\-density point sets to learn low\- and high\-frequency components\)\[[31](https://arxiv.org/html/2605.15254#bib.bib22)\], or along the parameter dimension by progressively increasing parameter complexity with adaptive step control\[[11](https://arxiv.org/html/2605.15254#bib.bib23)\]\. Finally, dynamic sampling strategies gradually expand the sampling distribution to improve global coverage and promote information propagation\[[26](https://arxiv.org/html/2605.15254#bib.bib24)\]\. A systematic comparison and analysis of curriculum learning, causality\-guided training, and Fourier\-feature\-based methods has been provided in the literature\[[25](https://arxiv.org/html/2605.15254#bib.bib25)\]\.
### 2\.3Causality\-guided Training
For transient PDEs with explicit time\-causality structures, causality\-guided methods use ”early\-to\-late” stepwise training to transform the global problem into a series of subproblems on time slices/windows, thus avoiding erroneous propagation due to causality violations\. PT\-PINN performs time\-local pretraining on a short window and then extends to the full horizon by parameter continuation with pseudo\-supervision\[[16](https://arxiv.org/html/2605.15254#bib.bib31)\]\. TsoNN mitigates ill\-conditioning via time stepping over successive time intervals\[[6](https://arxiv.org/html/2605.15254#bib.bib32)\]\. Mattey*et al\.*progressively enlarge the time window, generate pseudo\-labels on previously trained regions, and use transfer learning to adapt the model to newly added intervals\[[23](https://arxiv.org/html/2605.15254#bib.bib26)\]\. Penwarden*et al\.*provide a unified view of causality\-guided paradigms and propose an integrated framework to improve stability and applicability\[[27](https://arxiv.org/html/2605.15254#bib.bib18)\]\. Related strategies further organize training by categorizing subregions as trained/being\-trained/to\-be\-trained based on PDE loss and weighting profiles\[[21](https://arxiv.org/html/2605.15254#bib.bib27)\]\. For long\-time integration, dual\-network or implicit\-propagation ideas and sliding\-window continuation have been explored to enhance stability\[[15](https://arxiv.org/html/2605.15254#bib.bib28),[17](https://arxiv.org/html/2605.15254#bib.bib29)\]\. In stiff or singularly perturbed settings, improving the accuracy of initial constraints and applying time\-dependent reweighting \(e\.g\., negative\-exponential weights driven by accumulated error\) can help suppress error growth\[[4](https://arxiv.org/html/2605.15254#bib.bib30),[32](https://arxiv.org/html/2605.15254#bib.bib10)\]\.
## 3Method
In this section, we present the overall framework of the proposed method\. As illustrated in Fig\.[1](https://arxiv.org/html/2605.15254#S3.F1), our approach extends the classical PINN paradigm by introducing a spatially correlated curriculum strategy for training over the computational domain\. The whole framework is organized into a preparation step and two successive stages: distance\-based domain layering, spatially correlated curriculum learning with an information bridge, and region\-adaptive local reweighting\. These components work together to guide the network from learning a globally consistent low\-frequency structure to progressively recovering accurate local high\-frequency details\.
Figure 1:Overview of the proposed method\.The upper\-left panel shows the classical PINN model\. The proposed curriculum learning framework consists of three main parts\. As a preparation step, the computational domain is partitioned into multiple subregions and further organized into distance\-based layers from the boundary toward the interior\. In the first phase, a spatially correlated curriculum is implemented by assigning causal weights to the PDE losses of different layers\. Meanwhile, an information bridge based on a low\-order fitting function is introduced to capture the global low\-frequency component\. In the second phase, a region\-adaptive reweighting strategy is introduced to dynamically rebalance the PDE loss weights across subregions, so that insufficiently optimized regions receive appropriate training emphasis and local accuracy is further improved\. Through these stages, the network progressively recovers the solution from a globally consistent low\-frequency structure to accurate local high\-frequency details\.### 3\.1Spatial curriculum learning
For boundary value problems \(BVPs\), the solution is determined by appropriate and sufficient boundary conditions rather than by a strict temporal causality structure\. For more details, one may refer to Appendix A, which discusses the mathematical classification of PDEs, their analysis from the perspective of physical propagation, and the characteristics of BVPs\.
In the absence of additional supervised data, PINNs can only determine the global solution by combining boundary conditions and PDE constraints\. However, in many strongly coupled or nonlinear, multi\-scale PDEs, boundary information is not automatically and consistently propagated into the domain during training: on one hand, a single boundary is often insufficient to uniquely determine the solution near it; On the other hand, the global solution structure is governed by the coupling between boundary conditions and PDE operators\. As discussed in the related work, such coupling may make simultaneous optimization over the entire domain prone to poor local minima and ineffective information propagation\.
To address this, we propose a spatial weak\-causality constraint curriculum training strategy: by using soft constraints to reinforce the PDE residual constraints near the boundaries, we guide the network’s training focus to gradually move from the boundaries inward, thereby reducing the overall optimization difficulty\.
#### Layered Progression from Boundaries Inward\.
We divide the solution domainΩ\\Omegainto multiple discrete subregions and organize them into several layers according to their distances to the boundary, as shown in Fig\.[2](https://arxiv.org/html/2605.15254#S3.F2), which illustrates the layered structures for both 1D and 2D solution domains\. Intuitively, the network should first capture the solution near the boundary, and then the training focus gradually extends to deeper regions\.
Unlike time\-causality\-guided methods, this approach does not require strict unidirectional causality propagation in space\. Instead, we impose a weak spatial causality bias: learning begins at the outer layers and progressively moves inward\. If the outer layers are not learned correctly, the inner layers are also likely to be inaccurate\.
Let∂Ω\\partial\\Omegadenote the boundary of the domain\. We first partitionΩ\\OmegaintoNNdiscrete subregions, denoted by\{Ωi\}i=1N\\\{\\Omega\_\{i\}\\\}\_\{i=1\}^\{N\}\. For each subregionΩi\\Omega\_\{i\}, we define its distance to the boundary as
di=dist\(Ωi,∂Ω\),d\_\{i\}=\\mathrm\{dist\}\(\\Omega\_\{i\},\\partial\\Omega\),\(1\)where the distance can be evaluated, for example, by the minimum distance betweenΩi\\Omega\_\{i\}and∂Ω\\partial\\Omega\. Based on these distances, the subregions are grouped intoLLlayers from the outer region to the inner region\.
In the 1D case, the interval is partitioned into several discrete subregions and grouped into three layers from the boundary to the center, while in the 2D case, the square domain is partitioned into multiple discrete subregions and organized into concentric layers according to their distances to the boundary\.
\(a\)1D domain
\(b\)2D domain
Figure 2:Boundary distance based layering in 1D and 2D domains\.Here,dddenotes the discrete distance to the boundary, and both examples can be partitioned into three layers withd=0,1,2d=0,1,2\.
#### Cumulative negative exponential weighting
Inspired by existing causality\-guided PINN methods for time\-dependent problems, directional solution fitting can be encouraged by introducing negatively exponentially decayed weights\[[32](https://arxiv.org/html/2605.15254#bib.bib10)\]\.
Specifically, the time domain is divided into multiple intervals, and the cumulative PDE loss is evaluated sequentially to adaptively adjust the causal weights distribution\. As a result, later intervals are emphasized only after earlier ones have been sufficiently optimized, which helps prevent premature convergence in later intervals from interfering with the training of earlier ones\.
Based on this, we divide the domainΩ\\OmegaintoLLlayers from the outer region to the inner region, and letΛj\\Lambda\_\{j\}denote the set of sampling points in thejj\-th layer\. The PDE loss associated with thejj\-th layer is defined as
ℒPDE\(j\)=1Nj∑𝐱∈Λj\|𝒩\(uθ\(𝐱\)\)\|2,\\mathcal\{L\}^\{\(j\)\}\_\{\\mathrm\{PDE\}\}=\\frac\{1\}\{N\_\{j\}\}\\sum\_\{\\mathbf\{x\}\\in\\Lambda\_\{j\}\}\\left\|\\mathcal\{N\}\(u\_\{\\theta\}\(\\mathbf\{x\}\)\)\\right\|^\{2\},\(2\)whereNj=\|Λj\|N\_\{j\}=\|\\Lambda\_\{j\}\|is the number of sampled points in thejj\-th layer,uθu\_\{\\theta\}is the PINN approximation, and𝒩\(⋅\)\\mathcal\{N\}\(\\cdot\)denotes the PDE differential operator, whose derivatives are computed via automatic differentiation\.
To reflect the boundary\-to\-interior curriculum progression, we assign a dynamic weightwiw\_\{i\}to the PDE loss of theii\-th layer, such that the outer layers are emphasized at the early stage of training\. A simple and effective choice is an exponentially decayed weight based on the accumulated residuals of the preceding outer layers:
wi=exp\(−ϵ∑j=0i−1ℒPDE\(j\)\),w\_\{i\}=\\exp\\left\(\-\\epsilon\\sum\_\{j=0\}^\{i\-1\}\\mathcal\{L\}^\{\(j\)\}\_\{\\mathrm\{PDE\}\}\\right\),\(3\)whereϵ\>0\\epsilon\>0is the decay parameter\. This design has a clear gating effect: when the residuals in the outer layers remain large,∑j=0i−1ℒPDE\(j\)\\sum\_\{j=0\}^\{i\-1\}\\mathcal\{L\}^\{\(j\)\}\_\{\\mathrm\{PDE\}\}is also large, which significantly suppresses the weights of theii\-th layer\. Only after the outer layers gradually converge do the weights of the inner layers increase, allowing the training focus to progressively shift toward the interior regions\. In this way, the proposed mechanism numerically realizes a weak causality process in which the outer layers are fitted first and the inner layers are learned progressively afterward\. Figure[3](https://arxiv.org/html/2605.15254#S3.F3)illustrates the distributions of the causal weights and PDE losses for the 1D and 2D cases\.
\(a\)1D domain
\(b\)2D domain
Figure 3:Causal\-weight and subregional PDE\-loss distributions in the 1D and 2D cases\.The causal weights decrease from the boundary toward the interior and are adaptively adjusted according to the PDE\-loss distribution across subregions\.The final weighted PDE loss is written as
ℒPDE=∑i=0L−1wiℒPDE\(i\),\\mathcal\{L\}\_\{\\mathrm\{PDE\}\}=\\sum\_\{i=0\}^\{L\-1\}w\_\{i\}\\mathcal\{L\}^\{\(i\)\}\_\{\\mathrm\{PDE\}\},\(4\)and it is combined with the boundary condition lossℒBC\\mathcal\{L\}\_\{\\mathrm\{BC\}\}\(and possibly data lossℒdata\\mathcal\{L\}\_\{\\mathrm\{data\}\}\) to form the total objective:
ℒ=λBCℒBC\+λdataℒdata\+ℒPDE\.\\mathcal\{L\}=\\lambda\_\{\\mathrm\{BC\}\}\\mathcal\{L\}\_\{\\mathrm\{BC\}\}\+\\lambda\_\{\\mathrm\{data\}\}\\mathcal\{L\}\_\{\\mathrm\{data\}\}\+\\mathcal\{L\}\_\{\\mathrm\{PDE\}\}\.\(5\)
### 3\.2Information Bridge
In boundary value problems \(BVPs\), the global solution structure is usually shaped by the joint effect of multiple boundary conditions rather than by any single boundary alone\. As a result, accurately fitting local boundary regions does not necessarily ensure global consistency\. The solution patterns induced by different boundaries must be coordinated within the interior domain through the governing PDE\.
However, during PINN training, inaccurate approximations in the intermediate regions may weaken the effective coupling of boundary\-induced solution structures, particularly in regions with high\-frequency oscillations or persistent local residuals\. This can create an information barrier that prevents boundary constraints from being consistently reconciled across the domain\. Consequently, the learned solution may suffer from inconsistent cross\-region fitting and pronounced low\-frequency drift, including shifts in the overall trend, baseline errors, and phase or amplitude deviations, even when local residuals are reduced\.
To mitigate this issue, we design a low\-frequency information bridge mechanism: by using a few sparse anchor points to extract reliable low\-frequency components from the current network prediction, we impose a consistency constraint between distant boundaries through a ”soft” connection, thus stabilizing the global structure early in training and reducing the risk of low\-frequency drift, as shown in Fig\.[4](https://arxiv.org/html/2605.15254#S3.F4)\(a\)\.
#### Low\-order fitting with anchors\.
Let the current PINN prediction beuθ\(𝐱\)u\_\{\\theta\}\(\\mathbf\{x\}\)\. We select a set of anchor points𝒜=\{𝐱i\}i=1N\\mathcal\{A\}=\\\{\\mathbf\{x\}\_\{i\}\\\}\_\{i=1\}^\{N\}, which are typically placed in the intermediate region or other key locations that connect different subregions, and perform a weighted fitting of the predicted values at these anchor points using low\-order functions to extract the low\-frequency component\. For example, in the 1D case with a second\-order polynomial basis, we define
ϕ\(x\)=\[1,x,x2\]⊤,u^LF\(x;β\)=ϕ\(x\)⊤β\.\\phi\(x\)=\[1,\\ x,\\ x^\{2\}\]^\{\\top\},\\qquad\\hat\{u\}\_\{\\mathrm\{LF\}\}\(x;\\beta\)=\\phi\(x\)^\{\\top\}\\beta\.\(6\)The corresponding weighted least\-squares estimate is
β∗=argminβ∑i=1Nαi\(ϕ\(xi\)⊤β−u^i\)2,\\beta^\{\\ast\}=\\arg\\min\_\{\\beta\}\\sum\_\{i=1\}^\{N\}\\alpha\_\{i\}\\left\(\\phi\(x\_\{i\}\)^\{\\top\}\\beta\-\\hat\{u\}\_\{i\}\\right\)^\{2\},\(7\)whereu^i=uθ\(xi\)\\hat\{u\}\_\{i\}=u\_\{\\theta\}\(x\_\{i\}\)denotes the network prediction at the anchor pointxix\_\{i\}, andαi\\alpha\_\{i\}is the weight associated with that anchor point\. For the 2D case, higher\-order polynomials or other low\-frequency bases, such as cubic polynomials or radial basis functions, can be adopted in the same manner by extendingϕ\(⋅\)\\phi\(\\cdot\)to two\-dimensional basis functions\.
This procedure extracts a globally smooth trend from the current network prediction and uses it as a shared low\-frequency prior across different regions, as illustrated in Fig\.[4](https://arxiv.org/html/2605.15254#S3.F4)\(b\)\. Intuitively, when high\-frequency errors remain significant in the intermediate regions, directly enforcing cross\-region consistency through the PDE residual can be ineffective\. In contrast, low\-frequency trends are more stable and easier to learn, and thus provide a suitable consistency bridge between boundary\-induced solution structures\.
\(a\)Low frequency drift
\(b\)Low order fitting
Figure 4:Low frequency drift and weighted low order fitting\.\(a\) shows the 1D ODE example used in the later experiments\. In this case, the training process is dominated by the PDE loss, so the high\-frequency structure of the solution is captured reasonably well, while a significant low\-frequency drift still remains, as the solutions near the two boundaries fail to converge consistently to a globally compatible solution\. \(b\) illustrates the weighted low\-order fitting performed on a set of selected anchor points, yielding a function that represents the low\-frequency component of the solution\.
#### Information bridge regularization using pseudo labels\.
After obtaining the low\-frequency fitting functionu^LF\\hat\{u\}\_\{\\mathrm\{LF\}\}, we introduce a low\-frequency consistency loss over the bridge anchors as a regularization term:
ℒLF=𝔼𝐱∼𝒜\[uθ\(𝐱\)−u^LF\(𝐱\)\]2,\\mathcal\{L\}\_\{\\mathrm\{LF\}\}=\\mathbb\{E\}\_\{\\mathbf\{x\}\\sim\\mathcal\{A\}\}\\left\[u\_\{\\theta\}\(\\mathbf\{x\}\)\-\\hat\{u\}\_\{\\mathrm\{LF\}\}\(\\mathbf\{x\}\)\\right\]^\{2\},\(8\)whereu^LF\(𝐱\)\\hat\{u\}\_\{\\mathrm\{LF\}\}\(\\mathbf\{x\}\)serves as a pseudo\-label generated by the low\-frequency fitting, and𝒜\\mathcal\{A\}denotes the anchor set introduced above\. The total loss is then written as
ℒtotal=ℒPDE\+ℒBC\+λLFℒLF,\\mathcal\{L\}\_\{\\mathrm\{total\}\}=\\mathcal\{L\}\_\{\\mathrm\{PDE\}\}\+\\mathcal\{L\}\_\{\\mathrm\{BC\}\}\+\\lambda\_\{\\mathrm\{LF\}\}\\mathcal\{L\}\_\{\\mathrm\{LF\}\},\(9\)whereλLF\\lambda\_\{\\mathrm\{LF\}\}controls the strength of the information bridge\. The loss term in Eq\. \([8](https://arxiv.org/html/2605.15254#S3.E8)\) acts as a soft regularization constraint\. It does not replace the PDE and boundary constraints, but instead provides stable low\-frequency guidance during training, encouraging the network to establish a globally consistent baseline and a coherent slowly varying structure across different subregions\.
The information bridge is mainly applied to regions that have not yet been sufficiently optimized by the curriculum training\. Since the layered PDE weights are smaller in these regions, the corresponding pseudo\-label weights are set to be larger\. This complementary weighting scheme allows the low\-frequency prior to provide additional guidance where the PDE residual constraint is still relatively weak\. As training proceeds, the PDE weights of these regions gradually increase, and the role of the bridge is correspondingly reduced, so that the final solution is still primarily governed by the PDE and boundary constraints rather than by the pseudo\-labels\.
The reason for referring to this mechanism as an information bridge is that the extracted low\-frequency trend provides a shared reference across spatially separated regions\. When the PDE residual alone cannot yet establish consistent coupling between distant boundaries, this shared low\-frequency prior helps connect boundary\-induced solution structures through the interior domain\. By constraining the global baseline and slowly varying components of the solution, the bridge reduces low\-frequency drift and improves cross\-region consistency\. It complements the spatial curriculum training in Section 3\.1: the curriculum controls the boundary\-to\-interior training order, whereas the information bridge maintains low\-frequency coherence across regions during early training\.
### 3\.3Region\-adaptive Reweighting
After applying the spatial curriculum learning and the information bridge constraint introduced above, the PINN is generally less prone to training failure and can obtain a reasonably accurate solution\.
However, the optimization states of different subregions may still be highly unbalanced\. Although the first phase can produce a reasonably good global approximation, some regions may remain insufficiently optimized, especially those with strong high\-frequency variations, complex boundary influences, or insufficient information propagation\. These regions may still exhibit large local PDE residuals and noticeable local errors\.
#### Gradient\-aware region\-adaptive residual reweighting\.
The difficulty of optimizing these regions is not determined solely by the magnitude of their residuals\. In some subregions, the PDE loss may remain relatively large, while the corresponding loss term produces only weak gradients with respect to the network parameters during training\. Consequently, these difficult regions may receive insufficient effective updates despite their large residuals\.
To alleviate this issue, we introduce a gradient\-aware region\-adaptive reweighting mechanism that recalibrates the PDE residual weights by jointly considering the magnitude of the regional PDE loss and its corresponding gradient norm\. Similar local reweighting strategies adjust regional weights according to local PDE residuals\[[3](https://arxiv.org/html/2605.15254#bib.bib20)\], but they do not explicitly account for the gradient norm\.
Suppose that the solution domain is partitioned intoNNsubregions\{Ωi\}i=1N\\\{\\Omega\_\{i\}\\\}\_\{i=1\}^\{N\}, and letℒPDE\(i\)\\mathcal\{L\}\_\{\\mathrm\{PDE\}\}^\{\(i\)\}denote the PDE loss evaluated on theii\-th subregion\. For each subregion, we compute the gradient norm of its PDE loss with respect to the network parameters:
Gi=‖∇θℒPDE\(i\)‖2=\(∑ℓ‖∂ℒPDE\(i\)∂θℓ‖22\)1/2,i=1,…,N\.G\_\{i\}=\\left\\\|\\nabla\_\{\\theta\}\\mathcal\{L\}\_\{\\mathrm\{PDE\}\}^\{\(i\)\}\\right\\\|\_\{2\}=\\left\(\\sum\_\{\\ell\}\\left\\\|\\frac\{\\partial\\mathcal\{L\}\_\{\\mathrm\{PDE\}\}^\{\(i\)\}\}\{\\partial\\theta\_\{\\ell\}\}\\right\\\|\_\{2\}^\{2\}\\right\)^\{1/2\},\\qquad i=1,\\dots,N\.\(10\)whereθ=\{θℓ\}ℓ\\theta=\\\{\\theta\_\{\\ell\}\\\}\_\{\\ell\}denotes all trainable parameters of the neural network\. The quantityGiG\_\{i\}measures how strongly the PDE loss in theii\-th subregion contributes to the parameter updates\. Motivated by the neural tangent kernel \(NTK\) perspective\[[33](https://arxiv.org/html/2605.15254#bib.bib8)\], we use this gradient norm as a practical indicator to quantify the influence of each regional residual on the training dynamics\. A smallerGiG\_\{i\}suggests that the corresponding regional residual has a weaker effect on parameter updates, even when its loss magnitude remains large\.
We then define a regional difficulty score by combining the PDE loss magnitude and the gradient norm:
si=log\(ℒPDE\(i\)\+ε\)−λglog\(Gi\+ε\),i=1,…,N,s\_\{i\}=\\log\\left\(\\mathcal\{L\}\_\{\\mathrm\{PDE\}\}^\{\(i\)\}\+\\varepsilon\\right\)\-\\lambda\_\{g\}\\log\\left\(G\_\{i\}\+\\varepsilon\\right\),\\qquad i=1,\\dots,N,\(11\)whereε\>0\\varepsilon\>0is a small constant for numerical stability, andλg≥0\\lambda\_\{g\}\\geq 0controls the relative contribution of the gradient\-norm term\. Whenλg=0\\lambda\_\{g\}=0, the score is determined only by the regional PDE loss, and the reweighting mechanism mainly emphasizes high\-residual regions\. Asλg\\lambda\_\{g\}increases, the score assigns greater importance to regions with weaker gradient contributions\. Therefore, regions with large PDE losses and small gradient norms receive larger difficulty scores, indicating that they are both poorly fitted and insufficiently optimized\.
The regional coefficients are obtained by mapping the difficulty scores into a bounded interval:
ρi=Φ\(si\),Φ:ℝ→\[ρmin,ρmax\],\\rho\_\{i\}=\\Phi\(s\_\{i\}\),\\qquad\\Phi:\\mathbb\{R\}\\rightarrow\[\\rho\_\{\\min\},\\rho\_\{\\max\}\],\(12\)where0<ρmin<ρmax0<\\rho\_\{\\min\}<\\rho\_\{\\max\}are prescribed constants\. The mappingΦ\(⋅\)\\Phi\(\\cdot\)is chosen to be monotone nondecreasing, so that larger difficulty scores lead to larger regional weights\. In practice,Φ\(⋅\)\\Phi\(\\cdot\)can be implemented by min–max normalization followed by linear rescaling:
s~i=si−minjsjmaxjsj−minjsj\+ε,ρi=ρmin\+\(ρmax−ρmin\)s~i\.\\tilde\{s\}\_\{i\}=\\frac\{s\_\{i\}\-\\min\_\{j\}s\_\{j\}\}\{\\max\_\{j\}s\_\{j\}\-\\min\_\{j\}s\_\{j\}\+\\varepsilon\},\\qquad\\rho\_\{i\}=\\rho\_\{\\min\}\+\(\\rho\_\{\\max\}\-\\rho\_\{\\min\}\)\\tilde\{s\}\_\{i\}\.\(13\)This bounded mapping prevents excessive disparities among regional weights and improves the stability of the adaptive reweighting process\. For example, the interval\[ρmin,ρmax\]\[\\rho\_\{\\min\},\\rho\_\{\\max\}\]can be set to\[1,5\]\[1,5\]\.
Based on these regional coefficients, the adaptively weighted PDE loss is defined as
ℒPDEadapt=1N∑i=1NρiℒPDE\(i\)\.\\mathcal\{L\}\_\{\\mathrm\{PDE\}\}^\{\\mathrm\{adapt\}\}=\\frac\{1\}\{N\}\\sum\_\{i=1\}^\{N\}\\rho\_\{i\}\\,\\mathcal\{L\}\_\{\\mathrm\{PDE\}\}^\{\(i\)\}\.\(14\)In this way, the proposed strategy does not merely emphasize regions with large residuals\. Instead, it dynamically rebalances the training process by assigning greater weights to regions that are both difficult to fit and insufficiently influential in the parameter\-update dynamics\. A representative example is shown in Fig\.[5](https://arxiv.org/html/2605.15254#S3.F5), where the PDE loss, gradient norm, and corresponding loss weight of each subregion are visualized\.
Figure 5:Illustration of the region\-adaptive reweighting mechanism\.Adaptive regional weights, PDE losses, and gradient magnitudes across subregions\. The proposed strategy assigns larger weights to regions with large residuals but weak gradient effectiveness\.
## 4Experiment
The experiments were conducted in a Linux environment equipped with an NVIDIA RTX 5090 GPU \(32 GB\) and an Intel Xeon Gold 6348 processor with 128 GB of memory, using Python libraries including PyTorch, NumPy, and DeepXDE\.
In this study, we compared the proposed PINN with spatially correlated curriculum training \(PINN\-C\) against the classical PINN model\. We consider one ODE example as an introductory pedagogical case, together with three representative PDE problems\. All models were trained using the same network architecture, the same number of sampling points, and the same optimizer settings \(the detailed hyperparameters are provided in Appendix C\)\. In the first phase, causal weights and a low\-frequency information bridge were introduced to facilitate information propagation across the domain\. In the second phase, region\-adaptive reweighting was further incorporated to improve the accuracy of the final solution\. Here, PINN\-C1 denotes the Phase\-I\-only variants, with PINN\-C1\(with bridge\) and PINN\-C1\(w/o bridge\) indicating whether the information bridge is used\. PINN\-C2 denotes the full method using both Phase I and Phase II\.
Each experiment is repeated three times with different random seeds\. The models are optimized sequentially using Adam and LBFGS\. Intermediate results are recorded at fixed intervals, based on which the training weights are adjusted and the low\-frequency fitting function is updated\. The collocation points are uniformly distributed over the domain\. To evaluate the model performance, we analyzed both the training loss curves and the test error curves, using metrics including the mean squared error \(MSE\),L2L\_\{2\}relative error, maximum absolute error, and average PDE residual\. In addition, we visualized the absolute error and its spatial distribution over the domain\. The test data used for validation were either generated from analytical solutions or obtained from COMSOL solvers\. Four types of PDE problems were considered in the experiments, and their detailed mathematical formulations are provided in Appendix B\. The code has been publicly released athttps://github\.com/pigofmomo/CurriculumLearningPINN\.
### 4\.11D ODE
This problem serves as an introductory pedagogical example, from which the proposed spatial curriculum learning strategy was first motivated, analyzed, and validated\. It considers a one\-dimensional ODE with Dirichlet boundary conditions imposed at both endpoints\. The differential operator contains both an exponentially growing term and an oscillatory term\. Through a careful construction, the problem admits an analytical solution\. We consider two variants, corresponding to low\-frequency and high\-frequency settings\.
For the curriculum design, the domain is partitioned into five subregions, which are further organized into three layers from the boundary toward the interior\. Due to the limitations of network capacity and sampling density, we do not further apply region\-adaptive reweighting to improve local accuracy in this example\. Under the current configuration, the obtained accuracy is already satisfactory and is close to the performance limit of the model setup\. The quantitative results over multiple runs are reported in Table[1](https://arxiv.org/html/2605.15254#S4.T1)\.
Table 1:Quantitative results for 1D ODE problems under low\- and high\-frequency settings\.FrequencyMethodPDE ResidualL2L\_\{2\}Relative ErrorMax Abs ErrorMSELowPINN \(default BC weight\)\(9\.22±5\.83\)×100\(9\.22\\pm 5\.83\)\\times 10^\{0\}\(1\.52±0\.09\)×100\(1\.52\\pm 0\.09\)\\times 10^\{0\}\(1\.06±0\.06\)×100\(1\.06\\pm 0\.06\)\\times 10^\{0\}\(3\.76±0\.44\)×10−1\(3\.76\\pm 0\.44\)\\times 10^\{\-1\}PINN \(strong BC weight\)\(2\.27±1\.57\)×102\(2\.27\\pm 1\.57\)\\times 10^\{2\}\(6\.85±4\.45\)×10−1\(6\.85\\pm 4\.45\)\\times 10^\{\-1\}\(5\.77±3\.74\)×10−1\(5\.77\\pm 3\.74\)\\times 10^\{\-1\}\(1\.08±0\.76\)×10−1\(1\.08\\pm 0\.76\)\\times 10^\{\-1\}PINN\-C1 \(w/o bridge\)\(7\.05±3\.22\)×𝟏𝟎𝟎\\mathbf\{\(7\.05\\pm 3\.22\)\\times 10^\{0\}\}\(5\.70±0\.58\)×𝟏𝟎−𝟐\\mathbf\{\(5\.70\\pm 0\.58\)\\times 10^\{\-2\}\}\(5\.16±0\.62\)×𝟏𝟎−𝟐\\mathbf\{\(5\.16\\pm 0\.62\)\\times 10^\{\-2\}\}\(5\.34±0\.01\)×𝟏𝟎−𝟒\\mathbf\{\(5\.34\\pm 0\.01\)\\times 10^\{\-4\}\}HighPINN \(strong BC weight\)\(1\.60±1\.13\)×103\(1\.60\\pm 1\.13\)\\times 10^\{3\}\(6\.78±4\.56\)×10−1\(6\.78\\pm 4\.56\)\\times 10^\{\-1\}\(1\.21±0\.82\)×100\(1\.21\\pm 0\.82\)\\times 10^\{0\}\(4\.08±2\.88\)×10−1\(4\.08\\pm 2\.88\)\\times 10^\{\-1\}PINN\-C1 \(w/o bridge\)\(6\.45±6\.70\)×101\(6\.45\\pm 6\.70\)\\times 10^\{1\}\(1\.47±1\.01\)×100\(1\.47\\pm 1\.01\)\\times 10^\{0\}\(1\.82±1\.31\)×100\(1\.82\\pm 1\.31\)\\times 10^\{0\}\(1\.94±2\.06\)×100\(1\.94\\pm 2\.06\)\\times 10^\{0\}PINN\-C1 \(with bridge\)\(1\.75±0\.45\)×𝟏𝟎𝟎\\mathbf\{\(1\.75\\pm 0\.45\)\\times 10^\{0\}\}\(3\.50±0\.08\)×𝟏𝟎−𝟐\\mathbf\{\(3\.50\\pm 0\.08\)\\times 10^\{\-2\}\}\(7\.07±0\.14\)×𝟏𝟎−𝟐\\mathbf\{\(7\.07\\pm 0\.14\)\\times 10^\{\-2\}\}\(7\.49±0\.03\)×𝟏𝟎−𝟒\\mathbf\{\(7\.49\\pm 0\.03\)\\times 10^\{\-4\}\}
#### Low frequency case\.
We first consider the classical PINN with the default PDE loss weight set to 1, and the result is shown in Fig\.[6](https://arxiv.org/html/2605.15254#S4.F6)\(a\)\. Although the high\-frequency structure in the middle region is partially fitted, the overall solution is incorrect because the boundary conditions at both endpoints are not satisfied\. This failure can be attributed to the dominance of the PDE loss during training, while the BC constraint receives insufficient emphasis because of its relatively small weight\. As shown by the loss curves in Fig\.[6](https://arxiv.org/html/2605.15254#S4.F6)\(b\), although the PDE loss decreases throughout training, the BC loss fails to converge\. This observation further highlights that satisfying the boundary conditions is essential for recovering the correct solution\.






Figure 6:Results for the low\-frequency 1D ODE case\.\(a\)\(b\) Classical PINN with the default loss weights, where the PDE loss dominates and the boundary conditions are not satisfied\. \(c\)\(d\) PINN with a strong BC weight, which collapses to a trivial zero solution due to failed information propagation\. \(e\)\(f\) PINN\-C1 with causal weights only, which successfully guides boundary information into the interior and recovers the correct solution\.We then consider a PINN with a strong boundary constraint, where a large BC weight of 100 is applied, and the result is shown in Fig\.[6](https://arxiv.org/html/2605.15254#S4.F6)\(c\)\. However, the model instead converges to a trivial zero solution, indicating a failure of information propagation\. Although the boundary conditions are enforced, the PDE loss remains difficult to optimize because of its stiffness, preventing the boundary information from propagating into the interior, as reflected in Fig\.[6](https://arxiv.org/html/2605.15254#S4.F6)\(d\)\.
Next, we introduce the proposed spatial curriculum learning strategy using only the causal weights, without the information bridge at this stage\. The result, shown in Fig\.[6](https://arxiv.org/html/2605.15254#S4.F6)\(e\), successfully recovers the correct solution\. The loss curves in Fig\.[6](https://arxiv.org/html/2605.15254#S4.F6)\(f\) further show that all terms can gradually converge during training\. This demonstrates that the introduction of causal weights can effectively guide information propagation, reduce the optimization difficulty, and alleviate the conflict between the boundary loss and the PDE loss\. As shown in Table[1](https://arxiv.org/html/2605.15254#S4.T1), the quantitative accuracy of PINN\-C1 \(w/o bridge\) is substantially better than that of the two preceding methods\.
#### High frequency case\.
We increase the exponential growth rate and the sinusoidal oscillation frequency to construct a high\-frequency case for further investigation\. In this setting, the oscillatory behavior becomes more pronounced and the PDE operator exhibits stronger stiffness\.
We first consider the PINN with a strong BC weight of 1000\. As shown in Fig\.[7](https://arxiv.org/html/2605.15254#S4.F7)\(a\), the model again converges to a trivial zero solution, indicating a failure of information propagation\. Although the BC loss is reduced to a very small value, the PDE loss remains difficult to optimize \(see Fig\.[7](https://arxiv.org/html/2605.15254#S4.F7)\(b\)\)\.
Next, we incorporate causal weights into the proposed spatial curriculum learning framework, assigning larger weights to regions near the boundaries and smaller weights to interior regions, as illustrated in Fig\.[8](https://arxiv.org/html/2605.15254#S4.F8)\(a\)\. With this design, the high\-frequency oscillation in the middle region can be partially recovered, as shown in Fig\.[7](https://arxiv.org/html/2605.15254#S4.F7)\(c\)\. However, the solution near the two boundary regions is still not correctly fitted\. Moreover, because the boundary information cannot effectively propagate across the intermediate region to reach a globally consistent solution, a large low\-frequency drift is eventually produced\. The loss curves in Fig\.[7](https://arxiv.org/html/2605.15254#S4.F7)\(d\) show that, although all loss terms can be optimized to some extent, they still cannot decrease to sufficiently low levels, indicating that the training becomes trapped in a poor local minimum\.
Therefore, it is necessary to introduce the information bridge proposed in our method\. Specifically, we use a second\-order polynomial to fit the predicted values at a set of anchor points during training, thereby constructing a low\-frequency surrogate and generating pseudo\-labels \(see Fig\.[8](https://arxiv.org/html/2605.15254#S4.F8)\(b\)\)\. This surrogate is then incorporated as a regularization term to pull back the low\-frequency drift\. The resulting solution, shown in Fig\.[7](https://arxiv.org/html/2605.15254#S4.F7)\(e\), achieves high fitting accuracy\. The loss curves in Fig\.[7](https://arxiv.org/html/2605.15254#S4.F7)\(f\) further confirm that all loss components can be optimized to satisfactory levels\.






Figure 7:Results for the high\-frequency 1D ODE case\.\(a\)\(b\) PINN with a strong BC weight, which reduces the BC loss but collapses to a trivial zero solution due to failed information propagation\. \(c\)\(d\) PINN\-C1 without the information bridge, which recovers the main solution structure but still exhibits low\-frequency drift\. \(e\)\(f\) PINN\-C1 with the information bridge, which suppresses low\-frequency drift and improves global consistency, leading to a more accurate solution\.

Figure 8:Illustration of the training process for the high\-frequency 1D ODE case\.\(a\) Boundary\-to\-interior causal weighting\. \(b\) Sampled anchor points and the fitted second\-order polynomial used in the information bridge\.
### 4\.22D Poisson Equation with High Frequency Centers
This case considers a standard two\-dimensional Poisson equation, which is a representative diffusion\-dominated problem\. Starting from an original globally smooth low\-frequency source term, we further introduce four localized high\-frequency components to increase the complexity of the solution structure\. Each high\-frequency component is confined to a limited spatial region through a Gaussian window function, so that the resulting source term exhibits both global low\-frequency variation and local high\-frequency perturbations\. The solution near these local centers has a relatively large magnitude and varies rapidly, which makes the PDE operator locally stiff in these regions\. Therefore, this case provides a suitable benchmark for testing the proposed method, particularly its ability to handle locally stiff regions while preserving the overall solution structure\.
The computational domain is partitioned into a5×55\\times 5grid of subregions, which are further grouped into three layers from the boundary toward the interior\. The BC weight is set to10,00010\{,\}000\. The quantitative results are reported in Table[2](https://arxiv.org/html/2605.15254#S4.T2)\. The standard PINN exhibits large prediction errors, and its PDE loss remains difficult to optimize, resulting in convergence to an incorrect solution\. In contrast, PINN\-C1 effectively alleviates the training difficulty and captures the main solution structure\. The information bridge further suppresses low\-frequency drift and improves global consistency, leading to a clear improvement in prediction accuracy\. With additional local region reweighting, PINN\-C2 achieves a modest further improvement in convergence accuracy\.
Table 2:Quantitative results for the 2D Poisson Equation with High Frequency Centers\.MethodPDE ResidualL2L\_\{2\}Relative ErrorMax Abs ErrorMSEPINN\(1\.071±0\.0001\)×103\(1\.071\\pm 0\.0001\)\\times 10^\{3\}\(7\.85±0\.01\)×10−1\(7\.85\\pm 0\.01\)\\times 10^\{\-1\}\(4\.89±0\.01\)×100\(4\.89\\pm 0\.01\)\\times 10^\{0\}\(3\.98±0\.01\)×10−1\(3\.98\\pm 0\.01\)\\times 10^\{\-1\}PINN\-C1\(w/o bridge\)\(9\.49±3\.65\)×100\(9\.49\\pm 3\.65\)\\times 10^\{0\}\(1\.40±1\.84\)×10−1\(1\.40\\pm 1\.84\)\\times 10^\{\-1\}\(2\.67±2\.77\)×10−1\(2\.67\\pm 2\.77\)\\times 10^\{\-1\}\(3\.43±4\.84\)×10−2\(3\.43\\pm 4\.84\)\\times 10^\{\-2\}PINN\-C1\(with bridge\)\(7\.70±0\.99\)×100\(7\.70\\pm 0\.99\)\\times 10^\{0\}\(1\.78±0\.87\)×10−2\(1\.78\\pm 0\.87\)\\times 10^\{\-2\}\(5\.41±0\.03\)×10−2\(5\.41\\pm 0\.03\)\\times 10^\{\-2\}\(2\.53±2\.25\)×10−4\(2\.53\\pm 2\.25\)\\times 10^\{\-4\}PINN\-C2\(7\.41±1\.00\)×𝟏𝟎𝟎\\mathbf\{\(7\.41\\pm 1\.00\)\\times 10^\{0\}\}\(1\.05±0\.35\)×𝟏𝟎−𝟐\\mathbf\{\(1\.05\\pm 0\.35\)\\times 10^\{\-2\}\}\(5\.16±0\.27\)×𝟏𝟎−𝟐\\mathbf\{\(5\.16\\pm 0\.27\)\\times 10^\{\-2\}\}\(7\.93±5\.32\)×𝟏𝟎−𝟓\\mathbf\{\(7\.93\\pm 5\.32\)\\times 10^\{\-5\}\}
Figure[9](https://arxiv.org/html/2605.15254#S4.F9)provides a visual comparison of the solution results\. As shown in Fig\.[9](https://arxiv.org/html/2605.15254#S4.F9)\(a\), the classical PINN mainly captures the low\-frequency component of the solution, while the PDE loss associated with the localized high\-frequency structures remains difficult to reduce\. This behavior is also reflected in the loss curves in Fig\.[9](https://arxiv.org/html/2605.15254#S4.F9)\(b\), where the second\-stage L\-BFGS optimization terminates early due to poor convergence\. As a result, the predicted solution fails to recover the localized high\-frequency details\.
In contrast, PINN\-C1 without the information bridge can better fit the high\-frequency structures\. However, it may still suffer from low\-frequency drift, leading to an inaccurate global solution profile, as shown in Fig\.[9](https://arxiv.org/html/2605.15254#S4.F9)\(c\)\(d\)\. After incorporating the information bridge and region\-adaptive reweighting, the proposed PINN\-C2, shown in Fig\.[9](https://arxiv.org/html/2605.15254#S4.F9)\(e\), not only recovers the global solution structure but also preserves the local high\-frequency details more accurately\. Throughout the optimization process, except for the initial warm\-up stage, both the PDE loss and the BC loss decrease steadily to low levels, as shown in Fig\.[9](https://arxiv.org/html/2605.15254#S4.F9)\(f\)\.






Figure 9:Results for the PoissonHF case\.\(a\)\(b\) The classical PINN captures only the low\-frequency component and fails to resolve localized high\-frequency details\. \(c\)\(d\) PINN\-C1\(w/o bridge\) recovers the main solution structure but exhibits low\-frequency drift\. \(e\)\(f\) PINN\-C2 accurately recovers both the global structure and local details, with stable convergence of the PDE and BC losses\.The intermediate training snapshots of the proposed method are shown in Fig\.[10](https://arxiv.org/html/2605.15254#S4.F10)\. In Phase I, the curriculum strategy guides information propagation from the boundary toward the interior\. Accordingly, the outer\-layer regions are assigned larger weights, even though some interior subregions may exhibit relatively large PDE losses, as illustrated in Fig\.[10](https://arxiv.org/html/2605.15254#S4.F10)\(a\)\. Figure[10](https://arxiv.org/html/2605.15254#S4.F10)\(b\) shows the result of the information bridge, where a quadratic two\-dimensional polynomial is fitted to the anchor points\. In Phase II, the local weights are further adjusted according to the PDE loss levels and gradient norms in different regions, allowing the training process to focus more effectively on high\-error regions that receive insufficient gradient updates, as shown in Fig\.[10](https://arxiv.org/html/2605.15254#S4.F10)\(c\)\.



Figure 10:Intermediate training process for the PoissonHF case\.\(a\) Phase\-I causal weighting from the boundary toward the interior\. \(b\) Information bridge constructed from anchor points using a two\-dimensional quadratic polynomial\. \(c\) Phase\-II adaptive local reweighting based on regional PDE losses and gradient norms\.
### 4\.32D Advection\-Diffusion\-Reaction Equation
This case considers a two\-dimensional advection\-diffusion\-reaction \(ADR\) equation\. Compared with purely diffusion\-dominated problems, the ADR system exhibits richer physical behavior and more challenging training dynamics, since the solution is jointly influenced by directional advection, local smoothing, and spatially varying reaction effects\. From the training perspective, the coexistence of advection, diffusion, and reaction makes the PDE residual more difficult to optimize than in simpler elliptic cases\.
In particular, the advection term introduces a preferred transport direction for information propagation, while the diffusion term tends to smooth the solution and the reaction term further modifies the local amplitude and stiffness of the operator\. Therefore, this case serves as a useful benchmark for testing whether the proposed spatial curriculum learning framework can guide information propagation more effectively and improve the fitting of locally difficult regions\.
The computational domain is divided into a5×55\\times 5array of subregions and further organized into three layers from the boundary toward the interior\. The BC weight is fixed at 10,000, and the quantitative results are summarized in Table[3](https://arxiv.org/html/2605.15254#S4.T3)\.
Although this ADR case involves multiple coupled physical processes, the analytical solution is not particularly complex, as some terms are partially canceled by construction\. As a result, this case does not exhibit the severe loss stagnation, training failure, or low\-frequency drift observed in more challenging examples\. Even in this relatively moderate setting, the quantitative results still demonstrate the overall effectiveness of the proposed framework\. Compared with the standard PINN, all proposed variants achieve a clear reduction in the PDE residual\. Among them, PINN\-C1\(with bridge\) obtains the lowest PDE residual, whereas PINN\-C2 achieves the best performance across all prediction error metrics\. These results suggest that the proposed framework can still provide beneficial effects for PDE problems involving multiple interacting physical mechanisms\.
Table 3:Quantitative results for the 2D Advection\-Diffusion\-Reaction Equation\.MethodPDE ResidualL2L\_\{2\}Relative ErrorMax Abs ErrorMSEPINN\(4\.894±0\.483\)×10−1\(4\.894\\pm 0\.483\)\\times 10^\{\-1\}\(7\.82±0\.73\)×10−3\(7\.82\\pm 0\.73\)\\times 10^\{\-3\}\(2\.48±0\.41\)×10−2\(2\.48\\pm 0\.41\)\\times 10^\{\-2\}\(1\.53±0\.29\)×10−5\(1\.53\\pm 0\.29\)\\times 10^\{\-5\}PINN\-C1\(w/o bridge\)\(3\.480±0\.521\)×10−1\(3\.480\\pm 0\.521\)\\times 10^\{\-1\}\(6\.70±1\.70\)×10−3\(6\.70\\pm 1\.70\)\\times 10^\{\-3\}\(2\.22±1\.01\)×10−2\(2\.22\\pm 1\.01\)\\times 10^\{\-2\}\(1\.18±0\.59\)×10−5\(1\.18\\pm 0\.59\)\\times 10^\{\-5\}PINN\-C1\(with bridge\)\(3\.477±0\.835\)×𝟏𝟎−𝟏\\mathbf\{\(3\.477\\pm 0\.835\)\\times 10^\{\-1\}\}\(6\.32±2\.35\)×10−3\(6\.32\\pm 2\.35\)\\times 10^\{\-3\}\(2\.58±1\.47\)×10−2\(2\.58\\pm 1\.47\)\\times 10^\{\-2\}\(1\.12±0\.83\)×10−5\(1\.12\\pm 0\.83\)\\times 10^\{\-5\}PINN\-C2\(3\.495±0\.364\)×10−1\(3\.495\\pm 0\.364\)\\times 10^\{\-1\}\(5\.71±0\.27\)×𝟏𝟎−𝟑\\mathbf\{\(5\.71\\pm 0\.27\)\\times 10^\{\-3\}\}\(1\.56±0\.17\)×𝟏𝟎−𝟐\\mathbf\{\(1\.56\\pm 0\.17\)\\times 10^\{\-2\}\}\(8\.08±0\.75\)×𝟏𝟎−𝟔\\mathbf\{\(8\.08\\pm 0\.75\)\\times 10^\{\-6\}\}
Figure[11](https://arxiv.org/html/2605.15254#S4.F11)shows the visual comparison of the solution results for the 2D advection\-diffusion\-reaction equation\. Unlike the previous cases, this problem does not suffer from severe training failure; the main difference between the methods is instead reflected in their final prediction accuracy\. As shown in Fig\.[11](https://arxiv.org/html/2605.15254#S4.F11)\(a\), the classical PINN already captures the overall solution structure, but its accuracy is still limited, especially in local regions\. For ease of comparison, the quantitative metrics are included directly in the figure caption\. Overall, PINN\-C2 achieves better quantitative performance, including lower MSE, relativeL2L\_\{2\}error, maximum absolute error, and PDE residual\. As shown in Fig\.[11](https://arxiv.org/html/2605.15254#S4.F11)\(c\), PINN\-C2 further improves the solution accuracy and local detail reconstruction\. As shown in Fig\.[11](https://arxiv.org/html/2605.15254#S4.F11)\(d\), the proposed method achieves a lower final loss than the baseline PINN, indicating more effective optimization\.




Figure 11:Results for the 2D advection–diffusion–reaction equation\.\(a\)\(b\) Classical PINN, which shows limited PDE\-loss convergence and relatively large prediction errors:MSE=1\.9192×10−5\\mathrm\{MSE\}=1\.9192\\times 10^\{\-5\},L2L\_\{2\}relative error=8\.8126×10−3=8\.8126\\times 10^\{\-3\}, max absolute error=2\.1332×10−2=2\.1332\\times 10^\{\-2\}, and PDE residual=5\.2266×10−1=5\.2266\\times 10^\{\-1\}\. \(c\)\(d\) PINN\-C2, which improves both solution accuracy and convergence behavior:MSE=7\.0912×10−6\\mathrm\{MSE\}=7\.0912\\times 10^\{\-6\},L2L\_\{2\}relative error=5\.3567×10−3=5\.3567\\times 10^\{\-3\}, max absolute error=1\.3212×10−2=1\.3212\\times 10^\{\-2\}, and PDE residual=2\.9840×10−1=2\.9840\\times 10^\{\-1\}\.Figure[12](https://arxiv.org/html/2605.15254#S4.F12)shows several intermediate snapshots of the proposed training process\. In Phase I, the curriculum strategy progressively propagates information from the boundary toward the interior\. At the stage shown in Fig\.[12](https://arxiv.org/html/2605.15254#S4.F12)\(a\), the outermost layer already has a relatively small PDE loss, allowing the middle layer to receive substantial emphasis with a weight of 0\.97\. By contrast, because some middle\-layer subregions still have relatively large PDE losses, the innermost layer is assigned a smaller weight of 0\.62 and is not yet strongly emphasized\. Figure[12](https://arxiv.org/html/2605.15254#S4.F12)\(b\) illustrates the information bridge constructed by fitting a quadratic two\-dimensional polynomial to the anchor points, which captures the low\-frequency diffusion trend well\. In Phase II, the local weights are further adapted according to the regional PDE losses and gradient norms\. As shown in Fig\.[12](https://arxiv.org/html/2605.15254#S4.F12)\(c\), the lower\-left region exhibits a larger PDE loss but a smaller gradient norm, indicating insufficient effective updates during training\. It is therefore assigned a larger local weight, so that the optimization process can focus more effectively on this difficult region\.



Figure 12:Intermediate training snapshots for the 2D advection–diffusion–reaction equation\.\(a\) Phase\-I curriculum weighting, with larger weights assigned to the outer and middle layers than to the inner layer\. \(b\) Information bridge constructed from anchor points using a two\-dimensional quadratic polynomial\. \(c\) Phase\-II region\-adaptive reweighting, which assigns a larger local weight to the lower\-left region with large PDE loss and weak gradient influence\.
### 4\.4NS Equations for Lid\-Driven Cavity Flow
This case considers the incompressible Navier–Stokes equations for the classical lid\-driven cavity flow, a standard benchmark in computational fluid dynamics\. The flow is defined in a closed square cavity, where the top boundary moves horizontally with a prescribed velocity profile depending on the horizontal coordinate, while the other three walls satisfy no\-slip conditions\. A pressure reference point is specified at the lower\-left corner to fix the pressure\. Driven by the moving lid, the fluid develops a recirculating flow pattern with a primary central vortex and possible secondary vortices near the corners\.
Physically, this problem reflects the interaction between advection and viscous diffusion in a confined domain\. The Reynolds number is set to 100\. The BC loss weights are set to100100\. Therefore, this case provides a suitable benchmark for evaluating whether the proposed framework can capture both the global flow structure and the localized high\-gradient regions in a coupled nonlinear PDE system\.
The quantitative results are reported in Tables[4](https://arxiv.org/html/2605.15254#S4.T4)and[5](https://arxiv.org/html/2605.15254#S4.T5)\. Compared with the standard PINN, PINN\-C1 brings only limited improvement, and the use of causal weights alone even slightly degrades some metrics\. However, PINN\-C1\(with bridge\) performs better than PINN\-C1\(w/o bridge\), showing that the information bridge helps improve solution consistency\.
With the Phase\-II region\-adaptive reweighting, PINN\-C2 achieves the best performance in all metrics except the maximum absolute error of pressure\. The improvement is especially evident for the velocity components, indicating that local adaptive reweighting plays a more important role in this case\. This suggests that, even without severe training stagnation or obvious optimization failure, the proposed framework can still improve the final converged accuracy by enhancing local detail reconstruction and global solution quality\.
Table 4:Quantitative results for the NS equations: PDE residual and pressure\-related metrics\.MethodPDE ResidualppL2L\_\{2\}Relative ErrorppMax Abs ErrorppMSEPINN\(3\.03±0\.24\)×10−2\(3\.03\\pm 0\.24\)\\times 10^\{\-2\}\(9\.83±1\.30\)×10−2\(9\.83\\pm 1\.30\)\\times 10^\{\-2\}\(3\.84±0\.14\)×10−1\(3\.84\\pm 0\.14\)\\times 10^\{\-1\}\(4\.55±1\.14\)×10−4\(4\.55\\pm 1\.14\)\\times 10^\{\-4\}PINN\-C1\(w/o bridge\)\(3\.18±0\.44\)×10−2\(3\.18\\pm 0\.44\)\\times 10^\{\-2\}\(1\.13±0\.25\)×10−1\(1\.13\\pm 0\.25\)\\times 10^\{\-1\}\(3\.81±0\.19\)×𝟏𝟎−𝟏\\mathbf\{\(3\.81\\pm 0\.19\)\\times 10^\{\-1\}\}\(6\.20±2\.55\)×10−4\(6\.20\\pm 2\.55\)\\times 10^\{\-4\}PINN\-C1\(with bridge\)\(3\.03±0\.44\)×10−2\(3\.03\\pm 0\.44\)\\times 10^\{\-2\}\(9\.05±1\.68\)×10−2\(9\.05\\pm 1\.68\)\\times 10^\{\-2\}\(3\.82±0\.20\)×10−1\(3\.82\\pm 0\.20\)\\times 10^\{\-1\}\(3\.93±1\.50\)×10−4\(3\.93\\pm 1\.50\)\\times 10^\{\-4\}PINN\-C2\(2\.12±0\.35\)×𝟏𝟎−𝟐\\mathbf\{\(2\.12\\pm 0\.35\)\\times 10^\{\-2\}\}\(7\.42±0\.71\)×𝟏𝟎−𝟐\\mathbf\{\(7\.42\\pm 0\.71\)\\times 10^\{\-2\}\}\(3\.95±0\.22\)×10−1\(3\.95\\pm 0\.22\)\\times 10^\{\-1\}\(2\.57±0\.50\)×𝟏𝟎−𝟒\\mathbf\{\(2\.57\\pm 0\.50\)\\times 10^\{\-4\}\}
Table 5:Quantitative results for the NS equations: velocity\-related metrics\.MethoduuL2L\_\{2\}Relative ErroruuMax Abs ErroruuMSEvvL2L\_\{2\}Relative ErrorvvMax Abs ErrorvvMSEPINN\(7\.09±1\.41\)×10−2\(7\.09\\pm 1\.41\)\\times 10^\{\-2\}\(8\.17±2\.01\)×10−2\(8\.17\\pm 2\.01\)\\times 10^\{\-2\}\(7\.45±2\.85\)×10−4\(7\.45\\pm 2\.85\)\\times 10^\{\-4\}\(1\.06±0\.15\)×10−1\(1\.06\\pm 0\.15\)\\times 10^\{\-1\}\(1\.02±0\.15\)×10−1\(1\.02\\pm 0\.15\)\\times 10^\{\-1\}\(7\.83±2\.24\)×10−4\(7\.83\\pm 2\.24\)\\times 10^\{\-4\}PINN\-C1\(w/o bridge\)\(9\.02±2\.46\)×10−2\(9\.02\\pm 2\.46\)\\times 10^\{\-2\}\(1\.02±0\.29\)×10−1\(1\.02\\pm 0\.29\)\\times 10^\{\-1\}\(1\.24±0\.62\)×10−3\(1\.24\\pm 0\.62\)\\times 10^\{\-3\}\(1\.33±0\.35\)×10−1\(1\.33\\pm 0\.35\)\\times 10^\{\-1\}\(1\.29±0\.37\)×10−1\(1\.29\\pm 0\.37\)\\times 10^\{\-1\}\(1\.30±0\.62\)×10−3\(1\.30\\pm 0\.62\)\\times 10^\{\-3\}PINN\-C1\(with bridge\)\(6\.44±1\.61\)×10−2\(6\.44\\pm 1\.61\)\\times 10^\{\-2\}\(7\.40±1\.86\)×10−2\(7\.40\\pm 1\.86\)\\times 10^\{\-2\}\(6\.28±3\.21\)×10−4\(6\.28\\pm 3\.21\)\\times 10^\{\-4\}\(9\.66±2\.14\)×10−2\(9\.66\\pm 2\.14\)\\times 10^\{\-2\}\(9\.38±2\.10\)×10−2\(9\.38\\pm 2\.10\)\\times 10^\{\-2\}\(6\.74±3\.07\)×10−4\(6\.74\\pm 3\.07\)\\times 10^\{\-4\}PINN\-C2\(4\.55±0\.47\)×𝟏𝟎−𝟐\\mathbf\{\(4\.55\\pm 0\.47\)\\times 10^\{\-2\}\}\(5\.33±0\.74\)×𝟏𝟎−𝟐\\mathbf\{\(5\.33\\pm 0\.74\)\\times 10^\{\-2\}\}\(2\.98±0\.63\)×𝟏𝟎−𝟒\\mathbf\{\(2\.98\\pm 0\.63\)\\times 10^\{\-4\}\}\(6\.66±0\.65\)×𝟏𝟎−𝟐\\mathbf\{\(6\.66\\pm 0\.65\)\\times 10^\{\-2\}\}\(5\.92±0\.85\)×𝟏𝟎−𝟐\\mathbf\{\(5\.92\\pm 0\.85\)\\times 10^\{\-2\}\}\(3\.08±0\.61\)×𝟏𝟎−𝟒\\mathbf\{\(3\.08\\pm 0\.61\)\\times 10^\{\-4\}\}
As can also be observed from the visualizations in Fig\.[13](https://arxiv.org/html/2605.15254#S4.F13), the prediction results of the baseline PINN and PINN\-C2 are generally close at the global level\. Nevertheless, PINN\-C2 still shows noticeable improvements in both overall accuracy and local detail reconstruction, with the main error metrics reduced by approximately 40%\.



Figure 13:Prediction results for the incompressible Navier–Stokes equations in the lid\-driven cavity flow case\.\(a\) Reference solution\. \(b\) Prediction results and absolute error distribution of the baseline PINN, which gives a PDE residual of3\.1058×10−23\.1058\\times 10^\{\-2\}andL2L\_\{2\}relative errors of1\.0454×10−11\.0454\\times 10^\{\-1\},7\.0820×10−27\.0820\\times 10^\{\-2\}, and1\.0492×10−11\.0492\\times 10^\{\-1\}forpp,uu, andvv, respectively\. \(c\) Prediction results and absolute error distribution of PINN\-C2, which reduces the PDE residual to1\.9113×10−21\.9113\\times 10^\{\-2\}and the correspondingL2L\_\{2\}relative errors to6\.7516×10−26\.7516\\times 10^\{\-2\},4\.2435×10−24\.2435\\times 10^\{\-2\}, and6\.2153×10−26\.2153\\times 10^\{\-2\}\.Fig\.[14](https://arxiv.org/html/2605.15254#S4.F14)illustrates the training process of the proposed curriculum framework\. In Fig\.[14](https://arxiv.org/html/2605.15254#S4.F14)\(a\), larger causal weights are assigned to the upper region and the upper\-right corner, where the PDE loss has not yet fully converged\. Even when the PDE loss in some interior regions is already small, the training is still guided to focus first on the outer layer, so as to establish a more reliable boundary\-induced solution structure\. Fig\.[14](https://arxiv.org/html/2605.15254#S4.F14)\(b\) shows the adaptive local reweighting in the second phase\. Although the upper\-left and upper\-right regions both exhibit relatively large PDE losses and gradient norms, the joint assessment assigns the largest adaptive weight to the region slightly below the upper\-right corner\. This indicates that the proposed strategy emphasizes the region requiring stronger optimization based on both residual magnitude and gradient information\. Fig\.[14](https://arxiv.org/html/2605.15254#S4.F14)\(c\) presents the information bridge fitted by a third\-order polynomial, where the three channels \(ch=0,1,2\) correspond to the low\-frequency structures ofuu,vv, andpp, respectively\. The first channel captures the relatively large horizontal velocity near the upper boundary\. The second channel, limited by the low polynomial order, does not fully reproduce the asymmetric distribution of thevv\-component\. The third channel reflects the low\-frequency trend of the pressure field, with relatively large values near the upper\-right corner\.



Figure 14:Visualization of the proposed training strategy for the incompressible Navier–Stokes equations in the lid\-driven cavity flow case\.\(a\) Phase\-I curriculum weighting, emphasizing the upper region and the upper\-right corner to guide boundary\-to\-interior learning\. \(b\) Phase\-II adaptive local reweighting based on regional PDE losses and gradient norms, assigning the largest weight to the region slightly below the upper\-right corner after their joint assessment\. \(c\) Information bridge fitted from anchor points using a third\-order two\-dimensional polynomial, wherech=0, 1, and 2 correspond to the low\-frequency structures ofuu,vv, andpp, respectively\.
## 5Conclusion
This paper proposes a curriculum learning framework for PINNs from the perspective of spatial correlation, with a particular focus on coupled PDE boundary value problems\. Unlike existing approaches that mainly rely on temporal causality or simultaneous global optimization, the proposed method reformulates the curriculum mechanism of PINNs as a spatially progressive optimization process, consisting of boundary information propagation, cross\-region consistency regularization, and gradual refinement of local details\.
To this end, three key components are introduced\. First, spatial causal weights are introduced to impose a weak\-causality constraint, guiding boundary information from near\-boundary regions toward the interior and thereby reducing the difficulty of nonconvex optimization as well as the conflict between the boundary loss and the PDE residual loss\. Second, an information bridge mechanism is proposed to suppress low\-frequency drift during cross\-region training, thus improving the global consistency and stability of the learned solution\. Third, a region\-adaptive weighting strategy is developed to further enhance the fitting of locally high\-frequency structures and difficult regions while preserving the overall propagation pattern\. Experimental results demonstrate that the proposed framework can effectively improve the training behavior of PINNs across multiple cases, yielding clear gains in training stability, the effectiveness of information propagation, and final prediction accuracy\.
This work also suggests several directions for future research\. Because different PDEs exhibit distinct physical characteristics and optimization challenges, more physically adaptive spatial curriculum strategies are needed for different problem types\. Moreover, information propagation in complex systems is often richer than a simple boundary\-to\-interior process, especially under strong advection, closed\-loop flows, or multi\-path coupling, calling for more general formulations of spatial weak causality\. Extending the framework to irregular geometries, multiple boundaries, more general boundary conditions such as Robin boundary conditions, and insufficient boundary information is also an important direction\. Future work will focus on these aspects to develop a more general and robust spatial curriculum learning approach for PINNs\.
## Appendix A: Analysis of PDE Problem Characteristics
To discuss phenomena such as ”information propagation” and ”global coupling” in PINN training more clearly, this appendix summarizes the characteristic types of common PDEs from both mathematical and physical perspectives, and briefly explains the decisive role of different initial/boundary conditions on the solution\[[12](https://arxiv.org/html/2605.15254#bib.bib33)\]\.
### A\.1 PDE Types
From a mathematical perspective, second\-order linear PDEs are classified based on the principal symbol quadratic form\. The principal part consists of second\-order derivatives, and depending on the definiteness or degeneracy of the corresponding quadratic form, PDEs are classified as elliptic, hyperbolic, or parabolic types\[[7](https://arxiv.org/html/2605.15254#bib.bib34)\]\.\. These classifications influence the solution’s coupling range, propagation mechanism, and characteristic direction\.
- •Elliptic:Elliptic equations lack a temporal causal structure and clear propagation direction\. Their solutions exhibit global spatial coupling, meaning local disturbances can affect the entire domain\. They are common in steady\-state diffusion and potential field problems, reflecting ”global constraints without explicit propagation\.”
- •Hyperbolic:Hyperbolic equations have strong causality, with solutions propagating along characteristic lines or surfaces\. These equations are associated with wave propagation and strong convective transport, where information is transmitted along the characteristic direction, focusing on ”propagation along the characteristics\.”
- •Parabolic:Parabolic equations are ”weakly causal,” with solutions evolving in one direction, typically time, while diffusion smooths and weakens the information\. The heat conduction equation is a typical example, where the solution advances over time and becomes increasingly diffused\.
This classification reveals a key fact: the information propagation structure of different PDEs is not consistent\. Therefore, training strategies centered on ”causal progression” are often more natural for hyperbolic/parabolic equations, but extending this to elliptic equations or strongly coupled BVPs requires more careful design\.
### A\.2 Physical Perspective
From a physical perspective, many engineering PDEs can be represented within the Advection\-Diffusion\-Reaction \(ADR\) framework\[[30](https://arxiv.org/html/2605.15254#bib.bib35)\], where each term corresponds to distinct ”information dynamics,” influencing both the spatial/temporal structure of the solution and the associated learning challenges\.
- •Transient vs\. Steady\-State:Transient problems evolve over time, moving from an initial to a steady state\. Steady\-state problems, however, are not time\-dependent, and their solutions are determined by spatial constraints and source terms\. For PINNs, transient problems naturally introduce time propagation, while steady\-state problems emphasize global consistency constraints\.
- •Advection Term:This term represents the ”transport” effect, transferring information along the flow direction, typically showing clear directionality and asymmetry\. In advection\-dominated cases, such as with high Peclet numbers, the solution exhibits strong directional characteristics and boundary layers, leading to potential issues with ”insufficient propagation along the characteristic direction” during training\.
- •Diffusion Term:This term represents ”mixing” and ”smoothing,” weakening gradients and diffusing high\-frequency structures\. As diffusion increases, the solution smooths out, and local disturbances become less significant\.
- •Reaction Term:Describes the local generation/consumption mechanism, affecting the growth or decay rate of the solution\. When reaction dominates, the solution may localize or show rapid growth/decay, causing scale inconsistencies and leading to ”local rigidity\.”
The three physical mechanisms of ADR are closely related to the mathematical classification of PDEs\. In complex problems, such as the Navier\-Stokes equations\[[9](https://arxiv.org/html/2605.15254#bib.bib36)\], they determine the solution’s multi\-scale nature, peak/boundary layer structures, and information propagation patterns\. These mechanisms also serve as an essential physical foundation for understanding the challenges in PINN training\.
### A\.3 Boundary Conditions and Problem Forms
In addition to the equation type, the form of initial and boundary conditions directly determines the uniqueness of the solution and the mode of information input\[[14](https://arxiv.org/html/2605.15254#bib.bib37)\]:
- •Initial Value Problem \(IVP\):A typical evolution problem, where the solution is typically uniquely determined by the initial conditions, provided the evolution direction aligns with the equation type \(e\.g\., parabolic or hyperbolic equations advancing in time\)\. In this case, information propagates primarily from the initial time onward\.
- •Boundary Value Problem \(BVP\) and Initial\-Boundary Value Problem \(IBVP\):Constraint problems that require sufficient boundary conditions on spatial boundaries \(and possibly initial surfaces\) to ensure the problem is well\-posed\. For elliptic and strongly coupled systems, boundary conditions often define the global solution shape\. In such cases, information does not simply ”propagate in one direction”; instead, it reflects global consistency and coupling across regions\.
## Appendix B: Mathematical Formulation of the PDEs
### B\.1 1D ODE
We consider a second\-order boundary value problem \(BVP\) for a 1D ordinary differential equation \(ODE\) with the following formulations:
d2ydx2=eαx\[g2\(x\)sin\(kx\)\+2kg1\(x\)cos\(kx\)\],x∈\(0,1\),\\frac\{\\mathrm\{d\}^\{2\}y\}\{\\mathrm\{d\}x^\{2\}\}=e^\{\\alpha x\}\\left\[g\_\{2\}\(x\)\\sin\(kx\)\+2kg\_\{1\}\(x\)\\cos\(kx\)\\right\],\\quad x\\in\(0,1\),\(15\)
whereg1\(x\)g\_\{1\}\(x\)andg2\(x\)g\_\{2\}\(x\)are given by:
g1\(x\)=1−x−x2\+α\(x−x2\),g\_\{1\}\(x\)=1\-x\-x^\{2\}\+\\alpha\(x\-x^\{2\}\),\(16\)
g2\(x\)=−2\+2α\(1−2x\)\+\(α2−k2\)\(x−x2\),g\_\{2\}\(x\)=\-2\+2\\alpha\(1\-2x\)\+\(\\alpha^\{2\}\-k^\{2\}\)\(x\-x^\{2\}\),\(17\)
with the boundary conditions:
y\(0\)=0,y\(1\)=0\.y\(0\)=0,\\quad y\(1\)=0\.\(18\)
#### Parameter Settings
To investigate the effect of spatial frequency and stiffness, two configurations are considered:
Low\-frequency case:k=10π,α=2\.0,\\text\{Low\-frequency case:\}\\quad k=10\\pi,\\quad\\alpha=2\.0,
High\-frequency case:k=20π,α=3\.0\.\\text\{High\-frequency case:\}\\quad k=20\\pi,\\quad\\alpha=3\.0\.
#### Exact Solution
The analytical exact solution to the ODE is given by:
yexact\(x\)=x\(1−x\)eαxsin\(kx\)y\_\{exact\}\(x\)=x\(1\-x\)e^\{\\alpha x\}\\sin\(kx\)\(19\)
whereα\\alphaandkkare the parameters defined in the respective low\-frequency and high\-frequency cases\.
## B\.2 2D Poisson Equation with High Frequency Centers
Consider a 2D Poisson equation where the solution consists of a global low\-frequency term and high\-frequency components centered at four specific locations\. The problem is defined as follows:
−Δu\(x,y\)=f\(x,y\),\(x,y\)∈Ω,\-\\Delta u\(x,y\)=f\(x,y\),\\qquad\(x,y\)\\in\\Omega,\(20\)
where the domainΩ\\Omegais:
Ω=\(0,1\)×\(0,1\),\\Omega=\(0,1\)\\times\(0,1\),\(21\)
This problem uses Dirichlet boundary conditions, where the boundary values are given by the analytical solution:
u\(x,y\)=uexact\(x,y\),∀\(x,y\)∈∂Ω\.u\(x,y\)=u\_\{\\text\{exact\}\}\(x,y\),\\quad\\forall\(x,y\)\\in\\partial\\Omega\.\(22\)
whereuexact\(x,y\)u\_\{\\text\{exact\}\}\(x,y\)denotes the analytical solution and∂Ω\\partial\\Omegadenotes the boundary of the domainΩ\\Omega\.
#### Exact Solution
This case is one we constructed, and we define its analytical solutionu\(x,y\)u\(x,y\)as the sum of the low\-frequency and high\-frequency components:
uexact\(x,y\)=u0\(x,y\)\+up\(x,y\),u\_\{exact\}\(x,y\)=u\_\{0\}\(x,y\)\+u\_\{p\}\(x,y\),\(23\)
where the low\-frequency partu0\(x,y\)u\_\{0\}\(x,y\)is:
u0\(x,y\)=sin\(πx\)sin\(πy\),u\_\{0\}\(x,y\)=\\sin\(\\pi x\)\\sin\(\\pi y\),\(24\)
and the high\-frequency partup\(x,y\)u\_\{p\}\(x,y\)is:
up\(x,y\)=∑m=1MamGm\(x,y\)S\(x,y\),M=4,u\_\{p\}\(x,y\)=\\sum\_\{m=1\}^\{M\}a\_\{m\}G\_\{m\}\(x,y\)S\(x,y\),\\qquad M=4,\(25\)
with the functionS\(x,y\)S\(x,y\)defined as:
S\(x,y\)=sin\(kπx\)sin\(kπy\),S\(x,y\)=\\sin\(k\\pi x\)\\sin\(k\\pi y\),\(26\)
andGm\(x,y\)G\_\{m\}\(x,y\)representing a Gaussian function centered at\(cx,m,cy,m\)\(c\_\{x,m\},c\_\{y,m\}\):
Gm\(x,y\)=exp\(−\(x−cx,m\)2\+\(y−cy,m\)2σ2\)\.G\_\{m\}\(x,y\)=\\exp\\left\(\-\\frac\{\(x\-c\_\{x,m\}\)^\{2\}\+\(y\-c\_\{y,m\}\)^\{2\}\}\{\\sigma^\{2\}\}\\right\)\.\(27\)
#### Parameter Settings
The parameters used in the 2D Poisson equation are as follows:
The frequency parameter iskkand the width of the Gaussian functions isσ\\sigma, given by:
k=14,σ=0\.1,k=14,\\quad\\sigma=0\.1,
The centers of the Gaussians are located at:
\{\(cx,m,cy,m\)\}m=14=\{\(0\.30,0\.30\),\(0\.70,0\.35\),\(0\.55,0\.75\),\(0\.80,0\.80\)\},\\\{\(c\_\{x,m\},c\_\{y,m\}\)\\\}\_\{m=1\}^\{4\}=\\\{\(0\.30,0\.30\),\(0\.70,0\.35\),\(0\.55,0\.75\),\(0\.80,0\.80\)\\\},
The coefficientsama\_\{m\}are:
\(am\)m=14=\(5\.0,5\.0,5\.0,5\.0\)\.\(a\_\{m\}\)\_\{m=1\}^\{4\}=\(5\.0,5\.0,5\.0,5\.0\)\.
## B\.3 2D ADR Equation
Consider the steady\-state advection\-diffusion\-reaction \(ADR\) equation, which includes advection, reaction, and diffusion terms\. The equation is given by:
−νΔu\(x,y\)\+b\(x,y\)⋅∇u\(x,y\)\+c\(x,y\)u\(x,y\)=f\(x,y\),\(x,y\)∈Ω,\-\\nu\\Delta u\(x,y\)\+b\(x,y\)\\cdot\\nabla u\(x,y\)\+c\(x,y\)\\,u\(x,y\)=f\(x,y\),\\qquad\(x,y\)\\in\\Omega,\(28\)
where the domainΩ\\Omegais:
Ω=\(0,1\)×\(0,1\),\\Omega=\(0,1\)\\times\(0,1\),\(29\)
The advection termb\(x,y\)b\(x,y\)is defined as:
𝒃\(x,y\)=\(bx\(x,y\)by\(x,y\)\)=β\(−\(y−0\.5\)x−0\.5\),\\boldsymbol\{b\}\(x,y\)=\\begin\{pmatrix\}b\_\{x\}\(x,y\)\\\\\[2\.0pt\] b\_\{y\}\(x,y\)\\end\{pmatrix\}=\\beta\\begin\{pmatrix\}\-\(y\-0\.5\)\\\\ x\-0\.5\\end\{pmatrix\},\(30\)
The reaction termc\(x,y\)c\(x,y\)is:
c\(x,y\)=cmin\+cmaxexp\(−\(x−xc\)2\+\(y−yc\)2σc2\),c\(x,y\)=c\_\{\\min\}\+c\_\{\\max\}\\exp\\left\(\-\\frac\{\(x\-x\_\{c\}\)^\{2\}\+\(y\-y\_\{c\}\)^\{2\}\}\{\\sigma\_\{c\}^\{2\}\}\\right\),\(31\)
Dirichlet boundary conditions are given by:
u\(x,y\)=uexact\(x,y\),∀\(x,y\)∈∂Ω\.u\(x,y\)=u\_\{\\text\{exact\}\}\(x,y\),\\quad\\forall\(x,y\)\\in\\partial\\Omega\.\(32\)
whereuexact\(x,y\)u\_\{\\text\{exact\}\}\(x,y\)denotes the analytical solution and∂Ω\\partial\\Omegadenotes the boundary of the domainΩ\\Omega\.
#### Exact Solution
The solutionu\(x,y\)u\(x,y\)we constructed is expressed as the sum of the low\-frequency partulow\(x,y\)u\_\{\\text\{low\}\}\(x,y\)and a perturbation termδ⋅upacket\(x,y\)\\delta\\cdot u\_\{\\text\{packet\}\}\(x,y\):
uexact\(x,y\)=ulow\(x,y\)\+δ⋅,upacket\(x,y\),u\_\{exact\}\(x,y\)=u\_\{\\text\{low\}\}\(x,y\)\+\\delta\\cdot,u\_\{\\text\{packet\}\}\(x,y\),\(33\)
The low\-frequency partulow\(x,y\)u\_\{\\text\{low\}\}\(x,y\)is:
ulow\(x,y\)=sin\(πx\)sin\(πy\),u\_\{\\text\{low\}\}\(x,y\)=\\sin\(\\pi x\)\\sin\(\\pi y\),\(34\)
The perturbation termupacket\(x,y\)u\_\{\\text\{packet\}\}\(x,y\)is:
upacket\(x,y\)=exp\(−\(x−x0\)2\+\(y−y0\)2σu2\)cos\(kπx\),u\_\{\\text\{packet\}\}\(x,y\)=\\exp\\left\(\-\\frac\{\(x\-x\_\{0\}\)^\{2\}\+\(y\-y\_\{0\}\)^\{2\}\}\{\\sigma\_\{u\}^\{2\}\}\\right\)\\cos\(k\\pi x\),\(35\)
#### Parameter Settings
The parameters used in the 2D ADR equation are as follows:
ν=10−4,β=100,k=8,δ=0\.35,\\nu=10^\{\-4\},\\quad\\beta=100,\\quad k=8,\\quad\\delta=0\.35,
\(x0,y0\)=\(0\.60,0\.40\),\(xc,yc\)=\(0\.65,0\.35\),\(x\_\{0\},y\_\{0\}\)=\(0\.60,0\.40\),\\quad\(x\_\{c\},y\_\{c\}\)=\(0\.65,0\.35\),
σu=0\.12,σc=0\.10,cmin=1,cmax=104\.\\sigma\_\{u\}=0\.12,\\quad\\sigma\_\{c\}=0\.10,\\quad c\_\{\\min\}=1,\\quad c\_\{\\max\}=10^\{4\}\.
## B\.4 2D NS Equations for Lid\-Driven Cavity Flow
Consider the steady\-state Navier\-Stokes \(NS\) equations for the 2D lid\-driven cavity flow problem\. There is no analytical solution for this problem, and the reference solution is obtained using COMSOL software\. The problem is defined on the domainΩ\\Omegaas:
Ω=\[0,1\]×\[0,1\]\\Omega=\[0,1\]\\times\[0,1\]\(36\)
The steady\-state Navier\-Stokes equations are given by:
uux\+vuy\+px−1Re\(uxx\+uyy\)=0,\(x,y\)∈Ω,uu\_\{x\}\+vu\_\{y\}\+p\_\{x\}\-\\frac\{1\}\{\\text\{Re\}\}\(u\_\{xx\}\+u\_\{yy\}\)=0,\\qquad\(x,y\)\\in\\Omega,\(37\)
uvx\+vvy\+py−1Re\(vxx\+vyy\)=0,\(x,y\)∈Ω,uv\_\{x\}\+vv\_\{y\}\+p\_\{y\}\-\\frac\{1\}\{\\text\{Re\}\}\(v\_\{xx\}\+v\_\{yy\}\)=0,\\qquad\(x,y\)\\in\\Omega,\(38\)
ux\+vy=0,\(x,y\)∈Ω\.u\_\{x\}\+v\_\{y\}=0,\\qquad\(x,y\)\\in\\Omega\.\(39\)
The velocity boundary conditions are as follows:
u\(x,1\)=ax\(1−x\),v\(x,1\)=0,x∈\[0,1\],u\(x,1\)=ax\(1\-x\),\\qquad v\(x,1\)=0,\\qquad x\\in\[0,1\],\(40\)
u\(x,y\)=0,v\(x,y\)=0,\(x,y\)∈∂Ω∖\{y=1\}\.u\(x,y\)=0,\\qquad v\(x,y\)=0,\\qquad\(x,y\)\\in\\partial\\Omega\\setminus\\\{y=1\\\}\.\(41\)
The pressure boundary condition is:
#### Parameter Settings
The Reynolds number and lid velocity coefficient are solved set as:
Re=100,a=8\.\\text\{Re\}=100,\\quad a=8\.
## Appendix C: Experimental Parameter Settings and Details
This section provides the detailed experimental parameters and configuration used throughout the experiments, including the setup for different cases, training parameters, and configurations for the reweighting strategy\.
Table 6:Parameter settings for Case B\.1 under different frequency settingsParameterB\.1 \(low frequency\)B\.1 \(high frequency\)Number of domain points2501000Number of boundary points22Learning rate1×10−31\\times 10^\{\-3\}1×10−31\\times 10^\{\-3\}Adam iterations1000010000L\-BFGS iterations06000Optimizer decay2000, 0\.92000, 0\.9Network width100100Network depth66Table 7:Reweighting configuration for Case B\.1 under different frequency settingsParameterB\.1 \(low frequency\)B\.1 \(high frequency\)Decay epsilon1\.01\.0Number of subregions55Reweight every500500Reweight causal steps1000 \- 50001000 \- 8000Low frequency data weight0\.010\.0Table 8:Parameter settings for Cases B\.2–B\.4ParameterCase B\.2Case B\.3Case B\.4Number of domain points10000100002500Number of boundary points400400400Learning rate1×10−31\\times 10^\{\-3\}5×10−45\\times 10^\{\-4\}1×10−31\\times 10^\{\-3\}Adam iterations500050005000L\-BFGS iterations1500050003000Optimizer decay1000, 0\.91000, 0\.91000, 0\.9Network width505060Network depth656Table 9:Reweighting configuration for Cases B\.2–B\.4ParameterCase B\.2Case B\.3Case B\.4Decay epsilon1\.00\.50\.5Number of subregions\[5, 5\]\[5, 5\]\[5, 5\]Reweight every1000500500Reweight causal steps1000 \- 90001000 \- 50001000 \- 3000Reweight adaptive steps10000 \- 150005000 \- 80006000 \- 8000Low frequency data weight10\.010\.010\.0Reweight adaptive scale3\.03\.03\.0Grad norms scaleλg\\lambda\_\{g\}0\.52\.00\.5
## References
- \[1\]W\. F\. Ames\(2014\)Numerical methods for partial differential equations\.Academic press\.Cited by:[§1](https://arxiv.org/html/2605.15254#S1.p1.1.1)\.
- \[2\]S\. L\. Brunton and J\. N\. Kutz\(2024\)Promising directions of machine learning for partial differential equations\.Nature Computational Science4\(7\),pp\. 483–494\.Cited by:[§1](https://arxiv.org/html/2605.15254#S1.p2.1.1)\.
- \[3\]F\. Cao, K\. Fan, H\. Zhang, D\. Yuan, and J\. Liu\(2025\)Adaptive residual splitting in pinns for solving complex pdes\.Journal of Computational Physics,pp\. 114297\.Cited by:[§2\.2](https://arxiv.org/html/2605.15254#S2.SS2.p1.1.2),[§3\.3](https://arxiv.org/html/2605.15254#S3.SS3.SSS0.Px1.p2.1.1)\.
- \[4\]F\. Cao, F\. Gao, D\. Yuan, and J\. Liu\(2024\)Multistep asymptotic pre\-training strategy based on pinns for solving steep boundary singular perturbation problems\.Computer Methods in Applied Mechanics and Engineering431,pp\. 117222\.Cited by:[§2\.3](https://arxiv.org/html/2605.15254#S2.SS3.p1.1.9)\.
- \[5\]F\. Cao, X\. Guo, X\. Dong, and D\. Yuan\(2025\)WbPINN: weight balanced physics\-informed neural networks for multi\-objective learning\.Applied Soft Computing170,pp\. 112632\.Cited by:[§1](https://arxiv.org/html/2605.15254#S1.p6.1.8)\.
- \[6\]W\. Cao and W\. Zhang\(2023\)TSONN: time\-stepping\-oriented neural network for solving partial differential equations\.arXiv preprint arXiv:2310\.16491\.Cited by:[§2\.3](https://arxiv.org/html/2605.15254#S2.SS3.p1.1.2)\.
- \[7\]Classification of partial differential equations\.InAn Introduction to Second Order Partial Differential Equations,pp\. 27–47\.External Links:[Document](https://dx.doi.org/10.1142/9789813229181%5F0002),[Link](https://www.worldscientific.com/doi/abs/10.1142/9789813229181_0002),https://www\.worldscientific\.com/doi/pdf/10\.1142/9789813229181\_0002Cited by:[A\.1 PDE Types](https://arxiv.org/html/2605.15254#Sx1.SSx1.p1.1.1)\.
- \[8\]A\. Daw, J\. Bu, S\. Wang, P\. Perdikaris, and A\. Karpatne\(2022\)Mitigating propagation failures in physics\-informed neural networks using retain\-resample\-release \(r3\) sampling\.arXiv preprint arXiv:2207\.02338\.Cited by:[§1](https://arxiv.org/html/2605.15254#S1.p6.1.2),[§2\.1](https://arxiv.org/html/2605.15254#S2.SS1.p1.1.6)\.
- \[9\]C\. R\. Doering and J\. D\. Gibbon\(1995\)Applied analysis of the navier\-stokes equations\.Cambridge Texts in Applied Mathematics,Cambridge University Press\.Cited by:[A\.2 Physical Perspective](https://arxiv.org/html/2605.15254#Sx1.SSx2.p3.1.1)\.
- \[10\]V\. Dolean, A\. Heinlein, S\. Mishra, and B\. Moseley\(2024\)Multilevel domain decomposition\-based architectures for physics\-informed neural networks\.Computer Methods in Applied Mechanics and Engineering429,pp\. 117116\.Cited by:[§1](https://arxiv.org/html/2605.15254#S1.p6.1.5)\.
- \[11\]C\. Duffy and G\. V\. Velikova\(2024\)Dynamic curriculum regularization for enhanced training of physics\-informed neural networks\.InNeurIPS 2024 Workshop on Machine Learning and the Physical Sciences \(ML4PS\),External Links:[Link](https://ml4physicalsciences.github.io/2024/files/NeurIPS_ML4PS_2024_109.pdf)Cited by:[§2\.2](https://arxiv.org/html/2605.15254#S2.SS2.p1.1.5)\.
- \[12\]L\. C\. Evans\(2022\)Partial differential equations\.Vol\.19,American mathematical society\.Cited by:[Appendix A: Analysis of PDE Problem Characteristics](https://arxiv.org/html/2605.15254#Sx1.p1.1.1)\.
- \[13\]W\. Fan and X\. Chen\(2026\)Embedding physics into machine learning: a review of physics informed neural networks as partial differential equation forward solvers\.Tsinghua Science and Technology31\(3\),pp\. 1326–1364\.Cited by:[§1](https://arxiv.org/html/2605.15254#S1.p2.1.4)\.
- \[14\]D\. Gilbarg, N\. S\. Trudinger, D\. Gilbarg, and N\. Trudinger\(1998\)Elliptic partial differential equations of second order\.Vol\.2,Springer\.Cited by:[A\.3 Boundary Conditions and Problem Forms](https://arxiv.org/html/2605.15254#Sx1.SSx3.p1.1.1)\.
- \[15\]J\. Guo, Z\. Liu, H\. Wang, and C\. Hou\(2025\)Dual\-level time\-marching neural network for solving time\-dependent partial differential equations: j\. guo et al\.\.Nonlinear Dynamics113\(20\),pp\. 27915–27939\.Cited by:[§2\.3](https://arxiv.org/html/2605.15254#S2.SS3.p1.1.8)\.
- \[16\]J\. Guo, Y\. Yao, H\. Wang, and T\. Gu\(2023\)Pre\-training strategy for solving evolution equations based on physics\-informed neural networks\.Journal of Computational Physics489,pp\. 112258\.Cited by:[§2\.3](https://arxiv.org/html/2605.15254#S2.SS3.p1.1.1)\.
- \[17\]Y\. Guo, Z\. Fu, J\. Min, S\. Lin, X\. Liu, Y\. F\. Rashed, and X\. Zhuang\(2025\)Long\-term simulation of physical and mechanical behaviors using curriculum\-transfer\-learning based physics\-informed neural networks\.arXiv preprint arXiv:2502\.07325\.Cited by:[§2\.3](https://arxiv.org/html/2605.15254#S2.SS3.p1.1.8)\.
- \[18\]J\. Huang, R\. You, and T\. Zhou\(2025\)Frequency\-adaptive multi\-scale deep neural networks\.Computer Methods in Applied Mechanics and Engineering437,pp\. 117751\.Cited by:[§1](https://arxiv.org/html/2605.15254#S1.p6.1.4)\.
- \[19\]M\. Jahani\-Nasab and M\. A\. Bijarchi\(2024\)Enhancing convergence speed with feature enforcing physics\-informed neural networks using boundary conditions as prior knowledge\.Scientific Reports14\(1\),pp\. 23836\.Cited by:[§2\.2](https://arxiv.org/html/2605.15254#S2.SS2.p1.1.3)\.
- \[20\]A\. Krishnapriyan, A\. Gholami, S\. Zhe, R\. Kirby, and M\. W\. Mahoney\(2021\)Characterizing possible failure modes in physics\-informed neural networks\.Advances in neural information processing systems34,pp\. 26548–26560\.Cited by:[§1](https://arxiv.org/html/2605.15254#S1.p4.1.1),[§2\.2](https://arxiv.org/html/2605.15254#S2.SS2.p1.1.1)\.
- \[21\]S\. Lin and Y\. Chen\(2025\)Causality\-guided adaptive sampling method for physics\-informed neural networks solving forward problems of partial differential equations\.Physica D: Nonlinear Phenomena,pp\. 134878\.Cited by:[§2\.3](https://arxiv.org/html/2605.15254#S2.SS3.p1.1.7)\.
- \[22\]Q\. Liu, M\. Chu, and N\. Thuerey\(2025\)ConFIG: towards conflict\-free training of physics informed neural networks\.InThe Thirteenth International Conference on Learning Representations,External Links:[Link](https://openreview.net/forum?id=APojAzJQiq)Cited by:[§1](https://arxiv.org/html/2605.15254#S1.p6.1.3)\.
- \[23\]R\. Mattey and S\. Ghosh\(2022\)A novel sequential method to train physics informed neural networks for allen cahn and cahn hilliard equations\.Computer Methods in Applied Mechanics and Engineering390,pp\. 114474\.Cited by:[§2\.3](https://arxiv.org/html/2605.15254#S2.SS3.p1.1.4)\.
- \[24\]X\. Meng, Z\. Li, D\. Zhang, and G\. E\. Karniadakis\(2020\)PPINN: parareal physics\-informed neural network for time\-dependent pdes\.Computer Methods in Applied Mechanics and Engineering370,pp\. 113250\.Cited by:[§1](https://arxiv.org/html/2605.15254#S1.p6.1.6)\.
- \[25\]S\. Monaco and D\. Apiletti\(2023\)Training physics\-informed neural networks: one learning to rule them all?\.Results in Engineering18,pp\. 101023\.Cited by:[§2\.2](https://arxiv.org/html/2605.15254#S2.SS2.p1.1.7)\.
- \[26\]M\. Münzer and C\. Bard\(2022\)A curriculum\-training\-based strategy for distributing collocation points during physics\-informed neural network training\.arXiv preprint arXiv:2211\.11396\.Cited by:[§2\.2](https://arxiv.org/html/2605.15254#S2.SS2.p1.1.6)\.
- \[27\]M\. Penwarden, A\. D\. Jagtap, S\. Zhe, G\. E\. Karniadakis, and R\. M\. Kirby\(2023\)A unified scalable framework for causal sweeping strategies for physics\-informed neural networks \(pinns\) and their temporal decompositions\.Journal of Computational Physics493,pp\. 112464\.Cited by:[§2\.1](https://arxiv.org/html/2605.15254#S2.SS1.p1.1.5),[§2\.3](https://arxiv.org/html/2605.15254#S2.SS3.p1.1.6)\.
- \[28\]M\. Raissi, P\. Perdikaris, and G\. E\. Karniadakis\(2019\)Physics\-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations\.Journal of Computational physics378,pp\. 686–707\.Cited by:[§1](https://arxiv.org/html/2605.15254#S1.p2.1.2)\.
- \[29\]P\. Rathore, W\. Lei, Z\. Frangella, L\. Lu, and M\. Udell\(2024\)Challenges in training pinns: a loss landscape perspective\.arXiv preprint arXiv:2402\.01868\.Cited by:[§1](https://arxiv.org/html/2605.15254#S1.p4.1.1)\.
- \[30\]M\. M\. Tariq, M\. B\. Riaz, S\. S\. Kazmi, and M\. Aziz\-ur\-Rehman\(2025\)Unveiling chaos and stability in advection diffusion reaction systems via advanced dynamical and sensitivity analysis\.Scientific Reports15\(1\),pp\. 5513\.Cited by:[A\.2 Physical Perspective](https://arxiv.org/html/2605.15254#Sx1.SSx2.p1.1.1)\.
- \[31\]Y\. Tsai, H\. Juan, P\. Chiu, and C\. Lin\(2025\)MLD\-pinn: a multi\-level datasets training method in physics\-informed neural networks\.Computers & Fluids,pp\. 106849\.Cited by:[§2\.2](https://arxiv.org/html/2605.15254#S2.SS2.p1.1.4)\.
- \[32\]S\. Wang, S\. Sankaran, and P\. Perdikaris\(2024\)Respecting causality for training physics\-informed neural networks\.Computer Methods in Applied Mechanics and Engineering421,pp\. 116813\.External Links:ISSN 0045\-7825,[Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.cma.2024.116813),[Link](https://www.sciencedirect.com/science/article/pii/S0045782524000690)Cited by:[§1](https://arxiv.org/html/2605.15254#S1.p5.1.2),[§2\.3](https://arxiv.org/html/2605.15254#S2.SS3.p1.1.9),[§3\.1](https://arxiv.org/html/2605.15254#S3.SS1.SSS0.Px2.p1.1.1)\.
- \[33\]S\. Wang, X\. Yu, and P\. Perdikaris\(2022\)When and why pinns fail to train: a neural tangent kernel perspective\.Journal of Computational Physics449,pp\. 110768\.Cited by:[§1](https://arxiv.org/html/2605.15254#S1.p4.1.1),[§1](https://arxiv.org/html/2605.15254#S1.p6.1.1),[§3\.3](https://arxiv.org/html/2605.15254#S3.SS3.SSS0.Px1.p3.8.1)\.
- \[34\]X\. Wang, Y\. Chen, and W\. Zhu\(2021\)A survey on curriculum learning\.IEEE transactions on pattern analysis and machine intelligence44\(9\),pp\. 4555–4576\.Cited by:[§1](https://arxiv.org/html/2605.15254#S1.p5.1.1)\.
- \[35\]Y\. Wang, Y\. Yao, and Z\. Gao\(2025\)An extrapolation\-driven network architecture for physics\-informed deep learning\.Neural Networks183,pp\. 106998\.Cited by:[§2\.1](https://arxiv.org/html/2605.15254#S2.SS1.p1.1.7)\.
- \[36\]T\. Yang, Z\. Qian, N\. Hang, and M\. Liu\(2025\)S\-pinn: stabilized physics\-informed neural networks for alleviating barriers between multi\-level co\-optimization\.Computer Methods in Applied Mechanics and Engineering447,pp\. 118348\.Cited by:[§1](https://arxiv.org/html/2605.15254#S1.p6.1.7)\.
- \[37\]W\. Zhang, W\. Suo, J\. Song, and W\. Cao\(2026\)Physics\-informed neural networks \(pinns\) as intelligent computing technique for solving partial differential equations: limitation and future prospects\.SCIENCE CHINA Physics, Mechanics & Astronomy69\(1\),pp\. 214602\.Cited by:[§1](https://arxiv.org/html/2605.15254#S1.p2.1.3)\.Similar Articles
Physics-Informed Neural Networks with Learnable Loss Balancing and Transfer Learning
This paper proposes a self-supervised physics-informed neural network (PINN) framework with a learnable blending neuron to adaptively balance physics-based and data-driven losses, and integrates transfer learning to improve efficiency under data scarcity. It is validated on liquid-metal miniature heat sink CFD data with only 87 datapoints, achieving under 8% error.
A PAC-Bayesian View of Generalisation for Physics-Informed Machine Learning
This paper develops a PAC-Bayesian framework for physics-informed machine learning, providing high-probability generalization guarantees for unbounded losses. It proposes a multi-task perspective that jointly handles data fidelity, PDE residuals, and boundary conditions, and introduces a self-bounding learning algorithm.
PIMSM: Physics-Informed Multi-Scale Mamba for Stable Neural Representations under Distribution Shift
This paper proposes Physics-Informed Multi-Scale Mamba (PIMSM), a state-space architecture that aligns model memory with physical timescales to improve robustness under distribution shift in scientific time series, demonstrating improvements on fMRI and weather forecasting tasks.
Fourier Feature Pyramids for Physics-Informed Neural Networks
The paper introduces beignet, a PINN architecture that replaces random Fourier features with a trainable multi-resolution Fourier feature pyramid, achieving higher accuracy and computational efficiency on PDE benchmarks.
Finite Volume-Informed Neural Network Framework for 2D Shallow Water Equations: Rugged Loss Landscapes and the Importance of Data Guidance
This paper introduces 'Data-Guided FVM-PINN', a framework using finite-volume losses for 2D shallow water equations, demonstrating that sparse data guidance is crucial to prevent network collapse in rugged loss landscapes.