Loss Landscape Diagnosis for Gradient-Based Gray-Scott System Inversion: Disentangling the Roles of PINN Components
Summary
This paper diagnoses the loss landscape of gradient-based inversion for the Gray-Scott reaction-diffusion system, showing that direct backpropagation fails due to flat plateaus and sharp cliffs, while PINN components like residual loss smooth the landscape. The findings provide design implications for PINN-type methods.
# Loss Landscape Diagnosis for Gradient-Based Gray-Scott System Inversion: Disentangling the Roles of PINN Components
Source: [https://arxiv.org/html/2606.11258](https://arxiv.org/html/2606.11258)
###### Abstract
Gradient\-based inversion of reaction\-diffusion systems is typically approached via surrogate models or physics\-informed neural networks \(PINNs\), while the most direct route, backpropagation through the PDE’s structure itself, has largely been avoided\. We pursue this direct route as a diagnostic probe, backpropagating a steady\-state loss through unrolled Gray\-Scott simulation to recover its parameters, with no surrogate or neural\-network augmentation\. Optimization fails to converge, and plotting the landscape directly locates the failure in its geometry—flat plateaus with no gradient signal, bounded by sharp cliffs that align with bifurcation boundaries—a structure that recurs across loss functions and is inherited however the gradients are routed to parameters\. Reading this minimal setup as an ablation of PINN, we disentangle each component’s role: with the neural network fixed, the residual loss is quadratic in the PDE parameters and yields a smooth landscape, so it alone already avoids the pathology, by implicitly encoding the full PDE dynamics across all initial conditions\. The neural network, for its part, cannot repair an ill\-posed parameter subspace, and so serves only to complete the observed data—a division of labor not previously made explicit\. These findings carry concrete design implications for PINN\-type methods and a broader heuristic on when added dimensions actually help\.
ICML, physics\-informed neural network, machine learning, Gray\-Scott system, reaction\-diffusion, inverse problem, loss landscape, backpropagation, gradient\-based optimization, parameter estimation, bifurcation
## 1 Introduction
Inverse parameter estimation, which recovers the governing parameters of a dynamical system from observed outputs, arises across scientific domains, from developmental biology (Kondo, [2022](https://arxiv.org/html/2606.11258#bib.bib2)) to computational neuroscience (Lefèvre and Mangin, [2010](https://arxiv.org/html/2606.11258#bib.bib1)). Reaction-diffusion systems are a canonical class of such problems: their parameters determine qualitatively distinct pattern-forming behaviors, and inferring them from steady-state or non-terminal observations has direct relevance to both physical modeling and biological pattern analysis (Kondo, [2022](https://arxiv.org/html/2606.11258#bib.bib2)).
Direct backpropagation is the most fundamental optimization mechanism in machine learning, more efficient in terms of information flow than indirect approaches such as evolutionary strategies, surrogate-based methods, or reinforcement learning. However, practical applications of machine learning to dynamical system inversion in physics have largely avoided this direct route, favoring surrogate models (Schnörr and Schnörr, [2023](https://arxiv.org/html/2606.11258#bib.bib12)) or neural-network-augmented approaches such as PINNs (Raissi et al., [2019](https://arxiv.org/html/2606.11258#bib.bib4)). We presume this is because the parameter-to-solution map of nonlinear reaction-diffusion systems is expected to be irregular. However, the specific challenges faced by direct gradient-based methods in this setting have not been systematically investigated: how pathological the loss landscape (footnote 1) is, whether it can be transformed into a well-behaved equivalent, whether backpropagation makes any progress, and whether existing approaches target the actual obstacles. To answer these questions, we pursue this route in a minimalist form: backpropagation through unrolled simulation steps, without surrogate approximations or neural network augmentations. We choose the grid-searchable four-parameter Gray-Scott system as a fully inspectable testbed for the behavior and outcomes of gradient-based optimization, aiming for conclusions that extend to broader reaction-diffusion and PDE inverse problems.

Footnote 1: The term *loss landscape* in deep learning typically refers to the high-dimensional parameter space of neural networks, analyzed with tools such as Hessian eigenspectra or random-direction projections. We use it more broadly, for the geometry of the loss over whatever parameter space is under investigation: here either the low-dimensional PDE parameter subspace or the joint space that also includes the neural network parameters.
Empirical results reveal flat plateaus with negligible gradient signal and sharp cliffs that obstruct navigation in the loss landscape, and that backpropagation makes no reliable progress without deliberate intervention\. Read as an ablation of PINN, these results led us to disentangle the roles of its components: the residual loss yields a well\-behaved landscape by encoding full PDE dynamics—unlike the limited parameter\-to\-solution map probed here—while the neural network is unable to improve the parameter landscape, only serving to fill in the missing observations\. These findings carry direct design implications for PINN and broader neural\-network\-enhanced inverse approaches\.
## 2 Diagnostic Setup
### 2.1 Forward Model and Backpropagation
The Gray-Scott reaction-diffusion system models the spatiotemporal evolution of two chemical species $u$ and $v$ through the following equations (Delgado et al., [2017](https://arxiv.org/html/2606.11258#bib.bib10); Gandy and Nelson, [2022](https://arxiv.org/html/2606.11258#bib.bib11)):

$$\frac{\partial u}{\partial t} = D_{u}\Delta u - uv^{2} + F(1-u) \tag{1}$$

$$\frac{\partial v}{\partial t} = D_{v}\Delta v + uv^{2} - (F+k)\,v, \tag{2}$$

where $D_{u}$ and $D_{v}$ are diffusion coefficients, $F$ is the feed rate, and $k$ is the kill rate. Depending on these parameters, the system exhibits qualitatively distinct steady-state patterns, ranging from uniform solutions to spatially structured patterns such as spots, stripes, and labyrinthine structures (Kondo, [2022](https://arxiv.org/html/2606.11258#bib.bib2)). The sensitivity of steady-state pattern type to parameter values is central to the inverse problem we study.
The system we investigate backpropagates a loss through the intact structure of a time-unrolled stepping algorithm for the Gray-Scott model. With the time step and spatial grid spacing both set to unity ($\Delta t = \Delta x = \Delta y = 1.0$), each step follows:

$$u_{next} = u + D_{u}\Delta u - uv^{2} + F(1-u) \tag{3}$$

$$v_{next} = v + D_{v}\Delta v + uv^{2} - (F+k)\,v, \tag{4}$$

with the loss defined on how $v$ deviates from the targets.
We use this backpropagation-based optimization to fit $D_{u}$, $D_{v}$, $F$, and $k$ to the distribution of a batch of 512 similar target patterns of size $128 \times 128$, whose variation comes from the limited randomness in their initial conditions (IC; see Appendix [A](https://arxiv.org/html/2606.11258#A1)). Backpropagation runs through all unrolled steps without truncation: computational resources were sufficient for the untruncated setting, and no gradient explosion or vanishing was observed.
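The unrolled update of Equations (3)-(4) and an untruncated reverse pass through it can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the periodic 5-point Laplacian, the toy terminal loss $\frac{1}{2}\sum v^2$, and the names `laplacian`, `step`, and `loss_and_grads` are all hypothetical, and the reverse pass is written by hand where an automatic-differentiation framework would normally be used.

```python
import numpy as np

def laplacian(a):
    # 5-point stencil with unit grid spacing; periodic boundaries are an assumption.
    return (np.roll(a, 1, 0) + np.roll(a, -1, 0)
            + np.roll(a, 1, 1) + np.roll(a, -1, 1) - 4.0 * a)

def step(u, v, Du, Dv, F, k):
    # One explicit step with dt = 1, following Equations (3)-(4).
    uvv = u * v * v
    return (u + Du * laplacian(u) - uvv + F * (1.0 - u),
            v + Dv * laplacian(v) + uvv - (F + k) * v)

def loss_and_grads(u0, v0, Du, Dv, F, k, n_steps):
    # Forward pass: unroll every step and keep all states (no truncation).
    states = [(u0, v0)]
    for _ in range(n_steps):
        states.append(step(*states[-1], Du, Dv, F, k))
    v_final = states[-1][1]
    loss = 0.5 * np.sum(v_final ** 2)  # toy terminal loss (assumption)

    # Reverse pass: gu, gv hold d(loss)/d(state) at the step being processed.
    gu, gv = np.zeros_like(v_final), v_final.copy()
    g = {"Du": 0.0, "Dv": 0.0, "F": 0.0, "k": 0.0}
    for u, v in reversed(states[:-1]):
        # Parameter gradients accumulate over every unrolled step.
        g["Du"] += np.sum(gu * laplacian(u))
        g["Dv"] += np.sum(gv * laplacian(v))
        g["F"] += np.sum(gu * (1.0 - u)) - np.sum(gv * v)
        g["k"] -= np.sum(gv * v)
        # Propagate to the previous state; the periodic Laplacian is self-adjoint.
        gu, gv = (gu * (1.0 - v * v - F) + Du * laplacian(gu) + gv * v * v,
                  gv * (1.0 + 2.0 * u * v - (F + k)) + Dv * laplacian(gv)
                  - gu * 2.0 * u * v)
    return loss, g
```

Memory grows linearly with the number of unrolled steps because every intermediate state is stored, which is the cost of the untruncated setting described above.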
#### Target patterns\.
Target patterns are generated using the same time-stepping functions, boundary conditions, and initial conditions as training, with parameters fixed at $D_{u} = 0.16$, $D_{v} = 0.08$, $F = 0.035$, and $k = 0.065$. Stepping continues until no pixel differs from its value at the previous step by more than $10^{-4}$, or until the maximum of 50000 steps is reached.
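The stopping rule above can be sketched as a generic loop. This is an illustrative sketch: `run_to_steady_state` and `step_fn` are hypothetical names, and applying the per-pixel test to a single state array is an assumption (the text does not say whether the test covers $v$ alone or both species).

```python
import numpy as np

def run_to_steady_state(step_fn, state, tol=1e-4, max_steps=50000):
    # Advance until no entry changes by more than tol between consecutive
    # steps, or until max_steps is reached. Returns (state, steps_taken).
    for n in range(1, max_steps + 1):
        new_state = step_fn(state)
        if np.max(np.abs(new_state - state)) <= tol:
            return new_state, n
        state = new_state
    return state, max_steps
```

A criterion of this kind cannot tell a true fixed point from a slowly varying transient, which matters for long-lived quasi-uniform dynamics.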
#### Optimization objective\.
We calculate the 2D power spectrum of the generated pattern and of a sampled target pattern using

$$S = \log\!\left(\mathrm{fftshift}\!\left(|\mathcal{F}(\tilde{v})|^{2} + \epsilon\right)\right) \in \mathbb{R}^{H \times W}, \tag{5}$$

where

$$\tilde{v} = v - \frac{1}{HW}\sum_{i,j} v_{i,j}. \tag{6}$$

We then use the L2 loss between $S_{target}$ and $S_{generated}$.

We also use a “windowed” 2D power spectrum, which computes the same 2D power spectrum within each of the 64 windows of size $16 \times 16$ that tile a $128 \times 128$ image. The 64 spectrum windows of each image are then reassembled into a data matrix of the original image size ($128 \times 128$). The windowed loss is the L2 loss between the reassembled matrices of the target and the generated pattern.
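Both objectives can be sketched in NumPy as below. Several details are assumptions that the text does not specify: the value of $\epsilon$, the reduction used for the "L2 loss" (mean of squared differences here), and subtracting the mean per window in the windowed variant. All function names are hypothetical.

```python
import numpy as np

def log_power_spectrum(v, eps=1e-8):
    # Equations (5)-(6): remove the mean, take |FFT|^2, add eps, shift, log.
    v_tilde = v - v.mean()
    power = np.abs(np.fft.fft2(v_tilde)) ** 2
    return np.log(np.fft.fftshift(power + eps))

def spectrum_loss(v_gen, v_tgt, eps=1e-8):
    # Mean squared difference of the two log spectra (reduction is an assumption).
    d = log_power_spectrum(v_gen, eps) - log_power_spectrum(v_tgt, eps)
    return float(np.mean(d ** 2))

def windowed_log_power_spectrum(v, win=16, eps=1e-8):
    # Compute the same spectrum per win x win tile and reassemble the tiles
    # into a matrix of the original image size.
    H, W = v.shape
    out = np.empty((H, W))
    for i in range(0, H, win):
        for j in range(0, W, win):
            out[i:i + win, j:j + win] = log_power_spectrum(
                v[i:i + win, j:j + win], eps)
    return out

def windowed_spectrum_loss(v_gen, v_tgt, win=16, eps=1e-8):
    d = (windowed_log_power_spectrum(v_gen, win, eps)
         - windowed_log_power_spectrum(v_tgt, win, eps))
    return float(np.mean(d ** 2))
```

The global spectrum is invariant to circular shifts of the pattern, while the windowed version is sensitive to where in the field the frequency content sits.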

Figure 1: Sampled steady-state patterns and their non-windowed and windowed 2D power spectrum losses. Each x-axis panel shows the corresponding steady-state pattern. #1: from initial parameters. #2–3: pivoting points (after 8918 and a further 98 iterations; see Appendix [D](https://arxiv.org/html/2606.11258#A4)). #4: ground truth (highlighted). #5–7: representative training samples.

Footnote 3: The colormap is clipped to $[0.0, 0.1]$ to enhance contrast, as the $v$ component of the Gray-Scott model concentrates in this range under the tested parameter regimes.
### 2.2 Safeguards and Parameter Constraints
1. Our adaptive learning rate shrinks the step until the next parameters no longer produce $\mathrm{NaN}$, $\mathrm{Inf}$, or values of $v$ outside $[0, 1]$, so these values are avoided entirely.
2. We restrict the ranges of $D_{u}$, $D_{v}$, $F$, and $k$ by setting $D_{u} = \mathrm{softplus}(\mathit{log\_D_{u}})$, $D_{v} = \mathrm{softplus}(\mathit{log\_D_{v}})$, $F = 0.1 \cdot \mathrm{sigmoid}(\mathit{raw\_F})$, and $k = 0.1 \cdot \mathrm{sigmoid}(\mathit{raw\_k})$. This adds another layer of parameters, $\mathit{log\_D_{u}}$, $\mathit{log\_D_{v}}$, $\mathit{raw\_k}$, and $\mathit{raw\_F}$, at the entry of the network.
3. The empirical range of $D_{u}$ and $D_{v}$ satisfies the 2-dimensional CFL criterion $D\Delta t\left(\frac{1}{\Delta x^{2}} + \frac{1}{\Delta y^{2}}\right) \leq \frac{1}{2}$. With $\Delta x = 1.0$, $\Delta y = 1.0$, and $\Delta t = 1.0$, this means $D \leq \frac{1}{4}$ in most cases, although we take no dedicated measures to strictly enforce it.
4. The initial-state generation mechanism (Equations [9](https://arxiv.org/html/2606.11258#A1.E9) and [10](https://arxiv.org/html/2606.11258#A1.E10)) is shared between target and training, so that the training goal reduces to finding the four parameters $D_{u}$, $D_{v}$, $F$, and $k$.
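Safeguards 1 to 3 can be sketched with the standard library alone. This is an illustration under assumptions: the halving factor, the retry limit, the plain gradient-descent update, and the `is_valid` callback (which would run the simulation and check for NaN, Inf, or out-of-range $v$) are hypothetical, as are all function names.

```python
import math

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def constrain(raw):
    # Map unconstrained parameters to D_u, D_v > 0 and F, k in (0, 0.1).
    return {"Du": softplus(raw["log_Du"]),
            "Dv": softplus(raw["log_Dv"]),
            "F": 0.1 * sigmoid(raw["raw_F"]),
            "k": 0.1 * sigmoid(raw["raw_k"])}

def satisfies_cfl(D, dt=1.0, dx=1.0, dy=1.0):
    # 2D criterion: D * dt * (1/dx^2 + 1/dy^2) <= 1/2.
    return D * dt * (1.0 / dx ** 2 + 1.0 / dy ** 2) <= 0.5

def backoff_step(raw, grad, lr, is_valid, shrink=0.5, max_tries=30):
    # Shrink the learning rate until the candidate parameters pass is_valid.
    for _ in range(max_tries):
        cand = {name: raw[name] - lr * grad[name] for name in raw}
        if is_valid(constrain(cand)):
            return cand, lr
        lr *= shrink
    return raw, 0.0  # give up and keep the current parameters
```

For example, `is_valid=lambda p: satisfies_cfl(p["Du"]) and satisfies_cfl(p["Dv"])` would turn the CFL bound into a hard constraint, which the setup described above does not do.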
## 3 Training Results
Gradient-based optimization fails to converge under this setup, for reasons that turn out to lie in the loss landscape rather than in any particular loss function. We establish this in the two subsections below.
### 3.1 Loss Values Carry No Convergence Signal
Across training, the loss stays within a narrow high band of 245\.0–270\.0 with no downward trend, departing from it only as sparse, isolated drops\. In every such excursion that we allowed to continue, the loss climbed back to the high band within a few iterations\.
A low loss, moreover, does not reliably indicate a correct fit. Among the sampled configurations in [Figure 1](https://arxiv.org/html/2606.11258#footnote2), two reach comparably low loss, differing only marginally, but correspond to qualitatively different outcomes: one matches the target steady-state pattern (#3) while the other does not (#7).
The matching configuration (#3) was found incidentally: we interrupted training at the moment its loss dropped, and only post-hoc inspection revealed that it matched the target. Because the run was cut off there, we did not observe whether its loss would have climbed back as every uninterrupted excursion did; we expect it would have. The parameter trajectory reaching this configuration is recorded in Appendix [D](https://arxiv.org/html/2606.11258#A4).
Taken together, these observations show that the loss provides no usable signal for convergence: the low\-loss configurations we encountered were reached incidentally rather than by descent, and could not be distinguished from spurious low\-loss configurations by their loss value alone\.

Figure 2: Loss landscapes along single parameters. From top to bottom: cross-sections for $k$, $F$, $D_{v}$, and $D_{u}$. All other parameters are fixed at their respective ground truth values. The $D_{v}$ plot is missing a left high-loss region, and the $D_{u}$ plot a right one, because $v$ values diverge to Inf at those ends, making loss computation infeasible.
### 3.2 The Problem Persists Across Loss Functions
A natural hypothesis is that the loss function itself is at fault: the 2D power spectrum loss assigns low values to non-target patterns ([Figure 1](https://arxiv.org/html/2606.11258#footnote2)), which could both explain the spurious low-loss excursions and suggest an easy fix. To test this, we replaced it with the windowed 2D power spectrum loss (described under “Optimization objective” in [Section 2.1](https://arxiv.org/html/2606.11258#S2.SS1)), which enforces that the dominant Fourier frequencies come equally from all sub-regions of the 128×128 field.
The windowed loss behaves no better. It produces the same trendless fluctuation, now centered around 230. Evaluated on the seven parameter sets of [Figure 1](https://arxiv.org/html/2606.11258#footnote2), its values remain polarized: each pattern sits either marginally above the target loss or up in the uniform-region range, with little in between. The windowed loss does improve separability for some non-matching patterns relative to the non-windowed version (compare the two series in [Figure 1](https://arxiv.org/html/2606.11258#footnote2)), but this sharper discrimination at isolated points does not translate into a usable gradient between them.
We can rule out insufficient exploration as the cause. Step sizes were small enough that meaningful parameter change required hundreds of iterations, and training ran for tens of hours; the stagnation is not an artifact of too-short or too-coarse a search. Because two structurally different loss functions produce the same polarized, trendless behavior, the obstruction is unlikely to reside in the loss function. This points instead to the geometry of the loss landscape itself, which we examine directly in [Section 4](https://arxiv.org/html/2606.11258#S4).
## 4 Further Probing the Loss Landscapes
We now investigate the loss landscape itself, independently of the choice of loss function. To understand its shape, we plot several cross-sections of the landscape.
### 4.1 Cross-Sections: Single Parameters and Pairs of Parameters
We show the one-dimensional cross-sections along $k$, $F$, $D_{u}$, and $D_{v}$ in [Figure 2](https://arxiv.org/html/2606.11258#S3.F2). Along each of these directions, we also record a sequence of animations sweeping that parameter. Links to the animation files are in Appendix [E](https://arxiv.org/html/2606.11258#A5).
We show loss values on two-dimensional planes formed by varying pairs of parameters in [Figure 3](https://arxiv.org/html/2606.11258#S4.F3). Two such planes are presented, one for $F$-$D_{v}$ and the other for $F$-$k$. [Figure 4](https://arxiv.org/html/2606.11258#footnote5) shows steady-state patterns arranged in a $7 \times 7$ matrix covering the same $F$-$k$ region, providing a more intuitive view of the trend underlying the $F$-$k$ loss plot.
The cross\-sections reveal sharp cliffs separating distinct regions of the parameter space, with extensive flat areas on both sides providing negligible gradient signal\. These features are consistent across all four one\-dimensional axes and both two\-dimensional cross\-sections investigated\.
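A cross-section of this kind is a plain grid sweep with the remaining parameters held fixed. The sketch below is illustrative: `toy_loss` is a hypothetical stand-in with one plateau and one cliff, not the Gray-Scott loss, and the parameter ranges and the `largest_jump` diagnostic are likewise assumptions.

```python
def linspace(lo, hi, n):
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]

def cross_section_2d(loss_fn, base, name_x, xs, name_y, ys):
    # Evaluate loss_fn on a grid over two parameters; all others stay at base.
    grid = []
    for y in ys:
        row = []
        for x in xs:
            params = dict(base)
            params[name_x], params[name_y] = x, y
            row.append(loss_fn(params))
        grid.append(row)
    return grid

def largest_jump(grid):
    # Largest loss difference between horizontally or vertically adjacent
    # grid cells: a crude indicator of cliffs.
    jumps = [abs(a - b) for row in grid for a, b in zip(row, row[1:])]
    jumps += [abs(a - b) for r0, r1 in zip(grid, grid[1:])
              for a, b in zip(r0, r1)]
    return max(jumps)

def toy_loss(p):
    # Hypothetical landscape: a flat high plateau on one side of a line,
    # a gentle slope on the other.
    return 250.0 if p["k"] > 0.03 + p["F"] else 10.0 + 100.0 * p["F"]

base = {"Du": 0.16, "Dv": 0.08, "F": 0.035, "k": 0.065}
grid = cross_section_2d(toy_loss, base,
                        "F", linspace(0.01, 0.06, 50),
                        "k", linspace(0.04, 0.07, 50))
```

Each grid cell requires a full forward simulation in the real setting, so a sweep like this is feasible mainly because the parameter space is low-dimensional.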
### 4.2 Cross-Sections for Different Loss Functions
To verify whether the same issues occur across different loss functions, we compare three loss functions on the same cross-section of the $F$-$k$ plane. In [Figure 3](https://arxiv.org/html/2606.11258#S4.F3), the bottom plot already shows such a landscape for the non-windowed 2D power spectrum loss, while [Figure 5(a)](https://arxiv.org/html/2606.11258#S5.F5.sf1) and [Figure 5(b)](https://arxiv.org/html/2606.11258#S5.F5.sf2) provide corresponding three-dimensional plots for the non-windowed and windowed versions. These are qualitatively similar.

Figure 3: Loss landscape cross-sections on 2-dimensional planes formed by varying pairs of parameters, both at $50 \times 50$ granularity. Parameters not shown on a plot are fixed at their respective ground truth values. Top: $F$-$D_{v}$. Bottom: $F$-$k$.

Figure 4: Steady-state patterns behind the $F$-$k$ loss plot (at the bottom of [Figure 3](https://arxiv.org/html/2606.11258#S4.F3)).

Footnote 5: Same colormap clipping as in [Figure 1](https://arxiv.org/html/2606.11258#footnote2) ([footnote 3](https://arxiv.org/html/2606.11258#footnote3)).

We also evaluate a third loss function, the VGG-based Gram matrix loss. Specifically, both the generated and target patterns are passed through the first 18 layers of a pretrained VGG-19 network (Simonyan and Zisserman, [2015](https://arxiv.org/html/2606.11258#bib.bib8)), and the loss is computed as the mean squared error between their respective Gram matrices in feature space. This loss was originally developed for neural style transfer (Gatys et al., [2016](https://arxiv.org/html/2606.11258#bib.bib9)), where Gram matrix similarity captures the statistical distribution of features rather than pixel-level correspondence. Unlike power spectrum losses, which measure frequency-domain similarity of the final patterns, the VGG Gram loss operates in a learned feature space sensitive to texture and structural regularity. We probe its loss landscape to assess whether a loss function that captures higher-level pattern statistics, rather than explicit frequency content, yields more navigable geometry.
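The Gram-matrix comparison itself can be sketched on generic feature maps. The sketch below does not include the VGG-19 feature extractor: it assumes the features are already available as arrays of shape `(C, H, W)`, and the normalization by `C * H * W` is an assumption, as are the function names.

```python
import numpy as np

def gram_matrix(features):
    # features: array of shape (C, H, W). Returns the (C, C) matrix of
    # channel-by-channel inner products over all spatial positions.
    C, H, W = features.shape
    f = features.reshape(C, H * W)
    return f @ f.T / (C * H * W)  # normalization is an assumption

def gram_loss(feat_gen, feat_tgt):
    # Mean squared error between the two Gram matrices.
    diff = gram_matrix(feat_gen) - gram_matrix(feat_tgt)
    return float(np.mean(diff ** 2))
```

Because the Gram matrix sums over spatial positions, it is unchanged when the same spatial rearrangement is applied to every channel, which is why it compares feature statistics rather than pixel-level layout.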
As [Figure 5(c)](https://arxiv.org/html/2606.11258#S5.F5.sf3) shows, the VGG loss differs from its power spectrum counterparts in three ways: 1) its values span a smaller range ($0 \sim 100$ versus $0 \sim 200+$); 2) the uniform-solution plateau sits at a mid-level value, unlike the power spectrum losses where this plateau dominates at high values; and 3) the pattern-forming region exhibits loss fluctuations rather than remaining at low values as in the power spectrum cases.
Despite these differences, sharp discontinuities still dominate the boundary between the uniform-solution and pattern-forming regions, and the plateaus retain negligible gradient signal. The only navigable slope in the middle of the pattern-forming region is isolated by sharply rising ridges on one side and steep cliffs leading to higher plateaus on the other. Illustrations from different angles in Appendix [B](https://arxiv.org/html/2606.11258#A2) ([Figure 8](https://arxiv.org/html/2606.11258#A2.F8)) show this more clearly. The fluctuations in the pattern-forming region reveal the VGG loss’s sensitivity to pattern differences. However, these fluctuations introduce additional cliffs within the pattern-forming region itself, making it equally unnavigable.
## 5 Discussions
### 5.1 Empirical Result Interpretation
All the loss landscape cross-sections we plotted for [Section 4](https://arxiv.org/html/2606.11258#S4) ([Figure 2](https://arxiv.org/html/2606.11258#S3.F2), [Figure 3](https://arxiv.org/html/2606.11258#S4.F3), and [Figure 5](https://arxiv.org/html/2606.11258#S5.F5)) are dominated by plateaus with negligible gradient signal and by sharp cliffs. The typical cliffs separating pattern-forming from uniform-solution regions likely arise at bifurcations: their locations are strikingly close to the bifurcations shown in Figure 2 of Delgado et al. ([2017](https://arxiv.org/html/2606.11258#bib.bib10)) and Figure 11 of Gandy and Nelson ([2022](https://arxiv.org/html/2606.11258#bib.bib11)). They form similar cusps, and the small deviations in location are likely due to different settings of the diffusion coefficients. We speculate that, at saddle-node bifurcations, the pattern-forming solutions suddenly emerge and become the only or the stronger attractor, while the uniform solutions dominate on the other side. The Hopf bifurcation is also a candidate here: the limit cycles observed in our $v$ animations often remain uniform for a long time, with occasional flashes of dynamic local patterns, so the stopping criterion of our stepping algorithm tends to cut them off as uniform solutions. Our optimizer was mostly confined to the uniform-solution (or quasi-uniform limit-cycle) regions, which explains the unchanging high losses observed during training. Its very occasional excursions into the pattern-forming region remain brief: gradient signals there are weak and noisy, and the bounding cliffs, though steep, are narrow enough that a single misguided step crosses them, pushing the optimizer back onto the plateau.
We reason that the gradient\-based approach inherits this problematic landscape regardless of how gradients from the steady\-state discrepancy are fed back to the parameters: whether via backpropagation through unrolled steps, implicit differentiation, or a forward surrogate that maps PDE parameters to steady\-state patterns\. Alternatives along these lines therefore do not change the nature of the problem\.

\(a\)Cross\-section for the non\-windowed 2D power spectrum loss\.

\(b\)Cross\-section for the windowed 2D power spectrum loss\.

\(c\)Cross\-section for the VGG\-based Gram matrix loss\.
Figure 5: 2-dimensional loss landscape cross-sections for different loss functions, all at $50 \times 50$ granularity. Parameters $D_{u}$ and $D_{v}$ are fixed at ground truth values.

Despite these challenges, the pattern matrix in [Figure 4](https://arxiv.org/html/2606.11258#footnote5) reveals a potentially exploitable structure. On the $F$-$k$ plane, it shows gradual and regular pattern changes on the pattern-forming side of the cliffs, suggesting that this region is in principle traversable. Similar gradual changes appear on the low-loss side of the cliffs in the cross-sections of [Figure 2](https://arxiv.org/html/2606.11258#S3.F2), although they are almost indiscernible for $F$ and $D_{u}$. This traversability, however, is not accessible from the uniform side because of the sharp cliffs, and it offers only a weak and unstable signal even when accessible.

We further observe, through evolution animations such as those linked in Appendix [D](https://arxiv.org/html/2606.11258#A4) and Appendix [E](https://arxiv.org/html/2606.11258#A5), that patterns become more similar to one another as they evolve in time, and that uniform steady states evolve from patterns present at earlier time steps (footnote 6). Therefore, regular changes like those in the steady-state pattern regions should appear even more prominently at intermediate time steps prior to convergence. At those earlier steps, regular pattern change may be present on both sides of the cliff, including in regions where patterns ultimately converge to uniform solutions. To what extent this can be exploited remains to be explored.

Footnote 6: We verified this with much more diverse initial conditions, including fully random initialization, confirming it is not an artifact of similarly initialized fields; additional figures and animations are available in Appendix [C](https://arxiv.org/html/2606.11258#A3).
### 5.2 Possible Remedies for the Landscape Pathologies
Continuing from the diagnosis so far, we briefly outline several possible directions to overcome the landscape issues, noting that the analyses in the following section will offer a different perspective.

(a) Better loss function design: a loss function could provide meaningful gradient signal within the pattern-forming region while incorporating a mechanism to escape the uniform steady-state plateau. The regularity of pattern variations there suggests such a design is achievable.

(b) Time-augmented surrogate model: a surrogate neural network trained to map $(D_{u}, D_{v}, F, k, t) \rightarrow \text{pattern}$, with simulation time $t$ as an extra input dimension, can enable the optimizer to explore earlier time steps where the parameter subspace exhibits more tractable loss geometry, and subsequently transfer progress toward improved final-state predictions.

(c) Intermediate supervision: incorporating supervision at intermediate time steps may provide gradient signals unavailable from terminal steady-state comparisons alone. This requires careful design, as ground truth for intermediate states is not available in our setting.
## 6 Disentangling PINN
The above remedies all attempt to somehow reshape the problematic landscape\. A prior question is whether existing methods already avoid it—which reframes our diagnostic probe as an ablation of PINN\. Upon confirming that PINN does avoid the issue, this ablation leads us to disentangle which component is responsible, and in turn points to design implications for PINN\-type methods and a more general heuristic for navigating pathological parameter landscapes beyond PDEs\.

Figure 6: Residual loss cross-section formed by varying parameters $F$ and $k$, at $50 \times 50$ granularity. Parameters $D_{u}$ and $D_{v}$ are fixed at ground truth values.

### 6.1 Existing Methods and the Case for PINN
Surveying the general literature on this type of inverse problem reveals limitations of non-PINN approaches in our setting. Schnörr and Schnörr ([2023](https://arxiv.org/html/2606.11258#bib.bib12)) implemented a surrogate model that maps patterns to PDE parameters, in the direction opposite to the mapping with an ill-posed landscape, thereby avoiding landscape issues. However, their model only aims at coarse parameter estimation and is trained to conflate different initial-condition variations. Because different parameter sets can generate very similar patterns, their surrogate can be pulled toward conflicting training instances, causing it to learn a compromised representation. Najarro et al. ([2026](https://arxiv.org/html/2606.11258#bib.bib13)) address the related issue of noisy initial conditions, where the same parameters produce visually similar but pixel-level-different patterns, by using visual embedding distance as a loss function, a contribution consistent with our VGG-based loss. Their visual embedding loss shows discernibility across discrete parameter samples in the pattern-forming region, echoing our observation in [Section 5.1](https://arxiv.org/html/2606.11258#S5.SS1) that pattern and loss changes do exist. However, discernibility at discrete points does not imply traversability across the full continuous landscape, and their method relies entirely on evolutionary search, which is computationally expensive and does not exploit loss gradients.
PINN naturally avoids both of these issues: it uses gradient-based optimization directly, and, without a reverse-mapping surrogate that conflates conflicting training instances, it searches on the original loss space and can converge to one of the multiple minima. However, the literature suggests that applying standard or improved PINN frameworks to our type of inverse problem is underexplored. For example, the application of PINNs to the Gray-Scott model in Giampaolo et al. ([2022](https://arxiv.org/html/2606.11258#bib.bib5)) addressed only forward problems and did not attempt inverse parameter estimation. The degree of success reported in Zheng et al. ([2024](https://arxiv.org/html/2606.11258#bib.bib7)) covered only one specific set of Gray-Scott parameters at substantial computational cost. Most importantly, they did not provide a theory of how the landscape issues were resolved. We therefore examine more carefully why PINN’s advantages over non-PINN methods translate or fail to translate to our setting.
### 6.2 The Residual Loss Yields a Well-Behaved Landscape
We first adapt the PINN formulation to our setting and set up the joint\-space geometry, then show that the residual loss alone yields a well\-behaved landscape\.
In the first PINN paper (Raissi et al., [2019](https://arxiv.org/html/2606.11258#bib.bib4)), a neural network is trained with fixed PDE constraints to fit the data mapping $(x, y, t) \rightarrow (u_{x,y}, v_{x,y})$, which is equivalent to $(t) \rightarrow (U, V)$, where $U$ and $V$ represent full images at time step $t$. This equivalence holds because, at any time step, the pair of full images is itself a mapping from index pairs $(x, y)$ to pixel value pairs $(u_{x,y}, v_{x,y})$. The same PINN paper (Raissi et al., [2019](https://arxiv.org/html/2606.11258#bib.bib4)) then made the PDE parameters learnable, training them together with the neural network parameters while keeping the data mapping the same. To adapt it to our problem, we need to set all $t$’s to a large constant (to use steady-state patterns without time labels) and evaluate the residual loss as the squared residual of the elliptic form, that is, Equations (1) and (2) with the time derivatives set to zero.
With this background, we consider the joint parameter space $(\theta, \mu) \in \mathbb{R}^{n} \times \mathbb{R}^{m}$, where $\theta$ denotes neural network parameters and $\mu$ denotes PDE parameters, to analyze its geometry. These two subspaces are trivially orthogonal: for any $v \in \mathbb{R}^{n}$ and $w \in \mathbb{R}^{m}$, $(v, 0) \perp (0, w)$. The total loss decomposes as

$$L(\theta, \mu) = L_{\text{data}}(\theta) + L_{\text{res}}(\theta, \mu), \tag{7}$$

where the data term depends on $\theta$ alone.
To compare with the losses plotted in [Section 4](https://arxiv.org/html/2606.11258#S4), we study the loss restricted to the PDE parameter subspace. Since $L_{\text{data}}$ is independent of $\mu$, we only consider $L_{\text{res}}(\theta, \cdot)$, the residual loss on the $\mu$-subspace at fixed $\theta$. During the training of a PINN, a fixed $\theta$ produces an output somewhere between random initialization and the target, depending on how well the network has been fit at that moment. Because, in principle, $L_{\text{res}}(\theta, \cdot)$ compares this fixed network output with the solution of the PDE, one might expect to see the same ill-posed landscape. However, direct calculation shows that, once $\theta$ is fixed, the network outputs $u$ and $v$ (and hence $\Delta u$, $\Delta v$, and $uv^{2}$) are all fixed, and the residual of the elliptic Gray-Scott equation is linear in the parameters $\mu$. Therefore, the residual loss is a quadratic function of the parameters, yielding a smooth, bowl-shaped landscape. Our empirical cross-section in [Figure 6](https://arxiv.org/html/2606.11258#S6.F6) confirms this bowl shape.
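The linearity argument can be checked numerically. In the sketch below, fixed random fields stand in for the network outputs at a fixed $\theta$ (an assumption made purely for illustration), the Laplacian uses periodic boundaries (also an assumption), and the residual loss is taken as a sum of squares. With $\mu = (D_u, D_v, F, k)$, the elliptic residuals can be written as $A\mu - b$, so the loss is $\lVert A\mu - b \rVert^2$ with constant Hessian $2A^{\top}A$. The function names are hypothetical.

```python
import numpy as np

def laplacian(a):
    # 5-point stencil, unit spacing, periodic boundaries (assumption).
    return (np.roll(a, 1, 0) + np.roll(a, -1, 0)
            + np.roll(a, 1, 1) + np.roll(a, -1, 1) - 4.0 * a)

def residual_loss(u, v, Du, Dv, F, k):
    # Squared residual of the steady-state (elliptic) Gray-Scott equations.
    uvv = u * v * v
    r_u = Du * laplacian(u) - uvv + F * (1.0 - u)
    r_v = Dv * laplacian(v) + uvv - (F + k) * v
    return float(np.sum(r_u ** 2) + np.sum(r_v ** 2))

def linear_system(u, v):
    # Build A and b such that the stacked residual equals A @ mu - b,
    # with mu = (Du, Dv, F, k) and the fields u, v held fixed.
    lu, lv = laplacian(u).ravel(), laplacian(v).ravel()
    uvv = (u * v * v).ravel()
    zeros = np.zeros_like(lu)
    A_u = np.stack([lu, zeros, (1.0 - u).ravel(), zeros], axis=1)
    A_v = np.stack([zeros, lv, -v.ravel(), -v.ravel()], axis=1)
    return np.concatenate([A_u, A_v]), np.concatenate([uvv, -uvv])

def quadratic_minimizer(u, v):
    # The bowl has a closed-form minimum given by linear least squares.
    A, b = linear_system(u, v)
    return np.linalg.lstsq(A, b, rcond=None)[0]
```

For fields that are not steady states of the true system, the minimizer need not be close to the true parameters; the point of the sketch is only that the landscape over $\mu$ is a convex quadratic whatever the fixed fields are.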
This is counter-intuitive: whatever the three directions in [Section 5.2](https://arxiv.org/html/2606.11258#S5.SS2) endeavor to achieve, the residual loss has already achieved neatly, with no contribution from the neural network. Looking at why, we find that, although the residual loss compares the target with what the PDE actually produces, its comparison implicitly includes the full evolution dynamics, considering all initial conditions at once, rather than comparing only the final pattern of a given IC. This gives the residual loss access to much richer information, including the intermediate steps we sought to exploit in the directions of [Section 5.2](https://arxiv.org/html/2606.11258#S5.SS2).
### 6.3 A Neural Network Cannot Repair an Ill-Posed Landscape
A converse question is whether a neural network can repair a landscape that is already ill-posed. Suppose a hypothetical system whose parameter space (denoted $\tilde{\mu}$) is itself ill-posed under a residual or alternative loss $\tilde{L}_{\text{res}}$: does adding the PINN-style auxiliary structure (a neural network with parameters $\theta$ and a data loss) help? The short answer is no.
In the $\theta$-subspace (for any fixed $\tilde{\mu}$), both the data loss and residual loss landscapes are typically amenable to gradient-based optimization, as is common when fitting a neural network to fixed data (Sitzmann et al., [2020](https://arxiv.org/html/2606.11258#bib.bib14)). Krishnapriyan et al. ([2021](https://arxiv.org/html/2606.11258#bib.bib6)) characterize PINN failure modes in a setting different from ours, namely the forward problem of recovering the full space-time solution at large PDE coefficients. Their analysis nonetheless bears on the present point, because the $\theta$-subspace we consider coincides with the network-parameter landscape they study. Their findings in fact support our assessment: their sequence-to-sequence remedy shows that fitting only a short slice of time with residual loss is tractable, and our steady state corresponds to a single such slice.
Yet a tractable $\theta$-subspace cannot rescue an ill-posed $\tilde{\mu}$-subspace, as the gradient of the total loss $L(\theta,\tilde{\mu})=L_{\text{data}}(\theta)+\tilde{L}_{\text{res}}(\theta,\tilde{\mu})$ reveals. This gradient splits into three components:

$$\nabla L=\begin{pmatrix}\nabla_{\theta}L_{\text{data}}+\nabla_{\theta}\tilde{L}_{\text{res}}\\ \nabla_{\tilde{\mu}}\tilde{L}_{\text{res}}\end{pmatrix}, \tag{8}$$

and two reasons lead to this conclusion.
First, movement within the $\theta$-subspace at one particular point in the $\tilde{\mu}$-subspace does not transfer well to neighboring points in the $\tilde{\mu}$-subspace. This failure does not stem from $\nabla_{\theta}L_{\text{data}}$: $L_{\text{data}}$ is independent of $\tilde{\mu}$, so its $\theta$-landscape is identical at every point in the $\tilde{\mu}$-subspace. The failure comes from $\nabla_{\theta}\tilde{L}_{\text{res}}$, where each $\tilde{\mu}$ defines a different target pattern for the neural network, and consequently a different loss landscape $\tilde{L}_{\text{res}}(\cdot,\tilde{\mu})$. If this target pattern jumps at discontinuities, the slices $\tilde{L}_{\text{res}}(\cdot,\tilde{\mu})$ change discontinuously as $\tilde{\mu}$ crosses them.
Second, no matter how the optimizer moves in the $\theta$-subspace, the harshness of the landscapes $\tilde{L}_{\text{res}}(\theta,\cdot)$ persists: at any $\theta$, $\tilde{L}_{\text{res}}(\theta,\cdot)$ inherits the difficulties of the $\tilde{\mu}$-subspace, making $\nabla_{\tilde{\mu}}\tilde{L}_{\text{res}}$ non-informative.
Thus, while PINN increases the dimensionality of the search space, the extra freedom does not provide an effective detour around a problematic landscape structure. The well-behaved $\theta$-subspace noted above is, moreover, the most favorable case for PINN: were either of these landscapes less well-behaved, the conclusion would only be reinforced. Whether the residual loss landscape over $\theta$ remains well-behaved beyond our single-frame setting is left to future work, in particular for the full space-time problem, where the evidence of Sitzmann et al. ([2020](https://arxiv.org/html/2606.11258#bib.bib14)) and Krishnapriyan et al. ([2021](https://arxiv.org/html/2606.11258#bib.bib6)) appears to conflict.
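A deliberately tiny toy can illustrate why the extra $\theta$ dimension offers no detour. In the sketch below, which is a hypothetical scalar stand-in and not the paper's system, the "network" is a single number `theta`, and the residual-type loss pulls it toward a target that jumps at `mu = 0.5`, mimicking a bifurcation boundary. The function names, the jump location, and the step size are all assumptions made for illustration.

```python
def target(mu):
    """Hypothetical target that jumps at mu = 0.5, standing in for a bifurcation."""
    return 0.0 if mu < 0.5 else 1.0

def total_loss(theta, mu, data=1.0):
    """L = L_data(theta) + L_res(theta, mu) for a scalar toy 'network' theta."""
    return (theta - data) ** 2 + (theta - target(mu)) ** 2

def grad(theta, mu, eps=1e-6):
    """Central finite-difference gradient with respect to (theta, mu)."""
    g_theta = (total_loss(theta + eps, mu) - total_loss(theta - eps, mu)) / (2 * eps)
    g_mu = (total_loss(theta, mu + eps) - total_loss(theta, mu - eps)) / (2 * eps)
    return g_theta, g_mu

# Start on the wrong plateau; the data are consistent with mu >= 0.5.
theta, mu = 0.0, 0.2
for _ in range(500):
    g_theta, g_mu = grad(theta, mu)
    theta -= 0.1 * g_theta   # theta moves freely: its slice is a smooth parabola
    mu -= 0.1 * g_mu         # mu receives zero gradient everywhere on the plateau
```

In this toy, `theta` settles at a compromise between the data and the wrong target while `mu` never leaves its starting value, even though a zero-loss point exists at `theta = 1` for any `mu >= 0.5`.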
### 6.4 Roles of Components and Design Implications
We have now established that the residual loss alone prevents the landscape pathology, and that the neural network contributes nothing further to this end. It follows directly that the neural network serves only to complete the data.
Based on this account of what each PINN component does, we can (i) optimize and reduce redundancy in PINN's neural network component, (ii) use the data scheme of the specific application (what is observed and how much) to inform the design of the neural network, and (iii) adjust the combination of neural network and residual loss according to the training requirements of the specific application. We have already developed detailed designs for these directions but intend to disclose them in future publications rather than in this work.
The finding that a neural network cannot improve an ill-posed PDE parameter landscape further suggests a general design heuristic beyond the PDE setting: when loss landscapes are pathological in a given parameter subspace, any auxiliary dimensions introduced should provide navigable detours around the pathological structure, rather than leaving it insurmountable in the expanded space. The $t$-enhanced parameter space in direction (b) of [Section 5.2](https://arxiv.org/html/2606.11258#S5.SS2) is one example of applying this principle.
## 7 Conclusion and Future Work
As a radical ablation of the full PINN framework, we applied gradient-based optimization directly to recover Gray-Scott system parameters from IC-dependent steady states, without any auxiliary structure. Empirical analysis revealed severe pathologies in the resulting loss landscapes (sharp cliffs near bifurcation boundaries and flat plateaus in uniform-solution regions) that systematically prevent convergence.
Through theoretical analysis, supported by experimental verification of the residual loss landscape, we then disentangled the roles of PINN's components. The residual loss avoids these pathologies by implicitly encoding full PDE dynamics rather than single-trajectory steady-state information, yielding a smooth quadratic landscape in PDE parameter space. The neural network, unable to improve this landscape, serves instead to complete partial observational data, a distinction not previously made explicit.
The next steps of our work follow the design implications in [Section 6.4](https://arxiv.org/html/2606.11258#S6.SS4): testing neural network architectures tailored to the data completion role identified here, optimized for different observational schemes across diverse PDE systems and other physical domains. We have already completed a redesign for the Gray-Scott inverse problem discussed in this paper and intend to disclose the details in future publications.
Two further directions concern the analysis itself. First, the good behavior of the residual loss landscape over $\theta$, argued here by analogy to Sitzmann et al. ([2020](https://arxiv.org/html/2606.11258#bib.bib14)) and Krishnapriyan et al. ([2021](https://arxiv.org/html/2606.11258#bib.bib6)), warrants direct empirical verification through landscape visualization in our own steady-state Gray-Scott setting. Second, we aim to resolve the open question raised in [Section 6.3](https://arxiv.org/html/2606.11258#S6.SS3): whether this landscape remains tractable in the full space-time setting, and what underlies the conflicting evidence noted there, through theoretical analysis and further visualization.
## Impact Statement
This paper presents work whose goal is to advance the field of Machine Learning. There are many potential societal consequences of our work, none of which we feel must be specifically highlighted here.
## Acknowledgements
The author thanks the reviewers and editors for their feedback, which prompted further refinement of the analysis presented in this work\. The author also thanks several fellow participants of the IBRO/NYUAD Autumn School on fMRI, whose encouragement to document prior research work set in motion a chain of exploration that unexpectedly led to the findings reported here\.
## Software and Data
## References
- J. Delgado, L. I. Hernández-Martínez, and J. Pérez-López (2017) Global bifurcation map of the homogeneous states in the Gray–Scott model. International Journal of Bifurcation and Chaos 27(07), pp. 1730024. External Links: [Document](https://dx.doi.org/10.1142/S0218127417300245), [Link](https://doi.org/10.1142/S0218127417300245). Cited by: [§2.1](https://arxiv.org/html/2606.11258#S2.SS1.p1.2), [§5.1](https://arxiv.org/html/2606.11258#S5.SS1.p1.1).
- D. L. Gandy and M. R. Nelson (2022) Analyzing pattern formation in the Gray–Scott model: an XPPAUT tutorial. SIAM Review 64(3), pp. 728–747. External Links: [Document](https://dx.doi.org/10.1137/21M1402868), [Link](https://doi.org/10.1137/21M1402868). Cited by: [§2.1](https://arxiv.org/html/2606.11258#S2.SS1.p1.2), [§5.1](https://arxiv.org/html/2606.11258#S5.SS1.p1.1).
- L. A. Gatys, A. S. Ecker, and M. Bethge (2016) Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Cited by: [§4.2](https://arxiv.org/html/2606.11258#S4.SS2.p2.1).
- F. Giampaolo, M. De Rosa, P. Qi, S. Izzo, and S. Cuomo (2022) Physics-informed neural networks approach for 1D and 2D Gray-Scott systems. Advanced Modeling and Simulation in Engineering Sciences 9(1), pp. 5. External Links: [Document](https://dx.doi.org/10.1186/s40323-022-00219-7), ISBN 2213-7467, [Link](https://doi.org/10.1186/s40323-022-00219-7). Cited by: [§6.1](https://arxiv.org/html/2606.11258#S6.SS1.p2.1).
- S. Kondo (2022) The present and future of Turing models in developmental biology. Development 149(24). External Links: [Document](https://dx.doi.org/10.1242/dev.200974), ISSN 1477-9129 (Electronic); 0950-1991 (Linking), PII 286110. Cited by: [§1](https://arxiv.org/html/2606.11258#S1.p1.1), [§2.1](https://arxiv.org/html/2606.11258#S2.SS1.p1.6).
- A. Krishnapriyan, A. Gholami, S. Zhe, R. Kirby, and M. W. Mahoney (2021) Characterizing possible failure modes in physics-informed neural networks. In Advances in Neural Information Processing Systems, M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. W. Vaughan (Eds.), Vol. 34, pp. 26548–26560. External Links: [Link](https://proceedings.neurips.cc/paper_files/paper/2021/file/df438e5206f31600e6ae4af72f2725f1-Paper.pdf). Cited by: [§6.3](https://arxiv.org/html/2606.11258#S6.SS3.p2.3), [§6.3](https://arxiv.org/html/2606.11258#S6.SS3.p6.2), [§7](https://arxiv.org/html/2606.11258#S7.p4.1).
- J. Lefèvre and J. Mangin (2010) A reaction-diffusion model of human brain development. PLOS Computational Biology 6(4), pp. 1–10. External Links: [Document](https://dx.doi.org/10.1371/journal.pcbi.1000749), [Link](https://doi.org/10.1371/journal.pcbi.1000749). Cited by: [§1](https://arxiv.org/html/2606.11258#S1.p1.1).
- E. Najarro, N. Bessone, and S. Risi (2026) Solving inverse problems in stochastic self-organizing systems through invariant representations. External Links: 2506.11796, [Link](https://arxiv.org/abs/2506.11796). Cited by: [§6.1](https://arxiv.org/html/2606.11258#S6.SS1.p1.1).
- M. Raissi, P. Perdikaris, and G.E. Karniadakis (2019) Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378, pp. 686–707. External Links: [Document](https://dx.doi.org/10.1016/j.jcp.2018.10.045), ISSN 0021-9991, [Link](https://www.sciencedirect.com/science/article/pii/S0021999118307125). Cited by: [§1](https://arxiv.org/html/2606.11258#S1.p2.1), [§6.2](https://arxiv.org/html/2606.11258#S6.SS2.p2.8).
- D. Schnörr and C. Schnörr (2023) Learning system parameters from Turing patterns. Machine Learning 112(9), pp. 3151–3190. External Links: [Document](https://dx.doi.org/10.1007/s10994-023-06334-9), ISBN 1573-0565, [Link](https://doi.org/10.1007/s10994-023-06334-9). Cited by: [§1](https://arxiv.org/html/2606.11258#S1.p2.1), [§6.1](https://arxiv.org/html/2606.11258#S6.SS1.p1.1).
- K. Simonyan and A. Zisserman (2015) Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations. Cited by: [§4.2](https://arxiv.org/html/2606.11258#S4.SS2.p2.1).
- V. Sitzmann, J. N.P. Martel, A. W. Bergman, D. B. Lindell, and G. Wetzstein (2020) Implicit neural representations with periodic activation functions. In Conference on Neural Information Processing Systems (NeurIPS). Cited by: [§6.3](https://arxiv.org/html/2606.11258#S6.SS3.p2.3), [§6.3](https://arxiv.org/html/2606.11258#S6.SS3.p6.2), [§7](https://arxiv.org/html/2606.11258#S7.p4.1).
- H. Zheng, Y. Huang, Z. Huang, W. Hao, and G. Lin (2024) HomPINNs: homotopy physics-informed neural networks for solving the inverse problems of nonlinear differential equations with multiple solutions. Journal of Computational Physics 500, pp. 112751. External Links: [Document](https://dx.doi.org/10.1016/j.jcp.2023.112751), ISSN 0021-9991, [Link](https://www.sciencedirect.com/science/article/pii/S0021999123008471). Cited by: [§6.1](https://arxiv.org/html/2606.11258#S6.SS1.p2.1).
## Appendix A: Boundary and initial conditions
The stepping algorithm uses periodic boundary conditions. Initial conditions are generated randomly for each training step, following the same controlled pattern:
$$u_{i,j}=\begin{cases}0.50, & \text{if } 54\leq i,j\leq 74\\ \text{clamp}(1.0+\mathcal{N}(0,1),\,[0.0,1.5]), & \text{otherwise}\end{cases} \tag{9}$$

$$v_{i,j}=\begin{cases}0.25, & \text{if } 54\leq i,j\leq 74\\ \text{clamp}(\mathcal{N}(0,1),\,[0.0,1.0]), & \text{otherwise}\end{cases} \tag{10}$$
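As a minimal sketch, the sampling rule in Eqs. (9) and (10) could be written as follows. The grid size `n = 128`, the zero-based interpretation of the indices $i, j$, and the function name are assumptions; the appendix does not state them.

```python
import numpy as np

def make_initial_conditions(n=128, lo=54, hi=74, rng=None):
    """Sample (u, v) following Eqs. (9)-(10); n = 128 is an assumed grid size."""
    rng = np.random.default_rng() if rng is None else rng
    # Outside the box: clamped Gaussian noise around 1.0 (u) and 0.0 (v).
    u = np.clip(1.0 + rng.standard_normal((n, n)), 0.0, 1.5)
    v = np.clip(rng.standard_normal((n, n)), 0.0, 1.0)
    # Inside the box: constant values; bounds are inclusive, 54 <= i, j <= 74.
    box = slice(lo, hi + 1)
    u[box, box] = 0.50
    v[box, box] = 0.25
    return u, v

# A fresh draw would be made at every training step; a seed is fixed here only
# to make the example reproducible.
u0, v0 = make_initial_conditions(rng=np.random.default_rng(0))
```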
## Appendix B: More Loss Plots for the $F$-$k$ Cross-Sections
For all plots here, parameters $D_u$ and $D_v$ are fixed at ground truth values.
Figure 7: Non-windowed loss landscape on a larger region of $F$-$k$, $30\times 30$. This shows that the lower plateau is surrounded by cliffs from all sides.

Figure 8: Extra $F$-$k$ cross-section plots for the VGG-based Gram matrix loss landscape. (a) Opposite view angle, $30\times 30$ granularity. (b) Top-down view (2D plot), $30\times 30$ granularity.
## Appendix C: Intermediate vs Steady-State Patterns Under Varied Initial Conditions



Figure 9: Intermediate (top two) and steady-state (bottom) patterns for large ($6\times$ the original) random noise.


Figure 10: Intermediate (top two) and steady-state (bottom) patterns for random noise with a random-sized perturbation box at a random location.
## Appendix D: The One Training Trajectory Manually Picked
Here we show a trajectory that the optimizer followed without automatically pivoting or cutting off. The pivoting and cut-off points were identified by manually animating pattern generation at random checkpoints and comparing the results with the targets. The trajectory record:
- Initial parameters: $D_u=0.1270$, $D_v=0.1269$, $F=0.0500$, $k=0.0501$.
- Initial training: L2 loss on the 2D power spectrum with learning rate 1.2e-2 for 8918 iterations, arriving at $D_u=0.1285$, $D_v=0.0734$, $F=0.0429$, $k=0.0682$.
- Continued training: L2 loss on the 2D power spectrum with learning rate 1e-3 for 98 iterations, arriving at $D_u=0.1343$, $D_v=0.0679$, $F=0.0429$, $k=0.0653$.
We also created a GIF animation file showing simultaneous animations of four pattern evolutions, under: 1) initial parameters, 2) parameters after the initial 8918 iterations, 3) parameters after the extra 98 iterations, and 4) target parameters. The four animations are arranged from left to right in the file: [https://osf.io/xfk7z/files/348kw?view_only=e26ec2a6706c40a4a0606691f9449900](https://osf.io/xfk7z/files/348kw?view_only=e26ec2a6706c40a4a0606691f9449900).
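For readers unfamiliar with the loss named in the trajectory record, one plausible form of an L2 loss on the 2D power spectrum is sketched below. The normalization, the use of a single field, and the function names are assumptions; the exact variant used in the experiments is not specified in this appendix.

```python
import numpy as np

def power_spectrum(x):
    """2D power spectrum |FFT(x)|^2, normalized by the number of grid points."""
    return np.abs(np.fft.fft2(x)) ** 2 / x.size

def spectral_l2_loss(pred, target):
    """Mean squared difference between the two power spectra."""
    return float(np.mean((power_spectrum(pred) - power_spectrum(target)) ** 2))

rng = np.random.default_rng(0)
pattern = rng.random((64, 64))
shifted = np.roll(pattern, shift=(5, -9), axis=(0, 1))  # same pattern, translated
other = rng.random((64, 64))                            # an unrelated pattern

loss_same = spectral_l2_loss(pattern, pattern)
loss_shifted = spectral_l2_loss(pattern, shifted)
loss_other = spectral_l2_loss(pattern, other)
```

A loss of this kind ignores circular translations of the pattern, which is a natural property when the steady state depends on a random initial condition. This NumPy version only evaluates the loss; backpropagating through a simulation would require an autodiff framework.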
## Appendix E: Links to Animation Files
Each file shows 12 simultaneous animations, corresponding to 12 sample values of the investigated parameter, with the other 3 parameters set to their ground truths\.
- •
- •
- •
- •