Architecture Shapes Transfer Specificity in Implicit Neural Representations
Summary
This paper studies transfer specificity in implicit neural representations across SIREN, ReLU MLPs, and Fourier-feature MLPs, finding that transfer magnitude and specificity depend on architecture, with ReLU being more selective and SIREN reusing weights broadly. Results suggest architecture selection should consider explicit control conditions, not just transfer magnitude.
# Architecture Shapes Transfer Specificity in Implicit Neural Representations
Source: [https://arxiv.org/html/2606.06827](https://arxiv.org/html/2606.06827)
###### Abstract
Transfer in coordinate networks is often measured by warm\-start gain, but whether that gain reflects source\-specific structure or generic weight reuse is less clear\. We study this question across three implicit neural representation \(INR\) families, SIREN, ReLU MLPs, and Fourier\-feature MLPs, using controlled analytic tests, a 2D lid\-driven\-cavity Navier–Stokes benchmark, and 1D PDE reference\-solution suites for heat, viscous Burgers, and focusing cubic NLS\. The analytic tests use independent\-seed random controls, while the PDE benchmarks use alternate same\-family source controls and auxiliary ablations\.
Across settings, transfer magnitude and transfer specificity separate clearly. In a 10-seed controlled 1D geometric test, Fourier Features show the largest structured transfer ($33.1\times$), followed by SIREN ($23.0\times$) and ReLU ($10.7\times$), but ReLU is far more selective: random-control transfer is $0.41\times$ for ReLU versus $14.24\times$ for SIREN. On a controlled two-parameter 1D family, the ranking changes: ReLU gives the clearest structured-versus-control separation at default settings, whereas Fourier Features improve only after bandwidth retuning. In Navier–Stokes and the broader 1D PDE suite, no single architecture dominates every equation, yet the same pattern remains: SIREN often reuses weights broadly, whereas ReLU and, in some equations, Fourier Features are more source-selective. Static diagnostics remain weak, and the heuristic scaling law $A_{\text{transfer}}\propto 1/\Delta t^{2}$ is rejected in the implemented 1D audit.
These results position transfer specificity as a useful diagnostic for coordinate networks and suggest that architecture selection in scientific machine learning should be evaluated under explicit control conditions, not by transfer magnitude alone\.
Keywords: implicit neural representations; transfer learning; coordinate networks; scientific machine learning; partial differential equations; spectral bias
## 1 Introduction
Coordinate networks, often called implicit neural representations \(INRs\), encode continuous signals as neural functions of coordinates\. In scientific machine learning, they are attractive as mesh\-independent surrogates: once trained, they can be evaluated at arbitrary coordinates, differentiated with respect to their inputs, and reused across steady or time\-dependent fields\. More broadly, they sit within a scientific\-machine\-learning push toward reusable surrogate solvers, reduced models, and operator approximations for PDE\-governed systems\(Brunton and Kutz,[2024](https://arxiv.org/html/2606.06827#bib.bib6)\)\. At the same time, architecture choices induce different spectral biases, frequency preferences, and optimization dynamics, from sinusoidal networks to Fourier\-feature MLPs and plain ReLU coordinate networks\(Essakineet al\.,[2025](https://arxiv.org/html/2606.06827#bib.bib28); Sitzmannet al\.,[2020](https://arxiv.org/html/2606.06827#bib.bib29); Tanciket al\.,[2020](https://arxiv.org/html/2606.06827#bib.bib33); Mülleret al\.,[2022](https://arxiv.org/html/2606.06827#bib.bib34)\)\.
Most comparisons between INR architectures focus on reconstruction error or downstream surrogate accuracy\. For parametric workflows, however, the more revealing question may be transfer: if a coordinate network has already been trained at one parameter value, when does fine\-tuning help at a nearby parameter, and when is that gain specific to the target family rather than generic weight reuse? Framed this way, the problem is also a transfer\-learning question about what pretrained neural weights actually carry across related tasks\(Pan and Yang,[2010](https://arxiv.org/html/2606.06827#bib.bib5)\)\. That question matters in reduced\-order and amortized\-solver settings, where the point of pretraining is to reuse a learned representation across many related targets rather than fit each instance from scratch\.
Here we study transfer behavior itself as the object of interest\. We compare SIREN, ReLU MLPs, and Fourier\-feature INRs across controlled analytic tests, a 2D Navier–Stokes lid\-driven cavity benchmark, and 1D PDE reference\-solution suites for heat, Burgers, and focusing cubic NLS\. The analytic tests use independent\-seed random controls to separate structured transfer from incidental reuse, while the PDE benchmarks use alternate same\-family source controls and auxiliary ablations to test whether the same architecture patterns survive in more realistic settings\.
Our objective is diagnostic rather than architectural. We do *not* propose a new conditional solver, neural operator, or physics-informed training scheme. Instead, we use transfer magnitude and transfer specificity as complementary probes of representational continuity under a fixed optimization protocol. The central question is: how does INR architecture control both the amount of transfer and the source-specificity of that transfer across analytic and PDE families? The intended output is practical guidance for choosing coordinate-network architectures when transfer, not only single-task accuracy, is the main concern.
### 1.1 Key Findings and Contributions
Our investigation makes four main contributions:
1. We present a unified transfer study across controlled analytic tests and PDE benchmarks. The same three INR families are evaluated under explicit control conditions on a 1D geometric family, a controlled two-parameter 1D family, a Navier–Stokes lid-driven cavity benchmark, and 1D PDE reference-solution suites for heat, Burgers, and NLS.
2. Architecture controls transfer specificity as much as transfer magnitude. In the controlled 1D geometric test, Fourier Features have the largest structured transfer, but ReLU is much more selective under the independent-seed random control, whereas SIREN transfers strongly even on random targets.
3. The architecture ranking is not universal across task families. ReLU is the clearest discriminator on the controlled two-parameter family and on Navier–Stokes, whereas the heat/Burgers/NLS suite shows that absolute transfer rankings vary by equation even when the distinction between magnitude and specificity persists.
4. Explicit null models matter, whereas simple static diagnostics remain weak. Independent random controls, alternate-source controls, and shuffled-weight controls materially change the interpretation of transfer gains. By contrast, participation ratio, Hessian sharpness, and independent-seed CKA do not reliably separate structured reuse from weakly specific reuse, and the heuristic scaling law $A\propto 1/\Delta t^{2}$ is rejected in the implemented 1D audit.
## 2 Related Work
### 2.1 Classical and Neural Parametric PDE Surrogates
Reduced\-order modeling and classical surrogates\.Projection\-based ROM, POD/Galerkin methods, and reduced\-basis approximations are standard tools for repeated parametric PDE solves, especially when many queries are needed after an expensive offline stage\(Berkoozet al\.,[1993](https://arxiv.org/html/2606.06827#bib.bib38); Benneret al\.,[2015](https://arxiv.org/html/2606.06827#bib.bib39); Quarteroniet al\.,[2015](https://arxiv.org/html/2606.06827#bib.bib40)\)\. Their offline\-online structure is conceptually close to the transfer question studied here: one invests in a reusable representation and hopes to reduce the cost of later parameter evaluations\. Our INR setting differs in the representation class\. Instead of projecting onto a linear reduced basis or a hand\-built polynomial basis, we fit coordinate networks to reference fields and ask whether their learned weights provide useful, architecture\-dependent warm starts across parameters\.
Parameter\-conditioned PINNs\.Parameterized and parameter\-conditioned PINNs aim to learn continuous PDE solution families by treating physical parameters as network inputs or latent variables\. Recent P2INN\-style architectures introduce modular or latent\-encoded parameter representations that improve accuracy and efficiency on parameterized PDEs\(Choet al\.,[2024](https://arxiv.org/html/2606.06827#bib.bib1); Zhanget al\.,[2025](https://arxiv.org/html/2606.06827#bib.bib3)\)\. Related Navier–Stokes and RANS studies demonstrate both the promise and limitations of parameter\-conditioned training for fluid systems\(Jangiret al\.,[2026](https://arxiv.org/html/2606.06827#bib.bib2); Ghoshet al\.,[2023](https://arxiv.org/html/2606.06827#bib.bib7)\), while GPT\-PINN uses a meta\-learning strategy that composes pretrained basis PINNs for continuous parameter transfer\(Chen and Koohy,[2024](https://arxiv.org/html/2606.06827#bib.bib4)\)\. These methods engineer explicit parameter embeddings to encourage continuity; in contrast, we keep the INR architectures minimal and use transfer behavior across parameters as a probe of how much continuity arises without conditioning\.
Geometry and domain transfer in PINNs\.Recent PINN transfer studies include modular fine\-tuning schemes with boundary\-aware pretraining, lightweight geometry\-specific correction layers, and low\-rank adaptation across varying boundary conditions, geometries, and material distributions\(Liet al\.,[2025](https://arxiv.org/html/2606.06827#bib.bib8); Roy,[2025](https://arxiv.org/html/2606.06827#bib.bib9); Wanget al\.,[2025](https://arxiv.org/html/2606.06827#bib.bib10)\)\. Other work encodes irregular geometries into latent variables for physics\-informed surrogates or studies transfer through the PINN loss landscape\(Oldenburget al\.,[2022](https://arxiv.org/html/2606.06827#bib.bib11); Liuet al\.,[2023](https://arxiv.org/html/2606.06827#bib.bib12)\)\. These studies quantify when transfer is useful across geometries or materials, but they do not compare simple INR coordinate networks across architectures, nor do they use independent\-seed random controls to separate genuine target\-family continuity from incidental weight reuse\.
DeepONet and Fourier Neural Operators\.Operator\-learning architectures such as neural operators, DeepONet, and Fourier Neural Operators aim to learn solution operators over whole PDE families rather than fine\-tune separate coordinate networks\(Kovachkiet al\.,[2023](https://arxiv.org/html/2606.06827#bib.bib16); Luet al\.,[2021](https://arxiv.org/html/2606.06827#bib.bib14); Liet al\.,[2021](https://arxiv.org/html/2606.06827#bib.bib19)\)\. Physics\-informed DeepONets add PDE residual penalties to learn parametric solution operators without paired input\-output data\(Wanget al\.,[2021](https://arxiv.org/html/2606.06827#bib.bib13)\)\. FNO variants, including learned\-deformation FNOs and factorized FNOs, parameterize solution maps through Fourier\-domain kernels and have shown resolution\-independent performance on PDE benchmarks\(Liet al\.,[2023](https://arxiv.org/html/2606.06827#bib.bib15); Tranet al\.,[2021](https://arxiv.org/html/2606.06827#bib.bib20)\)\. More recent operator\-learning extensions include physics\-informed transformer neural operators for generalized initial/boundary\-condition transfer and geometry\-aware transformer operators for variable domains\(Boya and Subramani,[2024](https://arxiv.org/html/2606.06827#bib.bib17); Chenet al\.,[2026](https://arxiv.org/html/2606.06827#bib.bib18)\)\. These methods provide powerful alternatives for parametric PDEs; our focus is complementary, asking how much structure is already encoded in simple INR families under naive source\-to\-target fine\-tuning\.
### 2.2 INR Architectures and Multi-Scale Representation
SIREN, spectral bias, and initialization\.SIREN introduced periodic activations for INRs and demonstrated that sinusoidal networks are well suited for representing complex signals and their derivatives\(Sitzmannet al\.,[2020](https://arxiv.org/html/2606.06827#bib.bib29)\)\. Subsequent work studies how the spectral support and initialization of periodic networks affect optimization\. For example, WINNER perturbs uniformly initialized weights using noise scaled by the target signal’s spectral centroid, while FINER controls spectral bias through variable\-periodic activation functions\(Chandravamsiet al\.,[2025](https://arxiv.org/html/2606.06827#bib.bib31); Liuet al\.,[2024](https://arxiv.org/html/2606.06827#bib.bib32)\)\. These results motivate the possibility that SIREN\-like networks may reuse weights strongly when their frequency support is aligned with a target family, but may also transfer broadly when that reuse is not specific\. The dominance of the architectural prior is not specific to transfer: in a different scientific\-machine\-learning domain, the choice of geometric prior rather than the coordinate network alone governs accuracy when SIREN\-style coordinate networks are used to approximate Calabi–Yau metrics on the quintic\(Eng,[2026](https://arxiv.org/html/2606.06827#bib.bib30)\), complementing our focus on how architecture shapes transfer rather than single\-task accuracy\.
Fourier features and NTK bandwidth\.Fourier feature mappings transform the effective neural tangent kernel of an MLP into a stationary kernel with tunable bandwidth, enabling MLPs to fit high\-frequency functions that standard ReLU networks learn poorly\(Tanciket al\.,[2020](https://arxiv.org/html/2606.06827#bib.bib33)\)\. This provides a theoretical lens for interpreting architecture\-dependent transfer: if reuse depends on spectral alignment, Fourier\-feature MLPs may transfer strongly when the feature bandwidth matches the target family, but may lose magnitude or specificity when the bandwidth is mismatched\.
Multi\-resolution and conditional INRs\.Instant\-NGP and related multi\-resolution encodings augment coordinate networks with trainable feature grids or hash tables, giving fast optimization and local detail through coarse\-to\-fine spatial features\(Mülleret al\.,[2022](https://arxiv.org/html/2606.06827#bib.bib34); Wanget al\.,[2024](https://arxiv.org/html/2606.06827#bib.bib35); Luo,[2025](https://arxiv.org/html/2606.06827#bib.bib36)\)\. In fluid and PDE settings, Neural Implicit Flow, DINo, conditional neural fields, and related physics\-enhanced INRs use hypernetworks, latent dynamics, FiLM\-style conditioning, probabilistic residual objectives, or transformer\-enhanced coordinate encodings to improve parametric generalization and uncertainty quantification\(Panet al\.,[2023](https://arxiv.org/html/2606.06827#bib.bib21); Yinet al\.,[2023](https://arxiv.org/html/2606.06827#bib.bib22); Kimet al\.,[2024](https://arxiv.org/html/2606.06827#bib.bib23); Najian Aslet al\.,[2026](https://arxiv.org/html/2606.06827#bib.bib25); Chatzopoulos and Koutsourelakis,[2024](https://arxiv.org/html/2606.06827#bib.bib26); Shenet al\.,[2025](https://arxiv.org/html/2606.06827#bib.bib27)\)\. Meta\-learning has also been used to amortize INR fitting across families of signals, for example through sparse meta\-initializations that adapt quickly to unseen targets\(Leeet al\.,[2021](https://arxiv.org/html/2606.06827#bib.bib24)\)\. These architectures are explicitly designed to promote cross\-instance generalization\. Our experiments deliberately use simpler SIREN, ReLU, and Fourier\-feature INRs to measure how much transfer magnitude and transfer specificity emerge even without such machinery\.
## 3 Methods
### 3.1 Controlled Test Families and PDE Benchmarks
Controlled 1D geometric test family: The family $g_{t}(x)=\sqrt{x^{2}+t^{2}}$ is an analytic, non-PDE target used to stress-test architecture-dependent transfer near a developing cusp. It is smooth for $t>0$ and approaches $|x|$ as $t\to 0$. This experiment is a controlled diagnostic test, not a physical PDE calculation.
Controlled two-parameter 1D test family: We define a two-parameter *family of 1D target functions* using exponentially damped cosine modes:
$$c_{n}(x,y)=e^{-\pi n^{2}y}\cos(2\pi nx),\quad n=1,2,3 \tag{1}$$
over the parameter rectangle $(x_{\mathrm{par}},y_{\mathrm{par}})\in[-0.5,0.5]\times[0.1,2.0]$. For each parameter pair we define the 1D signal
$$f_{x_{\mathrm{par}},y_{\mathrm{par}}}(\xi)=c_{1}(x_{\mathrm{par}},y_{\mathrm{par}})\cos(2\pi\xi)+c_{2}(x_{\mathrm{par}},y_{\mathrm{par}})\cos(4\pi\xi)+c_{3}(x_{\mathrm{par}},y_{\mathrm{par}})\cos(6\pi\xi), \tag{2}$$
evaluated on $\xi\in[-1,1]$. This is *not* a modular form or heat kernel on an elliptic curve; it is a smooth analytic family chosen for its multi-scale structure. No modular invariance is assumed or exploited.
To avoid terminological ambiguity, we refer to this experiment as a *two-parameter 1D family* rather than a two-dimensional signal. The parameter space $(x_{\mathrm{par}},y_{\mathrm{par}})$ is two-dimensional, but the represented signal remains a one-dimensional function of $\xi$. Like the 1D geometric family, this is an analytic transfer diagnostic rather than a PDE solve.
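The two-parameter family of Eqs. (1) and (2) can be sketched in a few lines of NumPy. The function names and the source/target parameter pair below are illustrative assumptions, not taken from the authors' implementation; only the formulas and the 500-point grid on $[-1,1]$ come from the text.

```python
import numpy as np


def mode_coefficient(n, x_par, y_par):
    """Damped cosine coefficient c_n(x, y) = exp(-pi n^2 y) cos(2 pi n x), Eq. (1)."""
    return np.exp(-np.pi * n**2 * y_par) * np.cos(2.0 * np.pi * n * x_par)


def two_parameter_signal(xi, x_par, y_par, modes=(1, 2, 3)):
    """1D signal f(xi) = sum_n c_n(x_par, y_par) cos(2 pi n xi), Eq. (2)."""
    xi = np.asarray(xi, dtype=float)
    signal = np.zeros_like(xi)
    for n in modes:
        signal += mode_coefficient(n, x_par, y_par) * np.cos(2.0 * np.pi * n * xi)
    return signal


# 500-point grid on [-1, 1], as used for the two-parameter analytic sweep.
xi = np.linspace(-1.0, 1.0, 500)

# Hypothetical source/target pair inside the parameter rectangle
# [-0.5, 0.5] x [0.1, 2.0]; these specific values are assumptions.
source = two_parameter_signal(xi, x_par=0.0, y_par=0.5)
target_x = two_parameter_signal(xi, x_par=0.1, y_par=0.5)
```

Because $y_{\mathrm{par}}$ enters through $e^{-\pi n^{2}y}$, larger $y_{\mathrm{par}}$ suppresses the $n=2,3$ modes quickly, which is what gives the family its multi-scale character.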
Random controls for analytic tests: In both controlled analytic settings, random controls are built by phase-randomizing the Fourier representation of the corresponding structured target while preserving the Fourier magnitudes, then matching the random target to the structured target mean and standard deviation. Source and target random controls use independent seeds, with source seed $s$ and target seed $s+10000$ (plus a direction offset in the two-parameter experiment).
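A minimal sketch of such a phase-randomized control is given below. It is an illustration under stated assumptions rather than the authors' code: the uniform phase distribution, the decision to keep the DC and Nyquist bins unchanged, and the function name are all assumptions. The structured target is the 1D geometric family $g_{t}$ at $t=0.5$ on a 1000-point grid.

```python
import numpy as np


def phase_randomized_control(signal, seed):
    """Random control with the same Fourier magnitudes, mean, and std as `signal`."""
    signal = np.asarray(signal, dtype=float)
    rng = np.random.default_rng(seed)
    spectrum = np.fft.rfft(signal)
    # Assumption: phases drawn uniformly on [0, 2*pi).
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.shape)
    randomized = np.abs(spectrum) * np.exp(1j * phases)
    # Keep the DC bin (and the Nyquist bin for even lengths) so the
    # inverse transform is real and the mean is unchanged.
    randomized[0] = spectrum[0]
    if signal.size % 2 == 0:
        randomized[-1] = spectrum[-1]
    control = np.fft.irfft(randomized, n=signal.size)
    # Match mean and standard deviation of the structured target.
    control = (control - control.mean()) / control.std()
    return control * signal.std() + signal.mean()


x = np.linspace(-1.0, 1.0, 1000)
structured = np.sqrt(x**2 + 0.5**2)  # g_t(x) at t = 0.5

s = 0  # hypothetical base seed
source_control = phase_randomized_control(structured, seed=s)
target_control = phase_randomized_control(structured, seed=s + 10000)
```

Because the magnitudes are preserved bin by bin, the final rescaling only removes rounding error; the two controls share a power spectrum with the structured target but are otherwise unrelated to it and to each other.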
PDE benchmarks: We additionally study three standard one-dimensional PDE families: heat, viscous Burgers, and focusing cubic nonlinear Schrödinger (NLS). These experiments are supervised INR regressions against reference PDE solutions, not physics-informed residual minimization. The heat equation is
$$u_{t}=\alpha u_{xx},\qquad x\in[0,1), \tag{3}$$
with periodic boundary conditions and initial condition
$$u(x,0)=\sin(2\pi x)+\frac{1}{2}\sin(4\pi x). \tag{4}$$
The reference solution is evaluated analytically from the Fourier modes,
$$u(x,t;\alpha)=e^{-\alpha(2\pi)^{2}t}\sin(2\pi x)+\frac{1}{2}e^{-\alpha(4\pi)^{2}t}\sin(4\pi x). \tag{5}$$
The source parameter is $\alpha=0.05$, with targets $\alpha\in\{0.02,0.10\}$; the non-designated target parameter is used as the alternate same-family control source.
The Burgers benchmark solves
$$u_{t}+uu_{x}=\nu u_{xx},\qquad x\in[0,1), \tag{6}$$
again with periodic boundary conditions and initial condition
$$u(x,0)=\sin(2\pi x)+0.25\sin(4\pi x). \tag{7}$$
Reference trajectories are generated with a Fourier pseudo-spectral spatial discretization and classical fourth-order Runge–Kutta time stepping. The source viscosity is $\nu=0.01$, with targets $\nu\in\{0.02,0.005\}$ and the opposite target again serving as the alternate same-family control source.
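A pseudo-spectral RK4 solver of this kind might look as follows. This is a hedged sketch, not the authors' solver: the final time, the number of time steps, the absence of dealiasing, and the function names are assumptions chosen only to keep the example short and stable.

```python
import numpy as np


def burgers_rhs(u, k, nu):
    """Right-hand side -u u_x + nu u_xx with spectral derivatives."""
    u_hat = np.fft.fft(u)
    u_x = np.real(np.fft.ifft(1j * k * u_hat))
    u_xx = np.real(np.fft.ifft(-(k**2) * u_hat))
    return -u * u_x + nu * u_xx


def solve_burgers(nu, nx=128, t_final=0.05, n_steps=200):
    """Integrate Eq. (6) from the initial condition of Eq. (7) with classical RK4."""
    x = np.arange(nx) / nx                              # periodic grid on [0, 1)
    k = 2.0 * np.pi * np.fft.fftfreq(nx, d=1.0 / nx)    # angular wavenumbers
    u = np.sin(2.0 * np.pi * x) + 0.25 * np.sin(4.0 * np.pi * x)
    dt = t_final / n_steps
    for _ in range(n_steps):
        k1 = burgers_rhs(u, k, nu)
        k2 = burgers_rhs(u + 0.5 * dt * k1, k, nu)
        k3 = burgers_rhs(u + 0.5 * dt * k2, k, nu)
        k4 = burgers_rhs(u + dt * k3, k, nu)
        u = u + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return x, u


# Assumed short horizon; the paper's actual time window is not stated here.
x_grid, u_final = solve_burgers(nu=0.02)
```

Smaller viscosities steepen the front and need finer grids or smaller steps, so the step count above should be treated as a starting point rather than a recommendation.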
The NLS benchmark solves the focusing cubic equation
$$iu_{t}+\frac{1}{2}u_{xx}+\kappa|u|^{2}u=0,\qquad x\in[-10,10), \tag{8}$$
with periodic boundary conditions and initial condition $u(x,0)=\mathrm{sech}(x)$. Reference solutions are generated by a Strang split-step Fourier method. The complex-valued solution is represented by two real output channels, $(\mathrm{Re}\,u,\mathrm{Im}\,u)$. The source parameter is $\kappa=1.0$, with targets $\kappa\in\{0.8,1.2\}$ and the opposite target used as the alternate same-family control source. The implementation records the discrete NLS mass drift in the reference metadata as a solver sanity check.
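The Strang split-step scheme and the mass-drift check can be sketched as below. The final time, step count, and function name are assumptions. Writing Eq. (8) as $u_{t}=i\big(\tfrac{1}{2}u_{xx}+\kappa|u|^{2}u\big)$, the linear part is solved exactly in Fourier space and the nonlinear part is a pointwise phase rotation.

```python
import numpy as np


def solve_nls(kappa, nx=128, length=20.0, t_final=1.0, n_steps=1000):
    """Strang split-step Fourier integration of Eq. (8) on [-length/2, length/2)."""
    x = -length / 2.0 + length * np.arange(nx) / nx
    k = 2.0 * np.pi * np.fft.fftfreq(nx, d=length / nx)
    dx = length / nx
    dt = t_final / n_steps
    u = (1.0 / np.cosh(x)).astype(complex)      # u(x, 0) = sech(x)
    linear_step = np.exp(-0.5j * k**2 * dt)      # exact flow of i u_t + u_xx / 2 = 0
    mass_initial = np.sum(np.abs(u) ** 2) * dx
    for _ in range(n_steps):
        u = u * np.exp(0.5j * kappa * np.abs(u) ** 2 * dt)   # half nonlinear step
        u = np.fft.ifft(linear_step * np.fft.fft(u))         # full linear step
        u = u * np.exp(0.5j * kappa * np.abs(u) ** 2 * dt)   # half nonlinear step
    mass_final = np.sum(np.abs(u) ** 2) * dx
    mass_drift = abs(mass_final - mass_initial) / mass_initial
    return x, u, mass_drift


x_nls, u_nls, drift = solve_nls(kappa=1.0)
# Two real output channels, as used for the INR regression target.
channels = np.stack([u_nls.real, u_nls.imag], axis=-1)
```

For $\kappa=1$ the initial condition is the stationary soliton $u=\mathrm{sech}(x)\,e^{it/2}$, so $|u|$ should stay close to $\mathrm{sech}(x)$; this gives a second sanity check alongside the mass drift.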
### 3.2 Architectures
We compare three architectures throughout the analytic\-test and PDE benchmark studies:
- SIREN (Sitzmann et al., [2020](https://arxiv.org/html/2606.06827#bib.bib29)): first-layer frequency $\omega_{0}=30$, hidden-layer frequency $\omega_{0}=1$, 3 hidden layers of width 128, with the corresponding SIREN-style initialization used in the implementation
- ReLU MLP: 3 hidden layers of width 128 with ReLU activations
- Fourier Features (Tancik et al., [2020](https://arxiv.org/html/2606.06827#bib.bib33)): 64 random Fourier frequencies with $\sigma=10$, followed by the same 3-layer width-128 ReLU MLP
These settings define the *default* architecture comparison reported in the main two-parameter 1D family table and figures. In a fairness-oriented re-analysis of the two-parameter experiment, we additionally sweep SIREN first-layer frequency $\omega_{0}$ and a global SIREN initialization scale, and we sweep ReLU width, depth, and fine-tuning learning rate while keeping the remaining protocol fixed. In the current implementation, both analytic-test and Navier–Stokes Fourier embeddings use the Tancik-style convention $\gamma(v)=[\sin(2\pi Bv),\cos(2\pi Bv)]$; the analytic test studies use 64 sampled frequencies, whereas the Navier–Stokes benchmark uses 128.
For the 1D PDE reference benchmarks, all architectures map normalized space–time coordinates $(\tilde{x},\tilde{t})\in[-1,1]^{2}$ to the PDE state. Heat and Burgers use one scalar output channel; NLS uses two real output channels. The reported full-budget PDE runs use width 128 and depth 3 for all architectures, SIREN first-layer frequency $\omega_{0}=30$, hidden-layer frequency $\omega_{0}=1$, and Fourier features with $\max(16,\mathrm{width}/2)=64$ sampled frequencies at $\sigma=10$. Reference grids use $N_{x}=128$ spatial points and $N_{t}=101$ stored time levels. The quick and smoke profiles use smaller grids and training budgets only for local validation, not for manuscript-scale results.
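The embedding $\gamma(v)=[\sin(2\pi Bv),\cos(2\pi Bv)]$ can be illustrated as follows. Drawing the entries of $B$ from a zero-mean Gaussian with standard deviation $\sigma$ follows the usual Tancik-style recipe and is an assumption here, as are the function names and the seed.

```python
import numpy as np


def sample_fourier_matrix(in_dim, n_frequencies=64, sigma=10.0, seed=0):
    """Fixed random projection B with entries ~ N(0, sigma^2) (assumed distribution)."""
    rng = np.random.default_rng(seed)
    return sigma * rng.standard_normal((n_frequencies, in_dim))


def fourier_features(v, B):
    """gamma(v) = [sin(2 pi B v), cos(2 pi B v)] for coordinates of shape (n_points, in_dim)."""
    v = np.asarray(v, dtype=float).reshape(-1, B.shape[1])
    projection = 2.0 * np.pi * v @ B.T
    return np.concatenate([np.sin(projection), np.cos(projection)], axis=-1)


B = sample_fourier_matrix(in_dim=1, n_frequencies=64, sigma=10.0, seed=0)
features = fourier_features(np.linspace(-1.0, 1.0, 500), B)  # shape (500, 128)
```

Since $B$ is fixed rather than trained, whether a warm start copies it or resamples it is a separate choice from copying the MLP weights.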
### 3.3 Transfer Learning Protocol
Transfer advantage is defined as
$$A_{\text{transfer}}=L_{\text{scratch}}/L_{\text{transfer}} \tag{9}$$
after $E_{\text{ft}}=300$ epochs of fine-tuning.
Analytic-test training details: The 1D geometric sweep trains on the full 1000-point grid $x\in[-1,1]$; the two-parameter analytic sweep trains on the full 500-point grid $\xi\in[-1,1]$. All analytic-test runs use Adam with no scheduler, no weight decay, and no early stopping. Pretraining and scratch runs use learning rate $10^{-3}$; fine-tuning uses learning rate $10^{-4}$. Pretraining runs for 1500 epochs and both fine-tuning and scratch training run for 300 epochs. The reported transfer advantage is therefore a fixed-budget final-loss ratio, not a compute-aware metric and not an equal-total-training-cost comparison. Because scratch and fine-tuning do not use learning-rate-matched optimization, these ratios should be interpreted as outcomes of a fixed protocol rather than as optimizer-controlled causal estimates of initialization quality alone. For Fourier features, fine-tuning reuses the pretrained random projection matrix because the whole model is copied, whereas scratch training samples a fresh projection; the reported Fourier transfer therefore combines reuse of the trainable MLP weights with reuse of the fixed random feature map.
PDE benchmark training details: The heat, Burgers, and NLS benchmarks use the same source-transfer versus scratch structure, but with mini-batch training over the fixed space–time reference grid. In the reported full-budget runs, each source or control pretraining run uses 1500 Adam steps with batch size 4096 and learning rate $10^{-3}$. Fine-tuning uses 300 Adam steps with learning rate $10^{-4}$, and scratch training uses the same 300-step budget with learning rate $10^{-3}$. For each PDE, the model pretrained at the designated source parameter is fine-tuned to each target parameter. The alternate-source control is not an independent random null; it is the same PDE family pretrained at the opposite target parameter and then fine-tuned to the target of interest. The reported PDE benchmark transfer advantage is computed on the full space–time reference grid as
$$A_{\mathrm{PDE}}=\frac{\mathrm{MSE}_{\mathrm{scratch}}}{\mathrm{MSE}_{\mathrm{transfer}}}. \tag{10}$$
To isolate optimizer-schedule and Fourier-feature-map effects, we also run two auxiliary PDE controls. First, a matched-optimizer scratch baseline trains scratch models for the same 300-step target budget using the fine-tuning learning rate $10^{-4}$ rather than $10^{-3}$. Second, for Fourier Features, we add two feature-map ablations: one copies the pretrained MLP weights but resamples the fixed random projection matrix $B$ before fine-tuning, and one reuses only the pretrained $B$ while randomly initializing the MLP weights. These controls are reported as diagnostics; the main PDE table keeps the original fixed-budget protocol for comparability with the analytic experiments. The implementation caches reference solutions by PDE, parameter, grid size, and time-grid size, so rerunning the experiment does not recompute the numerical reference data. The neural training runs themselves are not checkpoint-resumed in the current implementation; the reported full-budget outputs are stored as JSON summaries together with a lightweight SVG diagnostic plot.
The saved revision artifacts also record an auxiliary curve-aware statistic over the full fine-tuning trajectory, $A_{\mathrm{curve}}=\exp\!\big(\mathrm{mean}_{t}\log((L_{\mathrm{scratch}}(t)+\varepsilon)/(L_{\mathrm{transfer}}(t)+\varepsilon))\big)$, but the headline tables in this manuscript use the terminal ratio $A_{\mathrm{transfer}}$. We keep the terminal ratio as the primary metric because it is directly comparable across all analytic and PDE suites, maps one-to-one to the final target loss after a shared fine-tuning budget, and matches the practical warm-start question of which initialization finishes lower under a fixed target-training budget; $A_{\mathrm{curve}}$ is retained as a robustness check against outlier-sensitive terminal values.
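Both statistics are simple functions of the recorded loss curves. The sketch below assumes each curve is stored as a 1D array with one entry per fine-tuning epoch or step; the value of $\varepsilon$ is not given in the text, so the default used here is an assumption, as are the function names and the toy curves.

```python
import numpy as np


def transfer_advantage(loss_scratch, loss_transfer):
    """Terminal ratio A_transfer = L_scratch / L_transfer at the end of training."""
    return float(loss_scratch[-1] / loss_transfer[-1])


def curve_advantage(loss_scratch, loss_transfer, eps=1e-12):
    """Curve-aware A_curve: geometric mean of the per-step loss ratio."""
    scratch = np.asarray(loss_scratch, dtype=float)
    transfer = np.asarray(loss_transfer, dtype=float)
    return float(np.exp(np.mean(np.log((scratch + eps) / (transfer + eps)))))


# Toy curves (not results from the paper): scratch is 10x worse at every step.
toy_transfer = np.array([1.0, 0.1, 0.01])
toy_scratch = 10.0 * toy_transfer

a_terminal = transfer_advantage(toy_scratch, toy_transfer)
a_curve = curve_advantage(toy_scratch, toy_transfer)
```

When the ratio is constant along the trajectory the two statistics coincide; they diverge when the warm start helps early but the scratch run catches up, or when a single terminal value is an outlier.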
Statistical summaries: The 10-seed 1D geometric sweep and the 10-seed two-parameter analytic sweep report bootstrap percentile 95% confidence intervals for the mean. In 1D we use one-sided paired Wilcoxon signed-rank tests for within-architecture structured-versus-random comparisons (alternative: structured $>$ random) and two-sided paired Wilcoxon tests for pairwise structured-transfer comparisons across architectures. In the two-parameter family we again use one-sided paired Wilcoxon signed-rank tests for structured-versus-random transfer within each architecture and direction, together with two-sided pairwise architecture comparisons on structured transfer. We report unadjusted $p$-values throughout and therefore use them as localized evidence for pre-specified comparisons rather than as a family-wise error-controlled testing program. As a sensitivity check, a Benjamini–Hochberg correction over the 24 primary Wilcoxon comparisons reported in the main text retains the largest reported passing value at $p=0.020$, which is below the corresponding rank-10 critical value $10\times 0.05/24\approx 0.0208$; the retained set contains five of the six 1D comparisons (all except the Fourier-versus-SIREN structured comparison), the two-parameter ReLU “both” comparison, all three two-parameter Fourier comparisons, and the NS ReLU $100\!\to\!400$ contrast, but not the remaining borderline effects. For the fairness sweeps, because the per-seed ratio $A_{\mathrm{geo}}/(A_{\mathrm{rand}}+10^{-10})$ becomes unstable when $A_{\mathrm{rand}}$ is very small, we interpret those sweeps primarily through macro structured/random means, curve-aware analogues, and the paired Wilcoxon tests rather than through the ratio summary.
## 4 Results
### 4.1 Static Diagnostics Remain Weak
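The Benjamini–Hochberg step-up rule behind this sensitivity check can be written compactly. The sketch below is a generic implementation under the standard definition, not the authors' analysis script; the function names are illustrative, and no $p$-values from the paper are reproduced.

```python
import numpy as np


def bh_critical_value(rank, n_tests, q=0.05):
    """Benjamini-Hochberg threshold for the rank-th smallest p-value."""
    return rank * q / n_tests


def benjamini_hochberg(p_values, q=0.05):
    """Boolean mask of hypotheses retained by the BH step-up procedure."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        # Step-up: everything up to the largest passing rank is retained.
        last = np.nonzero(passed)[0].max()
        reject[order[: last + 1]] = True
    return reject


# Rank-10 threshold among 24 comparisons at q = 0.05, roughly 0.0208.
critical = bh_critical_value(rank=10, n_tests=24)
```

The step-up structure means a comparison can be retained even if its own $p$-value exceeds its rank threshold, as long as a larger-ranked $p$-value passes.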
Table [1](https://arxiv.org/html/2606.06827#S4.T1) summarizes a cross-architecture diagnostic rerun on the controlled 1D geometric family. The activation-cloud participation ratio, computed on 20 parameter values times 500 spatial samples for a total of 10,000 activation vectors, shows essentially no structured-versus-random separation within any architecture: SIREN gives $94.68$ versus $94.71$, ReLU gives $14.74$ versus $14.98$, and Fourier Features give $15.43$ versus $15.42$. The corresponding parameter-manifold participation ratios are similarly close in each case.
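The text does not spell out the participation-ratio formula. A common definition, assumed here, is $\mathrm{PR}=(\sum_{i}\lambda_{i})^{2}/\sum_{i}\lambda_{i}^{2}$ over the eigenvalues $\lambda_{i}$ of the activation covariance matrix. The sketch below applies it to synthetic activations, not to the paper's networks.

```python
import numpy as np


def participation_ratio(activations):
    """PR = (sum lambda)^2 / sum lambda^2 for the covariance of (n_samples, n_units) data."""
    a = np.asarray(activations, dtype=float)
    centered = a - a.mean(axis=0, keepdims=True)
    covariance = centered.T @ centered / (a.shape[0] - 1)
    eigenvalues = np.clip(np.linalg.eigvalsh(covariance), 0.0, None)
    return float(eigenvalues.sum() ** 2 / np.sum(eigenvalues**2))


rng = np.random.default_rng(0)
# Synthetic stand-ins for a 10,000-vector activation cloud (not paper data).
isotropic_cloud = rng.standard_normal((10_000, 8))
rank_one_cloud = np.outer(rng.standard_normal(10_000), np.ones(8))

pr_isotropic = participation_ratio(isotropic_cloud)   # close to 8
pr_rank_one = participation_ratio(rank_one_cloud)     # close to 1
```

Under this definition the ratio ranges from 1 (all variance along one direction) to the layer width (variance spread evenly), so values near 95 for a width-128 SIREN and near 15 for the ReLU-based models describe how many directions the activations effectively occupy.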
Hessian sharpness, measured as the spectral norm $\lambda_{\max}(H)$ following Foret et al. ([2021](https://arxiv.org/html/2606.06827#bib.bib37)), is likewise weakly informative. Across five seeds, SIREN gives $143.3\pm 16.5$ for structured targets and $151.4\pm 17.9$ for random targets, with an exploratory one-sided two-sample $t$-test at $p=0.375$. ReLU gives $18.7\pm 0.7$ versus $20.3\pm 0.8$ with $p=0.093$, and Fourier gives $14.1\pm 0.9$ versus $14.1\pm 0.9$ with $p=0.500$. We therefore do not treat sharpness as discriminative evidence.
Independent-seed CKA is architecture-dependent, but it is not specificity-selective. On a separate 15-point diagnostics grid, the Mantel Spearman correlation between pairwise CKA distances and pairwise log-parameter distances is $\rho=0.290$ for SIREN structured targets and $\rho=0.270$ for SIREN random targets; ReLU gives $\rho=0.474$ and $\rho=0.449$, and Fourier gives $\rho=-0.112$ and $\rho=0.268$. In every architecture, the shared-seed structured baseline is much larger, with $\rho=0.754$ for SIREN, $\rho=0.749$ for ReLU, and $\rho=0.757$ for Fourier. The correct conclusion is therefore not that CKA is always uncorrelated with parameter distance, but that under the valid independent-seed null this Mantel statistic does not cleanly separate structured transfer from random-control transfer.
Table 1: Cross-architecture static diagnostics on the controlled 1D geometric test family. The participation ratio is computed on the full activation cloud rather than on one mean-pooled vector per parameter value. Sharpness entries are mean $\pm$ standard error over five seeds.
### 4\.2Architecture Dependence in 1D
Table[2](https://arxiv.org/html/2606.06827#S4.T2)and Figure[1](https://arxiv.org/html/2606.06827#S4.F1)summarize the 10\-seed controlled 1D geometric test\. Two patterns are clear\. First,*absolute structured transfer*differs strongly across architectures: Fourier Features have the largest mean advantage at33\.1×33\.1\\times, SIREN is intermediate at23\.0×23\.0\\times, and ReLU is lower at10\.7×10\.7\\times\. Second,*transfer specificity*differs even more sharply: ReLU random\-control transfer is close to zero, Fourier random\-control transfer is much smaller than its structured transfer but not negligible, and SIREN random\-control transfer remains large\.
Table 2:10\-seed architecture comparison for the controlled 1D geometric test,t=0\.5→0\.55t=0\.5\\to 0\.55\. Entries are mean transfer advantages with bootstrap 95% confidence intervals\. Wilcoxonpp\-values are from one\-sided paired signed\-rank tests of whether structured\-target transfer exceeds random\-control transfer within each architecture\.Figure 1:Architecture\-dependent transfer magnitude and specificity in the controlled 1D geometric test over 10 seeds\. Left: structured\-target transfer advantage\. Right: independent\-seed random\-control transfer advantage\. Error bars denote bootstrap 95% confidence intervals for the mean\. Fourier Features have the largest mean structured transfer, while SIREN shows much less specificity because it also transfers strongly on random targets\.Paired structured\-transfer tests still separate both Fourier and SIREN from ReLU, with SIREN exceeding ReLU \(p=0\.0098p=0\.0098\) and Fourier exceeding ReLU \(p=0\.0020p=0\.0020\)\. The Fourier\-versus\-SIREN gap is much weaker after the rerun \(p=0\.0488p=0\.0488\)\. The central issue is not merely that one architecture has a larger transfer number than another\. Rather, architecture determines both how large transfer is on the structured target family and how much of that reuse survives a properly independent random null model\. ReLU and Fourier show strong structured\-vs\-random separation; SIREN does not\.
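The two summary statistics used in Table 2, a bootstrap confidence interval for the mean and a one-sided paired Wilcoxon signed-rank test, can be sketched as follows. This is a minimal sketch under stated assumptions: the percentile bootstrap, the resample count `n_boot`, and the function names are illustrative choices, not the authors' implementation.

```python
import numpy as np
from scipy.stats import wilcoxon

def bootstrap_mean_ci(values, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of per-seed values."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample seeds with replacement and record the mean of each resample.
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    means = values[idx].mean(axis=1)
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return values.mean(), lo, hi

def structured_exceeds_random(structured, random_control):
    """One-sided paired signed-rank test: is structured transfer larger per seed?"""
    return wilcoxon(structured, random_control, alternative="greater").pvalue
```

With ten paired seeds and every per-seed difference positive (and no ties), the exact one-sided signed-rank $p$-value is $2^{-10} \approx 0.00098$, which is the smallest value this test can return at that sample size.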
### 4.3 Two-Parameter 1D Family: Architecture Dependence Changes Character
Table [3](https://arxiv.org/html/2606.06827#S4.T3) summarizes the paired 10-seed sweep on the two-parameter 1D family across all three architectures at the default settings. Three different regimes emerge.
First, SIREN again shows substantial structured transfer, but weak specificity: its structured transfer lies between $18.0\times$ and $21.3\times$, while random-control transfer remains between $13.9\times$ and $17.4\times$, and none of the paired Wilcoxon tests reaches the $0.05$ threshold. Second, ReLU shows the clearest specificity in the two-parameter setting. Its structured transfer is large for $x$-only and $y$-only adaptation, and even when both parameters vary it remains well above its random-control baseline; under the pre-specified one-sided within-architecture tests, all three comparisons satisfy $p \leq 0.0322$. Third, Fourier Features are the opposite of their 1D behavior: they still separate structured targets from random controls in a statistical sense, but their absolute structured transfer remains modest, ranging from $0.18\times$ to $1.76\times$ across the three directions.
These results show that the two\-parameter story is not a simple extension of the 1D ranking\. In this controlled analytic family, explicit Fourier features excel in 1D but remain weak in absolute transfer at the default scale, ReLU becomes the clearest discriminator at the default settings, and default SIREN remains reusable but only weakly specific\.
Fairness-oriented within-family sweeps modify the family-level interpretation. For SIREN, the default setting is weakly specific, but reducing the initialization scale from $1$ to $0.5$ changes the macro means from $22.96\times$ structured versus $16.32\times$ random to $20.94\times$ versus $2.91\times$, and all three directional Wilcoxon tests become significant ($p = 0.0010$, $0.0010$, $0.0098$). Increasing $\omega_0$ to $100$ also improves specificity, giving macro means $20.23\times$ versus $10.04\times$ with all three directions significant. The sweep baseline at init-scale $1$ is not numerically identical to the default Table [3](https://arxiv.org/html/2606.06827#S4.T3) SIREN baseline, because the fairness sweep uses a different architecture-index seed offset; we therefore interpret the sweep as a within-family sensitivity study rather than as a literal re-estimation of the default table. For ReLU, the family is more robust across the tested sweeps: width $128$ remains strong, and depth $4$ is the strongest clean setting among the tested ReLU variants, with macro means $35.87\times$ versus $2.12\times$ and all three directional tests significant ($p \leq 0.0244$). By contrast, some apparently dramatic terminal-ratio wins, such as ReLU depth $5$, are not stable across summaries: the macro terminal advantage rises to $366.94\times$, but the corresponding macro random transfer is still $24.28\times$ and the curve-aware macro advantage is only $19.83\times$, indicating heavy outlier sensitivity rather than a clean family-level improvement. The fairness sweeps therefore support a narrower claim than the default table alone: ReLU remains the most robust two-parameter family across the tested settings, but SIREN is more tunable and more specific under some settings than the default comparison suggests.
Table 3: Paired 10-seed sweep on the controlled two-parameter 1D family at the default architecture settings. Entries are mean transfer advantages with bootstrap 95% confidence intervals. Wilcoxon $p$-values are from one-sided paired signed-rank tests of whether structured-target transfer exceeds random-control transfer within each architecture and direction. The ratio column summarizes the per-seed quantity $A_{\mathrm{geo}} / (A_{\mathrm{rand}} + 10^{-10})$; it is *not* the ratio of the reported structured and random means. Because $A_{\mathrm{rand}}$ can be very small, this ratio is unstable and is included only as a secondary descriptive statistic.
Figure [2](https://arxiv.org/html/2606.06827#S4.F2) visualizes the same paired comparison on the controlled two-parameter 1D family. The clearest qualitative contrast is between ReLU, which separates structured targets from random controls in all three directions under the pre-specified one-sided tests, and SIREN, whose structured and random bars remain close.
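The Table 3 caption distinguishes a summary of per-seed ratios from the ratio of the reported means. The small sketch below shows why the two can differ sharply when one seed has a very small random-control advantage. The numbers are made up for illustration, and summarizing the per-seed ratio by its mean is an assumption, since the caption does not say which summary is used.

```python
from statistics import mean

EPS = 1e-10  # stabilizer from the Table 3 caption

def mean_of_per_seed_ratios(a_geo, a_rand, eps=EPS):
    """Average the per-seed quantity A_geo / (A_rand + eps)."""
    return mean([g / (r + eps) for g, r in zip(a_geo, a_rand)])

def ratio_of_means(a_geo, a_rand):
    """Divide the mean structured advantage by the mean random advantage."""
    return mean(a_geo) / mean(a_rand)

# Illustrative values only: the second seed has a near-zero random advantage.
a_geo = [10.0, 10.0]
a_rand = [1.0, 0.01]
per_seed = mean_of_per_seed_ratios(a_geo, a_rand)  # dominated by the second seed
pooled = ratio_of_means(a_geo, a_rand)
```

In this toy case a single seed with a tiny denominator inflates the per-seed summary far above the ratio of means, which is the instability the caption warns about.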
Figure 2: Paired structured-versus-random transfer comparison on the controlled two-parameter 1D family across architectures over 10 seeds at the default architecture settings. Each panel shows mean transfer advantage, and error bars denote bootstrap 95% confidence intervals for the mean. ReLU shows the clearest specificity, default SIREN remains weakly specific, and Fourier separates structured targets from random controls mainly because both transfers are small in absolute terms.
The surface $\Phi(x, y) = (c_1, c_2, c_3)$ visualization (Figure [3](https://arxiv.org/html/2606.06827#S4.F3)) shows the *analytic* coefficient map, not learned neural representations.
Figure 3: Analytic coefficient surface $\Phi: (x, y) \mapsto (c_1, c_2, c_3)$ for the two-parameter 1D family. This surface is computed directly from the closed-form coefficients, not from neural network representations.
Figure 4: Structured-target transfer comparison in the controlled 1D geometric test and the default SIREN sweep on the controlled two-parameter 1D family over 10 seeds. Bars show mean transfer advantage and error bars show bootstrap 95% confidence intervals. Table [3](https://arxiv.org/html/2606.06827#S4.T3) extends the comparison to ReLU and Fourier, revealing that the two-parameter architecture ranking differs qualitatively from the 1D ranking.
### 4.4 Scaling Law: Negative Result Across the Implemented 1D Architectures
We tested the heuristic prediction $A \propto 1/\Delta t^{2}$ in the implemented 1D setting for all three architectures, using $t_{\mathrm{source}} = 0.5$, eight target offsets $\Delta t \in [0.01, 0.2]$, and three seeds averaged at each offset. The regression therefore remains a limited probe: it averages over only three seeds per architecture and uses one optimization protocol throughout. It nevertheless supports a clear qualitative conclusion. The fitted relations are:
- SIREN: slope $0.026 \pm 0.010$, $R^{2} = 0.53$, $H_0\!: \text{slope} = 1$ rejected at $p = 8.5 \times 10^{-11}$
- ReLU MLP: slope $0.308 \pm 0.032$, $R^{2} = 0.94$, $H_0\!: \text{slope} = 1$ rejected at $p = 6.1 \times 10^{-7}$
- Fourier Features: slope $0.590 \pm 0.070$, $R^{2} = 0.92$, $H_0\!: \text{slope} = 1$ rejected at $p = 1.1 \times 10^{-3}$
The usual `linregress` $p$-value tests the irrelevant null $H_0\!: \text{slope} = 0$; the relevant comparison for this ansatz is $H_0\!: \text{slope} = 1$. Under that test, the ansatz is rejected for all three architectures. The rejection does *not* imply that transfer is independent of $\Delta t$: ReLU and Fourier both decay with $\Delta t$, but substantially more slowly than the predicted quadratic law. We therefore treat this as a negative empirical result against a simple heuristic model, not as a theorem about transfer scaling in INRs.
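The slope-equals-one test described above can be sketched with `scipy.stats.linregress` by forming a $t$-statistic from the fitted slope and its standard error. The two-sided $t$-test with $n - 2$ degrees of freedom is an assumption about how the reported $p$-values were obtained, and the function name is hypothetical.

```python
import numpy as np
from scipy import stats

def slope_vs_one(delta_t, advantage):
    """Fit log10(A) against log10(1 / dt^2) and test H0: slope = 1."""
    x = np.log10(1.0 / np.asarray(delta_t, dtype=float) ** 2)
    y = np.log10(np.asarray(advantage, dtype=float))
    fit = stats.linregress(x, y)  # fit.pvalue tests slope = 0, which is not the null of interest
    dof = len(x) - 2
    t_stat = (fit.slope - 1.0) / fit.stderr
    p_two_sided = 2.0 * stats.t.sf(abs(t_stat), dof)  # assumed two-sided test
    return fit.slope, fit.stderr, fit.rvalue ** 2, p_two_sided
```

With eight offsets there are six degrees of freedom. Plugging the reported slopes and standard errors into this statistic gives $p$-values of the same order as those listed above, which is consistent with, but does not confirm, this choice of test.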
Figure 5: Cross-architecture scaling-law audit of the prediction $A_{\text{transfer}} \propto 1/\Delta t^{2}$. Each panel plots $\log_{10} A$ versus $\log_{10}(1/\Delta t^{2})$ for one architecture. The expected slope $1.0$ is rejected for SIREN, ReLU, and Fourier Features. ReLU and Fourier show clear decay with $\Delta t$, but much slower than the quadratic-law prediction.
## 5 PDE Benchmarks
### 5.1 Navier–Stokes Lid-Driven Cavity
To assess whether the architecture-dependent transfer pattern established on controlled analytic families extends to a physically realistic setting, we apply the same INR transfer protocol to the steady lid-driven cavity problem. The target function family is parameterized by Reynolds number $Re \in \{100, 400, 1000\}$, which plays the role of the continuous parameter $t$ in the analytic tests. For each $Re$, we solve the steady incompressible Navier–Stokes equations
$$(\mathbf{u} \cdot \nabla)\mathbf{u} = -\nabla p + \tfrac{1}{Re} \nabla^{2} \mathbf{u}, \quad \nabla \cdot \mathbf{u} = 0,$$
on $\Omega = [0, 1]^{2}$ using a stream-function/vorticity finite-difference solver ($129 \times 129$ uniform grid), with the lid $u_x = 1$ at $y = 1$ and no-slip on the remaining walls. The converged velocity field $\mathbf{u}(x, y; Re) = (u_x, u_y)$ serves as the INR regression target. This is a steady, moderate-Reynolds-number laminar cavity benchmark, not a turbulent or time-dependent flow setting.
Each architecture maps $(x, y) \in [0, 1]^{2}$ to $(u_x, u_y)$. All NS models use width 256 and depth 4; SIREN uses first-layer $\omega_0 = 30$, and the Fourier model uses 128 Fourier frequencies with $\sigma = 10$. Training uses Adam at learning rate $10^{-4}$ throughout, with mini-batches of 2048 grid points sampled uniformly with replacement from the fixed $129 \times 129$ cavity grid. The transfer protocol matches the analytic tests only in broad structure: pretrain at source $Re_0$ for 50,000 gradient steps, then fine-tune at target $Re_1$ for 10,000 steps; scratch training also uses 10,000 steps. We test three directions $(Re_0, Re_1) \in \{(100, 400),\, (400, 1000),\, (100, 1000)\}$; the remaining Reynolds number $Re_\perp$ serves as an alternate same-family source condition (pretrained at $Re_\perp$, fine-tuned at $Re_1$ with the same seed), not an independent random baseline. As a stronger null on the same target, we additionally copy the designated-source checkpoint, randomly shuffle all trainable weights while leaving fixed buffers such as Fourier $B$ unchanged, and then fine-tune this shuffled source to $Re_1$ with the same target budget. All experiments use $n = 10$ seeds with bootstrap 95% confidence intervals and paired *two-sided* Wilcoxon signed-rank tests for designated-versus-alternate comparisons, plus one-sided paired Wilcoxon tests for designated-source transfer exceeding shuffled-source transfer. The NS experiment records per-seed transfer advantage in decibels, $A_{\mathrm{dB}} = 10 \log_{10}(\mathrm{MSE}_{\mathrm{scratch}} / \mathrm{MSE}_{\mathrm{transfer}})$. For comparability with the analytic tests, Table [4](https://arxiv.org/html/2606.06827#S5.T4) reports the corresponding linear-ratio equivalents $10^{\bar{A}_{\mathrm{dB}}/10}$ obtained from the mean dB values, while the Wilcoxon tests are applied to the underlying per-seed dB values. Because the alternate-source condition is neither an independent random null nor a distance-matched control, the NS section should be interpreted as a source-choice comparison rather than as a structured-versus-random specificity test.
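The dB metric and its linear-ratio equivalent can be written out in a few lines. This is a minimal sketch with hypothetical function names, and the per-seed values are illustrative, not taken from the experiments.

```python
import math
from statistics import mean

def advantage_db(mse_scratch, mse_transfer):
    """Per-seed transfer advantage in decibels."""
    return 10.0 * math.log10(mse_scratch / mse_transfer)

def linear_equivalent_of_mean_db(db_values):
    """Convert the mean dB advantage back to a linear ratio."""
    return 10.0 ** (mean(db_values) / 10.0)

# Illustrative per-seed ratios of 1x and 100x, i.e. 0 dB and 20 dB.
db_values = [advantage_db(1.0, 1.0), advantage_db(1.0, 0.01)]
from_mean_db = linear_equivalent_of_mean_db(db_values)  # geometric mean of the ratios
arithmetic_mean = mean([1.0, 100.0])
```

Because $10^{\bar{A}_{\mathrm{dB}}/10}$ is the geometric mean of the per-seed ratios, it never exceeds their arithmetic mean, so the converted entries are a more outlier-resistant summary than a mean of linear ratios would be.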
### 5.2 Navier–Stokes Results
Table [4](https://arxiv.org/html/2606.06827#S5.T4) reports converted linear-ratio equivalents for the designated and alternate same-family sources, while Figure [6](https://arxiv.org/html/2606.06827#S5.F6) shows the underlying dB values. Four patterns emerge.
**SIREN transfers substantially but shows little separation from the alternate-source condition.** Designated-source transfer ranges from $3.2\times$ to $4.0\times$ across all three directions, confirming that SIREN weights are reused across the Reynolds-number family. However, alternate-source transfer is similarly large ($3.7$–$3.9\times$), and none of the paired Wilcoxon tests reaches the $0.05$ threshold (all $p \geq 0.13$). This is consistent with the SIREN pattern from the analytic tests: weight reuse is real but not sharply selective.
**ReLU MLP is the most source-conditioned.** For $100 \to 400$, designated-source transfer ($9.8\times$, 95% CI $[3.6, 19.6]$) exceeds alternate-source transfer ($3.3\times$, CI $[1.7, 6.7]$), with $p = 0.020$. For $400 \to 1000$ the separation is similar ($11.4\times$ vs $5.5\times$) and borderline significant ($p = 0.064$). For $100 \to 1000$, both designated-source ($14.7\times$) and alternate-source ($11.2\times$) transfer are large and comparable ($p = 0.695$), suggesting that across the widest Reynolds gap the pretrained representation is useful regardless of which same-family source was used. Overall, ReLU provides the clearest separation between the designated and alternate source conditions on this benchmark.
**Fourier Features transfer weakly.** Designated-source transfer is modest ($1.7$–$2.8\times$) and never reliably exceeds alternate-source transfer (all $p \geq 0.32$). The CI for $400 \to 1000$ includes $1\times$, indicating that fine-tuning from a Fourier pretrain provides limited or unreliable benefit over training from scratch.
**The shuffled-weight source null confirms that the learned source weights usually matter.** Table [5](https://arxiv.org/html/2606.06827#S5.T5) compares designated-source transfer against a shuffled copy of the same source checkpoint. SIREN and ReLU designated-source transfer exceeds the shuffled source by $10.3$–$13.1$ dB across all Reynolds-number directions (all one-sided paired $p \leq 0.002$). Fourier is mixed: $100 \to 400$ still exceeds the shuffled source by $4.58$ dB ($p = 0.0068$), but the $400 \to 1000$ and $100 \to 1000$ differences are smaller and not significant at $0.05$. Thus the alternate-source control understated one useful fact for SIREN: even when transfer is not selective among Reynolds-number sources, the learned source weights are still far better than a shuffled-weight null.
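The shuffled-source construction from Section 5.1 can be sketched on plain NumPy arrays. This is an illustration under assumptions: the text does not say whether weights are permuted within each tensor or across the whole model, so the sketch permutes within each tensor, and the dictionary layout and names (`params`, `buffers`, `shuffled_source`) are hypothetical.

```python
import numpy as np

def shuffled_source(params, buffers, seed=0):
    """Return a shuffled copy of trainable arrays; fixed buffers are copied unchanged."""
    rng = np.random.default_rng(seed)
    shuffled = {}
    for name, w in params.items():
        # Assumed scheme: permute entries within each tensor, preserving its
        # shape and its empirical value distribution.
        flat = rng.permutation(w.ravel())
        shuffled[name] = flat.reshape(w.shape)
    # Fixed buffers, such as a Fourier projection matrix B, are left as they were.
    kept = {name: b.copy() for name, b in buffers.items()}
    return shuffled, kept
```

A within-tensor permutation keeps each layer's weight scale intact while destroying the learned arrangement, which is one way to separate learned structure from generic initialization statistics.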
Table 4: 10-seed Navier–Stokes benchmark (lid-driven cavity, $Re \in \{100, 400, 1000\}$). The underlying experiment records per-seed transfer advantage in dB, $A_{\mathrm{dB}} = 10 \log_{10}(\mathrm{MSE}_{\mathrm{scratch}} / \mathrm{MSE}_{\mathrm{transfer}})$. For comparability with the analytic tests, each table entry reports the linear-ratio equivalent of the mean dB value, $10^{\bar{A}_{\mathrm{dB}}/10}$, rather than the mean of per-seed linear ratios. The alternate-source condition pretrains on $Re_\perp$, the third Reynolds number in $\{100, 400, 1000\}$; it is not an independent random baseline. The Wilcoxon $p$-value is the paired two-sided test on the per-seed dB values comparing designated-source and alternate-source transfer. ${}^{*}p < 0.05$.
Table 5: NS shuffled-weight source control in the underlying dB metric. The shuffled source is made by copying the designated-source checkpoint, randomly permuting trainable weights, leaving fixed buffers such as Fourier $B$ unchanged, and fine-tuning to the same target with the same budget. The $p$-value is the one-sided paired Wilcoxon test for designated-source transfer exceeding shuffled-source transfer.
Figure 6: NS benchmark: designated-source vs. alternate-source transfer advantage across architectures and Reynolds-number directions over 10 seeds, shown in the underlying dB metric used by the experiment. Error bars are bootstrap 95% confidence intervals for the mean. ReLU shows the clearest separation between the two source conditions; SIREN transfers substantially but similarly from both sources; Fourier Features transfer weakly in both conditions.
### 5.3 Relation to the Analytic Tests
The NS benchmark is broadly consistent with the two-parameter 1D analytic family rather than a literal replication of it, but its control is different. It compares transfer from a designated source Reynolds number against transfer from the third same-family source rather than against an independent random null. Interpreted that way, ReLU again shows the clearest separation between source conditions, SIREN remains broadly reusable, and Fourier remains weak in absolute transfer. The shuffled-weight null adds a second interpretation: SIREN is broad across Reynolds-number sources, but not merely generic initialization reuse, because its designated-source transfer exceeds shuffled-source transfer by more than $10$ dB in all three directions. ReLU is both source-conditioned and strongly above the shuffled null. Fourier remains the weakest and most variable case. The weak NS Fourier result at $\sigma = 10$ is at least qualitatively compatible with the analytic-test sigma sweep, which shows that the default analytic setting $\sigma = 10$ is poor while smaller values can perform much better. That said, the sigma ablation was run only on the analytic family, so it does not by itself determine the optimal NS bandwidth.
### 5.4 1D PDE Reference-Solution Suite
Table [6](https://arxiv.org/html/2606.06827#S5.T6) summarizes the full-budget heat, Burgers, and NLS experiments. Each entry aggregates over two target directions and ten seeds ($n = 20$ terminal transfer ratios). The control condition is the alternate same-family source, not an independent random target, so these values should be read as source-conditioned transfer comparisons rather than structured-versus-random specificity tests.
Table 6: Full-budget 1D PDE reference-solution benchmarks over heat, viscous Burgers, and focusing cubic NLS. Values are macro means over two target directions and ten seeds, with bootstrap 95% confidence intervals. Transfer advantage is the full-grid terminal ratio $\mathrm{MSE}_{\mathrm{scratch}} / \mathrm{MSE}_{\mathrm{transfer}}$. The alternate-source condition pretrains on the opposite target parameter within the same PDE family; it is not an independent random null. The $p$-value is the one-sided paired Wilcoxon signed-rank test comparing designated-source transfer to alternate-source transfer over the 20 paired direction-seed values.
The PDE benchmarks add two useful checks to the analytic-test and NS story. First, the architecture ranking is not universal across equations: SIREN has the largest heat transfer magnitude, Fourier Features dominate Burgers, and SIREN and Fourier are comparable in NLS magnitude. Second, magnitude and specificity remain separable. In NLS, SIREN reaches $20.05\times$ designated-source transfer, but alternate-source transfer is also $17.79\times$, giving only a $1.13\times$ specificity ratio. ReLU has lower NLS magnitude ($4.73\times$) but a cleaner $3.51\times$ ratio, while Fourier combines high NLS magnitude ($19.50\times$) with stronger separation from the alternate source ($3.35\times$). Thus the complex-valued dispersive PDE does not collapse the architecture dependence; it makes the distinction between weight reuse and source specificity more visible.
The auxiliary controls clarify what the terminal ratios do and do not mean. When scratch training uses the same low learning rate as fine-tuning, absolute transfer ratios become much larger: for example, matched-optimizer designated-source ratios reach $250.9\times$ for heat/SIREN, $353.3\times$ for Burgers/ReLU, and $1974.0\times$ for NLS/Fourier. This confirms that terminal transfer magnitude is strongly schedule-sensitive. However, the designated-versus-alternate specificity ratios are unchanged by this common scratch numerator, so the source-specific conclusions in Table [6](https://arxiv.org/html/2606.06827#S5.T6) do not rely on the faster scratch learning rate. We therefore keep the fixed-budget terminal metric in the main tables because it answers the operational question “which initialization gives lower loss after the same target budget?”, while the matched-optimizer results are interpreted as causal sensitivity checks rather than as replacements for the headline warm-start protocol. The Fourier feature-map ablation is more diagnostic: copying the pretrained MLP weights but resampling $B$ gives only $0.010$–$0.011\times$ transfer under the default scratch baseline, and reusing only $B$ with random MLP weights gives only $0.011$–$0.023\times$. Under the matched-optimizer scratch baseline these variants rise to roughly scratch-level performance ($0.62$–$1.36\times$), but remain far below the full Fourier transfer. Thus Fourier transfer in these PDE runs is not explained by the fixed random projection alone; it depends on the coupled reuse of the projection and the trained weights.
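The claim that the specificity ratio does not depend on the scratch baseline follows from a one-line cancellation. In the sketch below, the subscripts $\mathrm{des}$ and $\mathrm{alt}$ (designated and alternate source) are labels introduced here for illustration, and the identity is written for a single paired run that shares one scratch MSE.

```latex
\frac{A_{\mathrm{des}}}{A_{\mathrm{alt}}}
  = \frac{\mathrm{MSE}_{\mathrm{scratch}} / \mathrm{MSE}_{\mathrm{des}}}
         {\mathrm{MSE}_{\mathrm{scratch}} / \mathrm{MSE}_{\mathrm{alt}}}
  = \frac{\mathrm{MSE}_{\mathrm{alt}}}{\mathrm{MSE}_{\mathrm{des}}}
```

The scratch term cancels exactly per run, so changing the scratch optimizer rescales both advantages by the same factor. As a consistency check on the NLS numbers above, $20.05 / 17.79 \approx 1.13$ for SIREN. Whether the cancellation carries over exactly to ratios of macro means depends on how the aggregation is done, which is not specified here.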
## 6 Discussion
### 6.1 Architecture Controls Magnitude and Specificity
The controlled 1D geometric sweep shows that two different quantities must be distinguished: *how much* transfer occurs, and *what that transfer means*. Fourier Features have the largest mean structured transfer in the current 10-seed 1D sweep, which is consistent with the hypothesis that explicit frequency encoding is well matched to the spectral structure of $g_t(x) = \sqrt{x^{2} + t^{2}}$. But specificity is a separate issue. SIREN still shows substantial transfer on independent-seed random targets, ReLU remains close to zero on the random control, and Fourier retains nonzero random transfer that is nonetheless much smaller than its structured transfer.
The two-parameter sweep sharpens this point by showing that there is no single universal architecture ranking. At the default settings, ReLU, not Fourier, shows the clearest structured-vs-random separation. Fourier still separates structured targets from random controls, but mainly because both transfers are small in absolute terms. Default SIREN remains reusable, but that reuse is only weakly specific. The fairness sweeps show that this is not a rigid family-level property: smaller SIREN initialization scale and, more moderately, larger $\omega_0$ can restore clear specificity, whereas ReLU remains the most robust family across the tested width, depth, and fine-tuning-learning-rate sweeps. Architecture therefore affects both weight reuse and the degree to which that reuse reflects genuine target-family continuity, and these effects depend on both the dimensionality of the parameter family and the operating point inside each architecture family.
The Navier–Stokes benchmark shows that this two-parameter architecture picture is not obviously an artifact of the controlled cosine family. On a physically realistic steady-flow system with a one-dimensional parameter ($Re$), the qualitative pattern is similar under the alternate-source control: ReLU is the clearest discriminator between source conditions, SIREN is broadly reusable from either source, and Fourier Features provide weak transfer in both conditions. The shuffled-weight null strengthens this interpretation by showing that SIREN and ReLU transfer are not reproduced by destroying the learned source weights; the Fourier result remains more ambiguous in two of the three Reynolds-number directions. The NS evidence is still limited to one steady laminar geometry, but it is no longer based only on the alternate-source comparison.
### 6.2 Proper Null Models Matter
Independent\-seed random controls are essential for assessing transfer specificity\. In the 10\-seed controlled 1D geometric test, the effect of this null model is not universal across architectures\. For ReLU, independent\-seed random controls drive transfer close to zero\. For Fourier, random\-control transfer remains far below structured\-target transfer but is no longer negligible in mean\. For SIREN, substantial random transfer survives\. The correct conclusion is therefore not that “random transfer vanishes,” but that null\-model separation itself is architecture\-dependent\.
### 6.3 What the Two-Parameter Result Actually Says
The two\-parameter experiment supports a more structured statement than either a universal success or a universal failure story\. At the default settings, ReLU shows clear specificity under the independent\-seed null model, default SIREN shows substantial but weakly specific transfer, and Fourier shows poor absolute transfer despite statistically significant structured\-vs\-random separation\. The fairness sweeps narrow the claim further: the main table should be interpreted as a default\-configuration comparison, not as a statement that every SIREN setting is weakly specific or that every large terminal ratio inside the ReLU family represents a robust improvement\. Smaller SIREN initialization scale can restore strong specificity, while some apparently extreme ReLU wins are largely terminal\-ratio outliers and look much more modest under the curve\-aware summary\. The two\-parameter result is therefore not “transfer collapses in higher\-dimensional parameter space,” nor “all architectures exploit parameter continuity equally well\.” It is that the nature of transfer becomes strongly architecture\-dependent and qualitatively different from the 1D case\.
A coarse 10-seed Fourier-feature-scale ablation with independent random controls shows that the weak two-parameter Fourier result is strongly hyperparameter-dependent. With the default scale $\sigma = 10$, the macro-mean structured transfer is only $0.59\times$ and the macro-mean random transfer is $0.08\times$. Lowering the scale to $\sigma = 1$ raises the macro-mean structured transfer to $50.0\times$ while keeping macro-mean random transfer at $6.3\times$; $\sigma = 2$ gives $39.2\times$ and $3.1\times$. By contrast, $\sigma = 0.5$ produces even larger structured transfer ($63.0\times$) but also very large random-control transfer ($49.0\times$), so specificity deteriorates sharply. The paired macro structured-target comparisons versus $\sigma = 10$ are significant for $\sigma \in \{0.5, 1, 2\}$ (all $p = 0.00195$). The revised ablation therefore suggests a genuine magnitude-specificity tradeoff inside the analytic Fourier family: the default $\sigma = 10$ is a poor operating point for this controlled analytic problem, but smaller $\sigma$ is not uniformly better. This analytic-only ablation does not by itself determine the optimal NS bandwidth, because the NS benchmark differs in input dimensionality, feature count, width, and control condition.
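To make the role of the bandwidth $\sigma$ concrete, the following sketch builds a random Fourier feature map in which $\sigma$ scales a fixed Gaussian projection matrix $B$. It follows the common random-Fourier-feature construction. The Gaussian sampling, the $2\pi$ factor, and the function names are assumptions, since the exact feature map used in the experiments is not spelled out in this section.

```python
import numpy as np

def make_projection(in_dim, n_freq, sigma, seed=0):
    """Fixed (non-trainable) projection B with entries drawn from N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    return sigma * rng.standard_normal((n_freq, in_dim))

def fourier_features(x, B):
    """Map inputs of shape (n, in_dim) to [cos(2*pi*x B^T), sin(2*pi*x B^T)]."""
    proj = 2.0 * np.pi * x @ B.T
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)

# Illustrative use: 128 frequencies on a 1D input grid, default-like sigma.
x = np.linspace(0.0, 1.0, 16)[:, None]
B = make_projection(in_dim=1, n_freq=128, sigma=10.0)
feats = fourier_features(x, B)  # shape (16, 256)
```

Larger $\sigma$ pushes the sampled frequencies higher, so the same MLP sees a more oscillatory encoding of nearby inputs. This is one plausible reading of why the operating point matters so much in the ablation above.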
### 6.4 Efficacy, Cost, and Practical Guidance
The transfer ratios reported here measure fixed-budget fine-tuning benefit, not total wall-clock superiority over training from scratch. For the analytic and 1D PDE studies, a source model is pretrained for 1500 steps and then fine-tuned for 300 steps, while scratch training for a target also uses 300 steps. Thus, if a source model is used for only one target, the pretraining cost is not amortized. The relevant use case is a parametric sweep in which a trained source representation can seed multiple nearby targets, or a production workflow in which pretrained INR surrogates are updated repeatedly as parameters change. In the completed heat/Burgers/NLS run, the full 10-seed, three-architecture PDE benchmark suite took about $5.0 \times 10^{3}$ seconds end-to-end, with cached reference solutions reused across reruns; the additional matched-optimizer, Fourier-ablation, and saved-curve PDE controls increase output size and target-phase training work but reuse the same cached reference solutions. The NS benchmark is more expensive by design, using 50,000 pretraining steps and 10,000 fine-tuning or scratch steps per Reynolds-number direction.
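The amortization point can be made explicit with a short step count. The sketch assumes that one pretrained source seeds $k$ targets and that cost is measured in gradient steps only; $k$, $C_{\mathrm{transfer}}$, and $C_{\mathrm{scratch}}$ are symbols introduced here for illustration.

```latex
C_{\mathrm{transfer}}(k) = 1500 + 300\,k, \qquad
C_{\mathrm{scratch}}(k) = 300\,k, \qquad
\frac{C_{\mathrm{transfer}}(k)}{C_{\mathrm{scratch}}(k)} = 1 + \frac{5}{k}
```

For a single target ($k = 1$) the warm-start route costs six times the steps of scratch training, while for $k = 10$ the overhead falls to $1.5\times$. Under this accounting the warm start is never cheaper in total steps. Its benefit is the lower loss reached within the same 300-step target budget.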
The practical architecture guidance is correspondingly conditional. If the workflow values *selective* reuse across related parameters, ReLU MLPs are the most robust default in our experiments: they give the clearest random-control separation in the controlled families and the clearest designated-source advantage in NS. If the workflow values broad warm-start reuse and can tolerate weaker source specificity, SIREN can be useful, but its strong transfer on random or alternate sources means that transfer magnitude alone should not be interpreted as evidence of physical parameter continuity. Fourier features require explicit bandwidth tuning: they are strong on the 1D geometric test and Burgers, weak at the default scale on the two-parameter analytic family and NS, and substantially improved by smaller frequency scales in the analytic sigma sweep. For computational-physics use, the main recommendation is therefore to report source-specific controls, seed variability, and architecture hyperparameters together with any INR transfer gains.
### 6\.5Revised Theoretical Picture
Our results suggest the following revised picture:
1. Transfer performance depends strongly on architecture, and transfer specificity depends on architecture even more strongly\.
2. The two\-parameter architecture ranking differs qualitatively from the 1D ranking at the default settings: ReLU is the clearest discriminator, default SIREN is weakly specific, and Fourier has poor absolute transfer\. Fairness sweeps weaken the strongest family\-level version of this claim by showing that tuned SIREN can also become clearly specific\.
3. A broadly similar architecture pattern appears on a physically realistic PDE benchmark\. On the Navier–Stokes lid\-driven cavity family \($Re \in \{100, 400, 1000\}$\) under an alternate same\-family source control, ReLU again shows the clearest source\-conditioned advantage, SIREN again transfers broadly, and Fourier again transfers weakly\. A shuffled\-weight null shows that SIREN/ReLU transfer is nevertheless strongly above destroyed\-weight reuse\.
4. The heat, Burgers, and NLS reference\-solution benchmarks broaden the evidence but do not produce a single architecture winner: transfer magnitude and alternate\-source specificity vary by equation, with the complex\-valued NLS case showing especially clear separation between broad SIREN reuse and more selective ReLU/Fourier transfer\.
5. Matched\-optimizer and Fourier feature\-map controls show that terminal transfer magnitudes are optimizer\-schedule sensitive, while Fourier transfer depends on coupled reuse of the fixed projection matrix and trained MLP weights rather than on either component alone\.
6. Static diagnostics remain weakly informative: PR and Hessian sharpness do not separate the targets, and independent\-seed CKA is not specificity\-selective\.
7. Simple theoretical models do not capture the observed transfer dynamics: the slope\-$1$ scaling ansatz is rejected for all three implemented 1D architectures\.
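One way to check a scaling ansatz of this kind is a log-log regression. If the transfer gain followed $A_{\text{transfer}} \propto 1/\Delta t^{2}$, then regressing $\log A_{\text{transfer}}$ on $\log(1/\Delta t^{2})$ would give a slope close to 1. The sketch below is a minimal version of such a test under that assumed reading. The helper names, the two-standard-error rejection rule, and the synthetic data are hypothetical and do not reproduce the audit reported here.

```python
import numpy as np

def loglog_slope(delta_t, gain):
    """Least-squares slope of log(gain) against log(1 / delta_t**2).

    Returns the slope and its standard error (needs at least three points).
    """
    x = np.log(1.0 / np.asarray(delta_t, dtype=float) ** 2)
    y = np.log(np.asarray(gain, dtype=float))
    xc = x - x.mean()
    slope = float(xc @ (y - y.mean()) / (xc @ xc))
    intercept = y.mean() - slope * x.mean()
    resid = y - (intercept + slope * x)
    dof = len(x) - 2
    std_err = float(np.sqrt(resid @ resid / dof / (xc @ xc)))
    return slope, std_err

def rejects_unit_slope(delta_t, gain, n_se=2.0):
    """Illustrative rule: reject the ansatz if the slope is more than n_se
    standard errors away from 1."""
    slope, std_err = loglog_slope(delta_t, gain)
    return abs(slope - 1.0) > n_se * std_err

# Synthetic gains that decay more slowly than 1/dt^2, for illustration only.
rng = np.random.default_rng(1)
delta_t = np.array([0.05, 0.1, 0.2, 0.4, 0.8])
gain = 5.0 * delta_t ** -0.6 * np.exp(0.05 * rng.standard_normal(delta_t.size))

slope, std_err = loglog_slope(delta_t, gain)
print(f"slope={slope:.3f} +/- {std_err:.3f}, rejected={rejects_unit_slope(delta_t, gain)}")
```

A per-architecture version would run the same fit once per model family and per seed, then compare the spread of fitted slopes with the predicted value.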
## 7 Conclusion
This paper studies transfer in implicit neural representations as a property of the representation itself, not only as a convenience for faster fine\-tuning\. Across controlled analytic families, Navier–Stokes, and 1D PDE reference\-solution benchmarks, the main result is consistent: transfer magnitude and transfer specificity are separate quantities, and both depend strongly on architecture\. Fourier Features are strongest on the controlled 1D geometric test, ReLU is the clearest discriminator on the controlled two\-parameter family and Navier–Stokes benchmark, and the heat/Burgers/NLS suite shows that no single architecture dominates every equation in absolute transfer\.
The broader lesson is methodological\. Transfer gains should be interpreted together with explicit control conditions, because large warm\-start improvements can coexist with weak source specificity\. In our experiments, SIREN often reuses weights broadly, ReLU is usually more selective, and Fourier Features can swing from strong to weak depending on bandwidth\. Static diagnostics such as participation ratio, Hessian sharpness, and independent\-seed CKA do not provide a reliable shortcut, and the simple scaling law $A_{\text{transfer}} \propto 1/\Delta t^{2}$ does not describe the observed dynamics\. For readers interested in coordinate networks and scientific machine learning, the practical message is that architecture choice should be evaluated as a transfer\-learning design decision, not only as a single\-task approximation choice\.
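For readers unfamiliar with two of the static diagnostics named above, the sketch below computes a participation ratio and a linear CKA score from activation matrices. It uses the common textbook definitions: the participation ratio of the covariance eigenvalues, and the Frobenius-norm form of linear CKA. These may differ in detail from the definitions used in this study, and the activations here are random placeholders.

```python
import numpy as np

def participation_ratio(acts):
    """Effective dimensionality (sum lam)^2 / sum(lam^2) of the activation covariance.

    acts has shape (n_samples, n_features).
    """
    acts = np.asarray(acts, dtype=float)
    centered = acts - acts.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (acts.shape[0] - 1)
    # Clip tiny negative eigenvalues caused by floating-point error.
    eig = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    return float(eig.sum() ** 2 / (eig ** 2).sum())

def linear_cka(x, y):
    """Linear CKA between activation matrices with the same number of rows."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, "fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, "fro")
    norm_y = np.linalg.norm(y.T @ y, "fro")
    return float(cross / (norm_x * norm_y))

# Placeholder activations from two hypothetical independent-seed networks.
rng = np.random.default_rng(2)
acts_a = rng.standard_normal((256, 32))
acts_b = rng.standard_normal((256, 32))
print("PR(a)      =", round(participation_ratio(acts_a), 2))
print("CKA(a, a)  =", round(linear_cka(acts_a, acts_a), 3))
print("CKA(a, b)  =", round(linear_cka(acts_a, acts_b), 3))
```

Both quantities are computed from a single trained network, or a pair of networks, on a fixed set of inputs. They involve no fine-tuning run, which is why they are attractive as shortcuts and why their weak discriminative power in these experiments matters.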
#### Limitations\.
The study is still bounded by a fixed optimization protocol, 10\-seed experiments in the main suites, and relatively small hyperparameter sweeps\. The analytic tests use independent random controls, but the PDE benchmarks rely on alternate\-source or shuffled\-weight controls rather than fully off\-family nulls\. We also do not yet mirror the Fourier feature\-map reuse ablation from the PDE suite in the analytic headline experiments, so the analytic Fourier results still combine MLP\-weight reuse with reuse of the fixed random feature map\. The Navier–Stokes study covers one steady laminar geometry and also uses larger models and a different schedule than the analytic suites, so it should be read as a qualitative replication rather than a fully harmonized cross\-setting comparison\. The 1D PDE suite uses supervised regression on fixed periodic grids with only two target parameters per equation\. Absolute transfer magnitudes are also sensitive to learning\-rate choices and, for Fourier Features, to bandwidth and feature\-map reuse\. These limitations do not overturn the central architecture\-dependent patterns, but they do bound the claims to the tested architectures, controls, and transfer regimes\.
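The feature-map reuse ablation mentioned above separates two things a Fourier-feature warm start can carry over: the fixed random projection matrix and the trained MLP weights. A minimal sketch of how the four initialisation conditions could be constructed is given below. The model layout, the names `init_model` and `build_init`, the layer sizes, and the use of a second random initialisation as a stand-in for a trained source model are all assumptions of this example.

```python
import numpy as np

def fourier_features(x, B):
    """Map coordinates x of shape (n, d_in) through a fixed projection B of shape (d_in, m)."""
    proj = 2.0 * np.pi * x @ B
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)

def init_model(rng, d_in, m, hidden, sigma):
    """Hypothetical one-hidden-layer Fourier-feature MLP; sigma sets the bandwidth of B."""
    return {
        "B": sigma * rng.standard_normal((d_in, m)),  # fixed, never trained
        "W1": rng.standard_normal((2 * m, hidden)) / np.sqrt(2 * m),
        "W2": rng.standard_normal((hidden, 1)) / np.sqrt(hidden),
    }

def forward(model, x):
    hidden = np.maximum(fourier_features(x, model["B"]) @ model["W1"], 0.0)
    return hidden @ model["W2"]

# (reuse projection matrix, reuse MLP weights) for each ablation arm.
CONDITIONS = {
    "full_reuse": (True, True),
    "map_only": (True, False),
    "mlp_only": (False, True),
    "scratch": (False, False),
}

def build_init(source, fresh, condition):
    """Assemble the target-task initialisation for one ablation arm."""
    reuse_map, reuse_mlp = CONDITIONS[condition]
    model = {"B": (source if reuse_map else fresh)["B"].copy()}
    for name in ("W1", "W2"):
        model[name] = (source if reuse_mlp else fresh)[name].copy()
    return model

rng = np.random.default_rng(3)
source = init_model(rng, d_in=1, m=16, hidden=32, sigma=1.0)  # stand-in for a trained source
fresh = init_model(rng, d_in=1, m=16, hidden=32, sigma=1.0)   # new random draw
x = np.linspace(0.0, 1.0, 8).reshape(-1, 1)
for condition in CONDITIONS:
    print(condition, forward(build_init(source, fresh, condition), x).shape)
```

Each arm would then be fine-tuned on the target task under the same schedule, so that differences in final loss can be attributed to which component was carried over.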
## Declaration of competing interest
The author declares no competing financial or non\-financial interests in relation to the work described\.
## Declaration of generative AI and AI\-assisted technologies in the manuscript preparation process
During the preparation of this work, the author used ChatGPT and Claude in order to assist with code drafting and debugging for experimental workflows, and with language polishing during manuscript preparation\. After using these services, the author reviewed and edited the content as needed and takes full responsibility for the content of the published article\.