A fully GPU-based workflow for building physics emulators of hypersonic flows

arXiv cs.LG 06/15/26, 04:00 AM Papers
Summary
This paper introduces a fully GPU-based workflow that accelerates data generation and training of neural emulators for hypersonic flows, using a differentiable solver (JAX-Fluids) and residual-based refinement to improve physical consistency and reliability beyond training distribution.
arXiv:2606.13742v1 Announce Type: new
# A fully GPU-based workflow for building physics emulators of hypersonic flows
Source: [https://arxiv.org/html/2606.13742](https://arxiv.org/html/2606.13742)
Fabian Paischer (2,3), Dylan Rubini (3), Deniz A. Bezgin (1), Aaron B. Buhendwa (1), David Hauser (3), Florian Sestak (2,3), Johannes Brandstetter (2,3), Sebastian Kaltenbach (3), Nikolaus A. Adams (1)

(1) Chair of Aerodynamics and Fluid Mechanics, TU Munich, Germany; (2) ELLIS Unit, Institute for Machine Learning, JKU Linz; (3) EMMI AI, Linz

###### Abstract

The ability to resolve complex physical phenomena with high fidelity and at low computational cost is central to addressing key challenges in modern engineering\. A prime example lies in hypersonic flows, where the precise prediction of the full flowfield topology, in particular with respect to shock wave location and intensity, is critical\. Yet supersonic and hypersonic flows continue to be a stumbling block for traditional reduced\-order models and neural emulators that struggle to capture steep gradients in flow states with physical consistency in applications of industrial relevance\. To that end, we introduce a fully GPU based workflow that integrates accelerated data generation with the training of neural emulators augmented by uncertainty quantification and physics\-aware refinement\. Our workflow is enabled by a differentiable high\-fidelity solver \(JAX\-Fluids\) which we employ for rapid dataset creation and residual\-based improvement of the neural emulator to enhance physical consistency\. Building on this framework, we first present a suite of model architectures and analyze their scaling behavior to expose their strengths and shortcomings\. We then show that residual\-based refinement enables training on cases where only mesh and input parameters are available, substantially reducing residuals and improving physical consistency\. Together, differentiable simulation and residual\-based refinement yield physics emulators that remain reliable beyond their training distribution, a key requirement for deploying surrogates in real\-world engineering design loops\.

## 1 Introduction

Fluid dynamics is fundamental to many processes in nature and technology, and its numerical simulation routinely ranks among the top-level compute-resource allocations at Tier 0 computing facilities Slotnick et al. ([2014](https://arxiv.org/html/2606.13742#bib.bib47)). Despite their complex and inherently multi-scale character, flow fields exhibit coherent behavior characterized by universal dependencies. A long-standing challenge for predictive simulation, particularly in high-speed transport and propulsion, is the simultaneous presence of multi-scale flow structures such as coherent vortices, eddies, and shocks. Shockwaves are among the most consequential of these phenomena, governing processes from the evolution of galaxies McKee and Hollenbach ([1980](https://arxiv.org/html/2606.13742#bib.bib3)) to the feasibility of high-speed propulsion Urzay ([2018](https://arxiv.org/html/2606.13742#bib.bib2)). They are characterized by discontinuities in the macroscopic flow state, such as extreme gradients in pressure, density, temperature, and momentum, which arise when the local flow speed $\|\mathbf{u}(\mathbf{x},t)\|$ exceeds the local speed of sound $a(\mathbf{x},t)$, i.e., when the local Mach number $\operatorname{Ma}(\mathbf{x},t)=\|\mathbf{u}\|/a\geq 1$.
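To make the Mach-number criterion concrete, the following minimal JAX sketch evaluates the local Mach number cell by cell. It assumes an ideal gas with ratio of specific heats 1.4, so that the speed of sound is $a=\sqrt{\gamma p/\rho}$; this equation of state, the function name `local_mach`, and the sample values are illustrative assumptions and are not taken from the paper.

```python
import jax.numpy as jnp

GAMMA = 1.4  # assumed ideal-gas ratio of specific heats (not from the paper)

def local_mach(rho, p, u):
    """Local Mach number Ma = ||u|| / a with a = sqrt(gamma * p / rho)."""
    speed = jnp.linalg.norm(u, axis=-1)   # flow speed ||u|| per cell
    sound = jnp.sqrt(GAMMA * p / rho)     # ideal-gas speed of sound per cell
    return speed / sound

# Two made-up cells: one slow, one fast.
rho = jnp.array([1.0, 1.0])
p = jnp.array([1.0, 1.0])
u = jnp.array([[0.5, 0.0], [6.0, 4.0]])

mach = local_mach(rho, p, u)
supersonic = mach >= 1.0   # cells that satisfy Ma >= 1
```

Only the second cell is flagged as supersonic; with these values its Mach number lies above five, i.e., in the range the text refers to as hypersonic.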

Hypersonic flowfield predictions are particularly challenging for both numerical simulation and data-driven modeling. Such flows are characterized by Mach numbers beyond approximately five and exhibit strong shock interactions, high-enthalpy effects, and stringent conservation requirements. They have historically served as driving applications for high-performance computing, motivating both methodological advances and community benchmarks Wilfong et al. ([2025](https://arxiv.org/html/2606.13742#bib.bib6)); Rossinelli et al. ([2013](https://arxiv.org/html/2606.13742#bib.bib7)). The computational cost of numerical simulations has made data-driven alternatives to classical computational fluid dynamics (CFD) a foremost research interest Brunton et al. ([2020](https://arxiv.org/html/2606.13742#bib.bib8)); Brenner et al. ([2019](https://arxiv.org/html/2606.13742#bib.bib32)); Karniadakis et al. ([2021](https://arxiv.org/html/2606.13742#bib.bib43)). These range from full surrogate substitution of the numerical solver via Physics Emulators (PEs) to machine-learned (ML) acceleration of existing CFD methodologies.

In this work, we present a fully GPU-based workflow for hypersonic flows enabled by the differentiable finite-volume solver JAX-Fluids (Bezgin et al., [2023](https://arxiv.org/html/2606.13742#bib.bib12), [2025a](https://arxiv.org/html/2606.13742#bib.bib11)). The proposed workflow encompasses (i) GPU-accelerated data generation, (ii) pre-training of neural emulators, and (iii) target-free, residual-based refinement. Data generation is based on a Cartesian multi-block mesh that enables efficient parallelization on GPU and seamless integration with various ML architectures and training paradigms. For pre-training, we investigate two complementary architectures, namely the irregular-grid-based AB-UPT (Alkin et al., [2025](https://arxiv.org/html/2606.13742#bib.bib14)) and the regular-grid-based vision transformer (ViT, Dosovitskiy et al., [2021](https://arxiv.org/html/2606.13742#bib.bib15)). Furthermore, we investigate the trade-off between deterministic and probabilistic training paradigms. We conduct scaling studies with respect to both model size and dataset size for all architectures and training paradigms. For target-free residual-based refinement, the pre-trained PE generates a candidate solution, which is evaluated against the residual of the underlying partial differential equation (PDE) computed by the differentiable solver. The resulting gradient signal is backpropagated into the emulator weights without requiring any target flow fields, mirroring exactly the numerical discretization used during data generation.

Our experiments reveal several key findings for architecture and training paradigm selection and refinement for PEs in the hypersonic regime. Of the two architectures, AB-UPT achieves the highest accuracy in data-abundant settings, while the ViT performs better in data-scarce regimes due to the strong inductive bias provided by its regular-grid structure. Flow matching trades point-wise accuracy for generative modeling capability, but provides off-the-shelf uncertainty and acts as implicit data augmentation, yielding a smaller gap between in-distribution and out-of-distribution performance than either deterministic architecture. For physics-aware refinement, we find that backpropagation through the PDE residual leads to substantial reductions in conservation residuals with little to no change in field-level accuracy, suggesting that the pre-trained models already capture the dominant flow structure and the refinement primarily corrects local physical consistency. Notably, the target-free setup, which conditions on the computational mesh and input parameters without requiring reference flow fields, exhibits the largest improvement in residuals.

Overall our contributions are as follows\.

- We present a fully GPU-based workflow for hypersonic flow emulation that integrates data generation, surrogate pre-training, and physics-aware refinement within a single differentiable pipeline built on the JAX-Fluids solver.
- We enable regular-grid ML architectures to be agnostic to grid topology parameters (e.g., block count and ordering of the block-structured meshes) by combining absolute and relative positional encodings based on coordinates in physical space.
- We evaluate two complementary neural architectures (AB-UPT and ViT) and two training paradigms (deterministic vs. probabilistic) and conduct scaling studies with respect to both model size and dataset size, identifying distinct data-efficiency and accuracy trade-offs across regimes.
- We introduce a target-free refinement stage that improves physical consistency by backpropagating PDE residuals into the pre-trained neural PE weights without requiring reference flow fields, and demonstrate its advantage over field-value fine-tuning.

## 2 Related Work

Machine learning has been integrated into CFD workflows in several complementary ways. One line of work augments classical numerical schemes with learned components. Neural networks (NNs) have served as troubled-cell indicators that locate where limiting is needed in high-order discretizations Ray and Hesthaven ([2018](https://arxiv.org/html/2606.13742#bib.bib34)), as local and parsimonious modifications within physics-constrained implicit Large-Eddy Simulation (LES) that lead to modifications of classical shock-capturing schemes Bezgin et al. ([2025b](https://arxiv.org/html/2606.13742#bib.bib10)), and as learned correction operators that recover fine-grid accuracy from coarse-grid solvers Kochkov et al. ([2021](https://arxiv.org/html/2606.13742#bib.bib31)). A related line uses reinforcement learning to discover effective closures Novati et al. ([2021](https://arxiv.org/html/2606.13742#bib.bib46)); Fischer et al. ([2025](https://arxiv.org/html/2606.13742#bib.bib45)). A third direction bypasses the solver entirely and trains neural networks as end-to-end PEs, e.g., deep convolutional models that map airfoil geometry directly to Reynolds-averaged Navier–Stokes (RANS) fields Thuerey et al. ([2020](https://arxiv.org/html/2606.13742#bib.bib33)). A common enabler across all of these settings is the availability of a state-of-the-art CFD solver with algorithmic differentiation capability for end-to-end pipelines Bezgin et al. ([2023](https://arxiv.org/html/2606.13742#bib.bib12), [2025a](https://arxiv.org/html/2606.13742#bib.bib11)).

A more general formulation seeks to learn the parameter-to-solution operator of a PDE as a mapping between function spaces. Neural operators provide a scalable and resolution-invariant framework for learning mappings and offer orders-of-magnitude speedups over traditional numerical solvers Li et al. ([2020](https://arxiv.org/html/2606.13742#bib.bib38)); Lu et al. ([2021](https://arxiv.org/html/2606.13742#bib.bib39)); Azizzadenesheli et al. ([2024](https://arxiv.org/html/2606.13742#bib.bib18)). Additionally, for unstructured discretizations, mesh-based graph networks exploit the adjacency structure of the simulation grid to learn local update rules Pfaff et al. ([2020](https://arxiv.org/html/2606.13742#bib.bib40)). Recently, purely transformer-based (Vaswani et al., [2017](https://arxiv.org/html/2606.13742#bib.bib23)) formulations Alkin et al. ([2024](https://arxiv.org/html/2606.13742#bib.bib37), [2025](https://arxiv.org/html/2606.13742#bib.bib14)) have been shown to scale successfully to industry-relevant complexity and to capture long-range dependencies effectively. Despite their flexibility, neural PEs still face well-documented practical limitations, for example heavy data requirements, sensitivity to training-distribution shifts, lack of rollout robustness, and the absence of guaranteed physical consistency in complex regimes Vinuesa and Brunton ([2022](https://arxiv.org/html/2606.13742#bib.bib17)). Most current work has therefore focused on optimizing model errors, for instance for the RANS equations Gupta and Duraisamy ([2026](https://arxiv.org/html/2606.13742#bib.bib19)). Questions of scalability and applicability beyond such regimes, including hypersonic flight, remain open.

Although small-scale fluctuations in velocity, pressure, and density can in principle be resolved on sufficiently fine grids, shocks remain genuine discontinuities and require dedicated nonlinear schemes F. ([2009](https://arxiv.org/html/2606.13742#bib.bib5)). Shocks are weak solutions of the underlying flow equations and obey precise jump relations between pre- and post-shock states, namely the Rankine–Hugoniot conditions LeVeque ([1992](https://arxiv.org/html/2606.13742#bib.bib13)). These properties make reliable, high-resolution prediction of shocks and shock interactions challenging even for classical schemes and notoriously difficult for purely data-driven surrogates. Only with the explicit inclusion of inductive biases for physical consistency have physics-informed neural networks (Karniadakis et al., [2021](https://arxiv.org/html/2606.13742#bib.bib43)) delivered predictions with correct shock locations, satisfied jump conditions, and maintained positivity of the flow state Mao et al. ([2020](https://arxiv.org/html/2606.13742#bib.bib35)); Jagtap et al. ([2022](https://arxiv.org/html/2606.13742#bib.bib4)). The optimization of a discrete loss provided by a numerical discretization of the governing equations on a chosen mesh has more recently proven sufficiently effective to address inverse inference problems in three-dimensional steady-state transonic and supersonic flows Buhendwa et al. ([2025](https://arxiv.org/html/2606.13742#bib.bib20)); Paischer et al. ([2025](https://arxiv.org/html/2606.13742#bib.bib62)), although in that setting some flow-field data must still be available.
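The jump relations mentioned above can be checked numerically. The sketch below is an illustration under stated assumptions: a steady normal shock in an ideal gas with $\gamma=1.4$ at an upstream Mach number of 6, using the textbook normal-shock relations. The function names and the chosen upstream state are hypothetical and are not part of the paper. The check is that the mass, momentum, and energy fluxes of the 1D Euler equations are equal on both sides of the shock.

```python
import jax
import jax.numpy as jnp

jax.config.update("jax_enable_x64", True)  # double precision for a clean check

GAMMA = 1.4  # assumed ideal-gas ratio of specific heats

def post_shock_state(rho1, u1, p1):
    """Post-shock state behind a steady normal shock (ideal gas)."""
    m1_sq = rho1 * u1**2 / (GAMMA * p1)   # upstream Mach number squared
    rho2 = rho1 * (GAMMA + 1) * m1_sq / ((GAMMA - 1) * m1_sq + 2)
    p2 = p1 * (1 + 2 * GAMMA / (GAMMA + 1) * (m1_sq - 1))
    u2 = u1 * rho1 / rho2                 # follows from mass conservation
    return rho2, u2, p2

def euler_flux_1d(rho, u, p):
    """Mass, momentum, and energy flux of the 1D Euler equations."""
    energy = p / (GAMMA - 1) + 0.5 * rho * u**2   # total energy per volume
    return jnp.array([rho * u, rho * u**2 + p, u * (energy + p)])

rho1, p1 = 1.0, 1.0
u1 = 6.0 * jnp.sqrt(GAMMA * p1 / rho1)            # upstream Mach number 6
rho2, u2, p2 = post_shock_state(rho1, u1, p1)

flux_pre = euler_flux_1d(rho1, u1, p1)
flux_post = euler_flux_1d(rho2, u2, p2)
flux_jump = flux_post - flux_pre                   # vanishes up to round-off
```

A data-driven surrogate has no built-in reason to reproduce this flux balance, which is one way to read the difficulty described in the paragraph above.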

For generative tasks, denoising diffusion models have begun to be explored in fluid mechanics. They produce sample-diverse predictions and naturally provide a posterior from which uncertainty can be estimated Ho et al. ([2020](https://arxiv.org/html/2606.13742#bib.bib16)). Diffusion models enable fast forecasting of distributional quantities of interest in high-dimensional dynamical systems Gao et al. ([2024](https://arxiv.org/html/2606.13742#bib.bib41)); Molinaro et al. ([2024](https://arxiv.org/html/2606.13742#bib.bib42)). In incompressible turbulence, diffusion models have generated physically plausible three-dimensional flow states from scratch Lienen et al. ([2024](https://arxiv.org/html/2606.13742#bib.bib36)) and delivered calibrated uncertainty for airfoil flows over a range of Reynolds numbers and angles of attack Liu and Thuerey ([2024](https://arxiv.org/html/2606.13742#bib.bib44)). In the compressible regime, denoising diffusion models have been examined for moderately supersonic flow Abaidi and Adams ([2025](https://arxiv.org/html/2606.13742#bib.bib21)). However, a fully GPU-based workflow that combines parameterized data generation, neural emulator training across complementary architectures, uncertainty quantification, and physics-aware refinement for complex hypersonic flows has not yet been demonstrated.

## 3 The Neural Physics Emulator Pipeline

[Figure 1 schematic: End-to-End Differentiable Workflow for Building Physics Emulators. Panel (a) High-Fidelity Data Generation: STL file from CAD, multi-block mesh, JAX-Fluids CFD solver, flow-field training data. Panel (b) Data-Driven Pre-Training: transformer (AB-UPT), prediction, supervised loss $\mathcal{L}_{\mathrm{data}}$. Panel (c) Physics-Informed Fine-Tuning: pre-trained transformer, prediction, JAX-Fluids differentiable operator, PDE residual $\mathcal{L}_{\mathrm{PDE}}$. Legend: forward pass / information flow; backward pass / gradients; backward pass / physics-informed gradients. Tagline: GPU-accelerated, end-to-end differentiable, physics-guided learning.]

Figure 1: A fully GPU-accelerated and end-to-end differentiable workflow for constructing physics emulators of complex flow phenomena. The workflow consists of three stages. (a) High-Fidelity Data Generation: Starting from an STL representation of the geometry, a multi-block mesh is automatically generated. JAX-Fluids then performs high-fidelity CFD simulations until a steady-state solution is obtained, yielding the training dataset. (b) Data-Driven Pre-Training: Physics emulators are pre-trained in a supervised manner using the high-fidelity dataset generated in stage (a). (c) Physics-Informed Fine-Tuning: After pre-training, the emulators are refined in a target-free manner by minimizing the residuals of the governing equations. The residuals are computed by evaluating the differentiable JAX-Fluids solver on the model predictions, enabling end-to-end gradient-based optimization through the model and the CFD solver itself.

Drawing inspiration from the success of Large Language Models (LLMs) in natural language processing Achiam et al. ([2023](https://arxiv.org/html/2606.13742#bib.bib24)); Team et al. ([2023](https://arxiv.org/html/2606.13742#bib.bib22)), we propose a workflow for building PEs that ranges from data generation to pre-training and fine-tuning. While LLMs excel at modeling linguistic structures, a PE is specifically designed to learn the complex functional relationships of a physical system. Despite their differences, both LLMs and PEs can be based on the same underlying attention mechanism Vaswani et al. ([2017](https://arxiv.org/html/2606.13742#bib.bib23)).

We deliberately choose the term *Physics Emulator* to distinguish this work from generic black-box regression models. The term *Physics* signals that the systems of interest are physical and not just arbitrary input-output mappings. The term *Emulator* is adopted from the statistical literature on computer experiments, where it denotes a fast, probabilistic surrogate trained on the outputs of a computationally expensive simulator Sacks et al. ([1989](https://arxiv.org/html/2606.13742#bib.bib25)); Kennedy and O’Hagan ([2001](https://arxiv.org/html/2606.13742#bib.bib26)). In that tradition, an emulator is not merely a curve fit; it is a high-fidelity statistical proxy designed to reproduce the full input-output behavior of the underlying code. Consequently, a Physics Emulator is such a surrogate purpose-built for physical simulators, functioning as a drop-in replacement for solvers such as CFD codes and mapping input parameters to complete physical fields in milliseconds rather than hours.

To effectively serve as a neural surrogate for hypersonic applications, we identify the following desirable characteristics for a PE, the first two of which we consider essential:

- Differentiability: The model must support end-to-end automatic differentiation for seamless integration into gradient-based design optimization and for physics-based fine-tuning using differentiable CFD.
- Physical consistency: Predictions must respect the conservation laws of mass, momentum, and energy as much as possible.
- Uncertainty capabilities: When predictive uncertainty is relevant for the downstream task, the PE should be able to provide a posterior distribution that can be sampled to yield uncertainty estimates, rather than only point predictions.

To construct a Physics Emulator that satisfies these requirements, we propose a fully GPU-resident pipeline comprising three phases, see [Figure 1](https://arxiv.org/html/2606.13742#S3.F1):

1. Data Generation: First, data generation requires defining a robust design space. This involves parameterizing the geometries, material properties, and boundary conditions of the target engineering system. By establishing a comprehensive parametric envelope, we ensure the resulting model will be exposed to a diverse and representative set of physical scenarios. Based on the parameterized setup, high-fidelity data must be generated, which requires a scalable and accurate solver.
2. Model Training: Given a dataset of high-fidelity simulations, we pre-train a neural PE. This step includes the selection of a suitable model architecture and training process. For instance, if predictive uncertainty is to be quantified, a probabilistic framework is required.
3. Model Fine-tuning: The base model can be fine-tuned based on specific quantities of interest (e.g., conservation laws). These quantities can be chosen after pre-training, and the fine-tuning can be performed in a target-free manner.

The following sections discuss all three phases in detail.

### 3.1 Data Generation: JAX-Fluids

Within the scope of this work, high-fidelity flow-field data are generated using JAX-Fluids Bezgin et al. ([2023](https://arxiv.org/html/2606.13742#bib.bib12), [2025a](https://arxiv.org/html/2606.13742#bib.bib11)), a high-order, fully differentiable finite-volume solver for compressible single- and two-phase flows. JAX-Fluids combines high-order shock-capturing discretizations, GPU acceleration, automated Cartesian multi-block meshing, and end-to-end automatic differentiation within a single JAX-based framework, making it particularly well suited for training PEs. In particular, these properties enable using the solver not only as an offline data generator, but also as a differentiable physics engine during model training and downstream optimization.

Hypersonic flows are characterized by complex flow phenomena such as shock-shock interactions, shock-interface interactions, wave dynamics, viscous-inviscid interactions, flow separation, and multi-species effects. Accurately resolving these phenomena requires numerical methods that are both robust in the presence of discontinuities and sufficiently accurate in smooth regions of the flow. In this work, the data are generated in the inviscid limit governed by the compressible Euler equations, which corresponds to typical application scenarios at high Reynolds numbers. At the hypersonic Mach numbers considered here, the intake flowfield and the integral performance metrics are dominated by the shock structure, which the Euler equations admit as weak solutions satisfying the Rankine–Hugoniot jump conditions. For hypersonic intakes operating at flight Reynolds numbers, viscous regions are confined to relatively thin boundary layers, and inviscid analysis is the standard scope for preliminary scramjet-inlet design Heiser and Pratt ([1994](https://arxiv.org/html/2606.13742#bib.bib1)). The PE and refinement methodology described in the subsequent sections is not specific to the Euler equations and extends to the full Navier–Stokes system by adding viscous effects.

JAX\-Fluids follows a high\-order Godunov\-type finite\-volume formulation\. Shock waves are captured using nonlinear solution\-adaptive reconstruction together with approximate Riemann solvers\. For the cases considered in this work, steady\-state solutions are obtained by explicit time advancement until the residuals of the governing equations fall below a prescribed tolerance\. The design of JAX\-Fluids is motivated by the requirements of ML workflows for computational physics\. In particular, three aspects are central to the present work: Cartesian multi\-block meshes, GPU acceleration, and automatic differentiability\.

##### Cartesian multi\-block mesh\.

A central requirement for large\-scale training\-data generation is a mesh\-generation procedure that is robust, automated, and computationally efficient\. In general, CFD mesh strategies can be classified as follows: structured versus unstructured meshes, and body\-fitted versus immersed boundary methods\. Unstructured body\-fitted meshes offer high geometric flexibility and are therefore well suited for complex configurations, but high\-order methods are difficult to implement efficiently due to irregular cell connectivity and thus indirect memory access\. Structured body\-fitted meshes, such as curvilinear grids, are more favorable for high\-order schemes because they retain regular cell connectivity, but complex geometries with fine geometric features are difficult to represent\.

In this work, we therefore employ a structured Cartesian multi-block mesh combined with a conservative cut-cell immersed-boundary method. The geometry is represented implicitly by a level-set function, and cells intersected by the fluid-solid interface are treated as cut cells. This approach combines the geometric flexibility of immersed boundary methods with the numerical efficiency of structured Cartesian grids. In particular, it enables automated meshing of fine geometric features while retaining the regular data layout and stencil structure required for efficient high-order finite-volume discretizations. The multi-block formulation enables local refinement through quadtree- or octree-type subdivision, allowing high resolution near solid boundaries and interfaces while keeping a coarser resolution elsewhere. [Figure 2](https://arxiv.org/html/2606.13742#S3.F2) shows the multi-block grid of a representative geometry considered in this work.
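As a rough illustration of how a level-set function separates fluid, solid, and cut cells on a Cartesian block, consider the sketch below. It is not the conservative cut-cell method of JAX-Fluids: the circular body, the sign convention (negative inside the solid), and the corner-based classification are assumptions made for brevity, and the corner test can miss cells that the interface only grazes along an edge.

```python
import jax.numpy as jnp

def circle_level_set(x, y, cx=0.5, cy=0.5, radius=0.25):
    """Signed distance to a circle: negative inside the solid, positive in the fluid."""
    return jnp.sqrt((x - cx) ** 2 + (y - cy) ** 2) - radius

# Corner coordinates of a uniform 16 x 16 Cartesian block on the unit square.
n = 16
corners = jnp.linspace(0.0, 1.0, n + 1)
xc, yc = jnp.meshgrid(corners, corners, indexing="ij")
phi = circle_level_set(xc, yc)                 # level set sampled at the corners

# Collect the four corner values of every cell: shape (4, n, n).
phi_cell = jnp.stack([phi[:-1, :-1], phi[1:, :-1], phi[:-1, 1:], phi[1:, 1:]])

is_fluid = jnp.all(phi_cell > 0, axis=0)       # every corner lies in the fluid
is_solid = jnp.all(phi_cell <= 0, axis=0)      # every corner lies in the solid
is_cut = ~(is_fluid | is_solid)                # corners disagree: interface crosses
```

The three masks partition the block, and all of them are dense boolean arrays with the same regular layout as the flow variables.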

![Refer to caption](https://arxiv.org/html/2606.13742v1/x6.png)

Figure 2: Schematic of the multi-block grid for the generic scramjet demonstrator configuration.

The Cartesian multi-block grid is particularly advantageous for GPU-accelerated simulation. The solution variables within each block are stored as dense multidimensional arrays, leading to regular memory access, efficient vectorization, and reduced indirect addressing compared with unstructured grids. As a result, the dominant numerical kernels are well matched to modern accelerators, where performance depends strongly on data locality, coalesced memory access, and high arithmetic throughput.

##### GPU\-acceleration\.

JAX-Fluids is implemented in JAX and compiled through XLA for execution on modern accelerators. Within each block, the solver operations are expressed as batched array operations, enabling large numbers of cells to be processed in parallel. The multi-block decomposition also provides a natural parallelization strategy: a set of individual blocks can be assigned to different XLA devices, while communication is limited to halo exchanges. In JAX-Fluids, this is enabled through JAX primitives such as `jax.shard_map` and `jax.lax.ppermute`. This GPU-resident design is essential for generating large datasets over broad parameter spaces, including geometry and Mach number. It is also important when JAX-Fluids is used inside the training loop to evaluate physics-based losses.
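The idea of expressing per-block work as batched array operations can be sketched on a single device with `jax.vmap` and `jax.jit`. This is only an illustration: the kernel below is a plain central-difference divergence, the block layout (8 blocks of 32 x 32 cells) is hypothetical, and the multi-device halo exchange that the text attributes to `jax.shard_map` and `jax.lax.ppermute` is deliberately not shown.

```python
import jax
import jax.numpy as jnp

def block_divergence(flux_x, flux_y, dx, dy):
    """Central-difference divergence on the interior cells of one dense block."""
    dfx = (flux_x[2:, 1:-1] - flux_x[:-2, 1:-1]) / (2 * dx)
    dfy = (flux_y[1:-1, 2:] - flux_y[1:-1, :-2]) / (2 * dy)
    return dfx + dfy

# Hypothetical layout: 8 blocks of 32 x 32 cells stacked along a leading axis.
n_blocks, n = 8, 32
dx = dy = 1.0 / n
x = (jnp.arange(n) + 0.5) * dx
xx, yy = jnp.meshgrid(x, x, indexing="ij")
flux_x = jnp.broadcast_to(xx, (n_blocks, n, n))       # F = x,  so dF/dx = 1
flux_y = jnp.broadcast_to(2 * yy, (n_blocks, n, n))   # G = 2y, so dG/dy = 2

# vmap maps the per-block kernel over the block axis; jit compiles it via XLA.
batched_divergence = jax.jit(
    jax.vmap(block_divergence, in_axes=(0, 0, None, None))
)
div = batched_divergence(flux_x, flux_y, dx, dy)      # shape (8, 30, 30)
```

Because every block has the same shape, the same compiled kernel serves all blocks, which is the property the dense multi-block layout is meant to provide.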

##### Differentiable solver\.

JAX\-Fluids is a fully differentiable solver that allows calculation of gradients of objective functions by automatic differentiation\. These gradients are consistent with the discretized PDEs, i\.e\., they are consistent with the governing equation and the chosen numerical discretization\. Flow field predictions of a trained PE can be passed through JAX\-Fluids to fine\-tune it for achieving high\-fidelity, physically\-aware predictions in complex flows without inconsistencies with the data\-generating numerics\.
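What "gradients consistent with the discretized PDE" means can be illustrated on a toy problem. The sketch below is not JAX-Fluids: it assumes a first-order upwind scheme for 1D linear advection on a periodic grid, with hypothetical names (`advect`, `objective`) and made-up profiles. It differentiates a scalar objective through all time steps with `jax.grad` and compares the result with a finite-difference estimate of the same discrete objective.

```python
import jax
import jax.numpy as jnp

jax.config.update("jax_enable_x64", True)  # double precision for the comparison

def advect(u0, speed, n_steps=50, dt_over_dx=0.5):
    """Explicit first-order upwind steps for u_t + speed * u_x = 0 (periodic)."""
    def step(u, _):
        return u - dt_over_dx * speed * (u - jnp.roll(u, 1)), None
    u_final, _ = jax.lax.scan(step, u0, None, length=n_steps)
    return u_final

def objective(speed, u0, target):
    """Squared mismatch between the advected profile and a target profile."""
    return jnp.sum((advect(u0, speed) - target) ** 2)

x = jnp.linspace(0.0, 1.0, 64, endpoint=False)
u0 = jnp.exp(-100.0 * (x - 0.3) ** 2)
target = jnp.exp(-100.0 * (x - 0.6) ** 2)

speed = 0.8
grad_ad = jax.grad(objective)(speed, u0, target)   # reverse-mode AD through the scheme
eps = 1e-6
grad_fd = (objective(speed + eps, u0, target)
           - objective(speed - eps, u0, target)) / (2 * eps)   # finite difference
```

The two values agree to within the finite-difference error, since both differentiate the same discrete time-stepping scheme rather than the continuous PDE.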

### 3.2 Neural Architectures for Physics Emulators

We investigate two neural architectures and two different modeling paradigms, each offering distinct trade-offs for predicting hypersonic flowfields on octree-based Euclidean mesh data. The octree structure decomposes the domain into axis-aligned blocks of uniform resolution, and the count and ordering of blocks vary between cases. This meshing strategy is increasingly adopted by modern GPU-based solvers because it maps naturally to parallel hardware (Jaber et al., [2026](https://arxiv.org/html/2606.13742#bib.bib64); Carreon et al., [2025](https://arxiv.org/html/2606.13742#bib.bib65)), making scalable emulation of such grids a broadly relevant objective. To enable regular-grid architectures to be invariant to block count and ordering, we encode each patch’s physical coordinates using complementary absolute (sinusoidal) and relative (rotary) positional encodings. Combined, they give a single global attention mechanism a complete picture of patch location and pairwise displacement across blocks. In contrast, architectures based on point-wise representations can simply ingest the raw point cloud. This enables a range of neural architectures, and we consider the following instantiations of the PE: (i) a field-based approach designed for irregular grids (AB-UPT, Alkin et al., [2025](https://arxiv.org/html/2606.13742#bib.bib14)), (ii) a regular-grid-based vision transformer (ViT, Dosovitskiy et al., [2021](https://arxiv.org/html/2606.13742#bib.bib15)), and (iii) a generative flow matching model operating on the regular-grid representation (Lipman et al., [2022](https://arxiv.org/html/2606.13742#bib.bib56)).
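The two kinds of positional encoding can be sketched as follows. This is a simplified illustration and not the paper's implementation: it uses a single scalar coordinate per patch, hypothetical frequency choices, and made-up feature vectors. The sinusoidal features encode where a patch is, while the rotary rotation makes the query-key score depend only on the displacement between two patches. Since both are functions of physical coordinates rather than of a patch's index in the block stack, reordering blocks leaves them unchanged.

```python
import jax
import jax.numpy as jnp

def sinusoidal_encoding(coord, dim=8, max_wavelength=10.0):
    """Absolute encoding: sin/cos features of a physical coordinate."""
    exponents = jnp.arange(dim // 2) / (dim // 2)
    freqs = 2 * jnp.pi / max_wavelength ** exponents   # hypothetical frequency ladder
    angles = coord[..., None] * freqs
    return jnp.concatenate([jnp.sin(angles), jnp.cos(angles)], axis=-1)

def rotary(vec, coord, base_freq=1.0):
    """Relative encoding: rotate feature pairs (i, i + half) by coord-dependent angles."""
    half = vec.shape[-1] // 2
    freqs = base_freq / 2.0 ** jnp.arange(half)
    angles = coord[..., None] * freqs
    cos, sin = jnp.cos(angles), jnp.sin(angles)
    v1, v2 = vec[..., :half], vec[..., half:]
    return jnp.concatenate([v1 * cos - v2 * sin, v1 * sin + v2 * cos], axis=-1)

key_q, key_k = jax.random.split(jax.random.PRNGKey(0))
q = jax.random.normal(key_q, (8,))            # made-up query features
k = jax.random.normal(key_k, (8,))            # made-up key features
x_q, x_k = jnp.array(0.30), jnp.array(0.75)   # patch-centre coordinates
shift = 2.0                                   # common translation of both patches

score = jnp.dot(rotary(q, x_q), rotary(k, x_k))
score_shifted = jnp.dot(rotary(q, x_q + shift), rotary(k, x_k + shift))
absolute = sinusoidal_encoding(jnp.array([0.30, 0.75]))   # shape (2, 8)
```

Here `score` and `score_shifted` coincide up to round-off, which is the relative-displacement property. In two or three dimensions one would typically split the feature pairs among the coordinate axes, which this sketch omits.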

[Table 1](https://arxiv.org/html/2606.13742#S3.T1) summarizes the qualitative trade-offs among the three approaches. AB-UPT treats the mesh as an irregular point cloud and yields smooth predictions but does not have built-in uncertainty estimates. The ViT, by contrast, ingests the regular-grid octree representation and benefits from the highly optimized attention kernels available for uniform tensor data. The flow matching model is also based on a regular grid, but replaces the deterministic prediction head with a stochastic denoising process, trading single-pass efficiency for predictive uncertainty estimates.
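A minimal sketch of the flow matching idea is given below, assuming the common linear interpolation path between a noise sample and a data sample. To keep it self-contained, an analytic velocity field for a toy Gaussian target stands in for a trained network; the names (`flow_matching_loss`, `sample`, `analytic_velocity`), the coupling between noise and data, and all numerical values are hypothetical and do not come from the paper.

```python
import jax
import jax.numpy as jnp

def flow_matching_loss(velocity_fn, x0, x1, t):
    """Regress velocity_fn onto the straight-line velocity x1 - x0 at x_t."""
    t_b = t[:, None]
    x_t = (1.0 - t_b) * x0 + t_b * x1     # point on the linear path at time t
    target = x1 - x0                      # constant velocity along that path
    return jnp.mean(jnp.sum((velocity_fn(x_t, t) - target) ** 2, axis=-1))

def sample(velocity_fn, x0, n_steps=20):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with explicit Euler steps."""
    dt = 1.0 / n_steps
    x = x0
    for i in range(n_steps):
        t = jnp.full((x.shape[0],), i * dt)
        x = x + dt * velocity_fn(x, t)
    return x

# Toy target: each noise sample x0 is paired with the data sample mu + sigma * x0.
mu = jnp.array([2.0, -1.0])
sigma = jnp.array([0.5, 0.1])

def analytic_velocity(x, t):
    """Exact velocity field for this pairing (stand-in for a trained network)."""
    t_b = t[:, None]
    x0 = (x - t_b * mu) / (1.0 + (sigma - 1.0) * t_b)   # recover the start point
    return mu + (sigma - 1.0) * x0

key_x0, key_t = jax.random.split(jax.random.PRNGKey(0))
x0 = jax.random.normal(key_x0, (256, 2))
x1 = mu + sigma * x0
t = jax.random.uniform(key_t, (256,))

loss = flow_matching_loss(analytic_velocity, x0, x1, t)   # near zero for the exact field
ensemble = sample(analytic_velocity, x0)                  # one sample per noise draw
mean, std = ensemble.mean(axis=0), ensemble.std(axis=0)   # ensemble statistics
```

The ensemble spread `std` plays the role of the predictive uncertainty estimate, and the loop in `sample` shows why generation needs several network evaluations instead of a single forward pass.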

Each model takes as input the simulation grid, either as a point cloud (AB-UPT) or in the block-stacked regular-grid representation (ViT, Flow Matching), together with a conditioning vector comprising the 15 geometry parameters of the scramjet configuration and the free-stream Mach number $M_{\infty}$. The neural PE then predicts the corresponding flow field as a multi-channel output with $C=8$ channels, where each channel represents a different physical quantity, namely pressure $p$, density $\rho$, velocity $\mathbf{u}$, temperature $T$, enthalpy $h$, total pressure $P_{t}$, kinetic energy $k$, and Mach number $\operatorname{Ma}$. The model is trained by minimizing the mean squared error over all grid points,

$$\mathcal{L}_{\text{data}}=\frac{1}{N}\sum_{i=1}^{N}\left\|\hat{\mathbf{y}}_{i}-\mathbf{y}_{i}\right\|^{2}, \tag{1}$$

where $\mathbf{y}_{i}\in\mathbb{R}^{C}$ is the ground-truth flow state at grid point $i$, $\hat{\mathbf{y}}_{i}$ is the corresponding prediction, and $N$ is the number of grid points. Predicting primitive (i.e., density, velocity, and pressure) and derived quantities (i.e., temperature, enthalpy, total pressure, kinetic energy, and Mach number) jointly avoids the error accumulation that arises when derived fields are reconstructed from predicted primitives (see [Appendix F](https://arxiv.org/html/2606.13742#A6)). The trade-off is that the predicted derived quantities are not guaranteed to be consistent with those recomputed from the predicted primitives. However, supervising only the primitives would leave the derived fields without a direct training signal, and folding their computation into the loss would introduce potentially ill-conditioned gradients through the nonlinear derivations as well as additional weighting hyperparameters across heterogeneous scales.
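As a minimal sketch of Eq. (1) and of the consistency gap discussed above, the snippet below computes the data loss with NumPy and recomputes the Mach number from primitive quantities. The ideal-gas relation with $\gamma = 1.4$ and the function names are assumptions for illustration; the solver's actual thermodynamic closure may differ.

```python
import numpy as np

GAMMA = 1.4  # assumed ideal-gas ratio of specific heats (illustrative)

def data_loss(y_pred, y_true):
    """Eq. (1): mean over grid points of the squared L2 norm across channels.

    y_pred, y_true: arrays of shape (N, C).
    """
    diff = y_pred - y_true
    return float(np.mean(np.sum(diff ** 2, axis=-1)))

def mach_from_primitives(p, rho, u):
    """Mach number recomputed from pressure, density, and velocity (N, 2)."""
    speed = np.linalg.norm(u, axis=-1)
    sound_speed = np.sqrt(GAMMA * p / rho)
    return speed / sound_speed
```

Comparing a directly predicted Mach-number channel against `mach_from_primitives` applied to the predicted primitives gives one way to quantify how far the jointly predicted fields drift from mutual consistency.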

Table 1: Qualitative comparison of neural physics emulator instantiations. ✓ = fully satisfied, ✗ = not satisfied. Different methods and training paradigms exhibit different advantages.
### 3.3 Physics-aware Model Refinement

The third phase of our workflow refines the pre-trained base model through physics-aware optimization to improve physical consistency. This stage exploits the end-to-end differentiability of the JAX-Fluids solver to embed the governing flow equations directly within the ML optimization loop. Whereas the initial training phase relies on a standard supervised loss, the refinement phase minimizes the point-wise discrete residuals of the compressible Euler equations,

$$\mathcal{L}_{\text{PDE}}=\sum_{(i,j)\in\Omega}\sum_{k=1}^{4}w_{k}\left(R^{k}_{i,j}\right)^{2}\Delta x\,\Delta y, \tag{2}$$

where $R^{k}_{i,j}$ denotes the point-wise residual for the $k$-th conserved quantity in cell $(i,j)$ and $w_{k}$ is the corresponding loss weight. Here, the conserved quantities comprise mass, x-momentum, y-momentum, and total energy. For details we refer to [Section C.4](https://arxiv.org/html/2606.13742#A3.SS4).

Using JAX-Fluids, we evaluate the residual-based loss $\mathcal{L}_{\text{PDE}}$ on the simulation mesh with the same numerical discretizations employed during data generation, ensuring consistency between training and inference. A key advantage of this approach is that it improves generalization and can be performed in a target-free manner. It requires only a differentiable numerical solver and simulation meshes for the configurations of interest, i.e., no additional ground-truth solutions are needed. The configuration space can therefore be cheaply expanded beyond that of the training data, since the loss is computed directly from the flow equations. This makes our model refinement a comparatively inexpensive fine-tuning procedure that avoids the cost of running new high-fidelity simulations.
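The structure of Eq. (2) can be sketched for the steady 2D compressible Euler equations on a uniform grid. This is an illustrative stand-in only: it uses plain central differences and an assumed ideal gas with $\gamma = 1.4$, not the shock-capturing discretization or the residual operator of JAX-Fluids, and the function names are hypothetical.

```python
import numpy as np

GAMMA = 1.4  # assumed ideal-gas ratio of specific heats (illustrative)

def euler_fluxes(U):
    """x- and y-fluxes of the 2D Euler equations; U has shape (4, nx, ny).

    U stacks the conserved quantities (rho, rho*u, rho*v, E).
    """
    rho, mx, my, E = U
    u, v = mx / rho, my / rho
    p = (GAMMA - 1.0) * (E - 0.5 * rho * (u ** 2 + v ** 2))
    F = np.stack([mx, mx * u + p, my * u, (E + p) * u])
    G = np.stack([my, mx * v, my * v + p, (E + p) * v])
    return F, G

def pde_loss(U, dx, dy, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of squared steady-state residuals, as in Eq. (2)."""
    F, G = euler_fluxes(U)
    # Central differences on interior cells (a stand-in discretization).
    dFdx = (F[:, 2:, 1:-1] - F[:, :-2, 1:-1]) / (2.0 * dx)
    dGdy = (G[:, 1:-1, 2:] - G[:, 1:-1, :-2]) / (2.0 * dy)
    R = dFdx + dGdy                              # shape (4, nx-2, ny-2)
    w = np.asarray(weights)[:, None, None]
    return float(np.sum(w * R ** 2) * dx * dy)
```

A uniform flow state satisfies the steady equations exactly and yields zero loss, whereas any local perturbation produces a positive residual. In a differentiable framework the same quantity could be differentiated with respect to the emulator's parameters, which is what makes the refinement target-free.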

## 4 Experiments

This section details our experimental findings. We begin with an overview of the two datasets utilized for model training. We then present the training details and results of the baseline model, contrasting the various methodological approaches introduced previously. Within this analysis, we focus on scalability and provide results for scaling along the data and model axes. Finally, we present the results of the model refinement phase.

### 4.1 Datasets

The experimental evaluation utilizes two distinct datasets of scramjet configurations in hypersonic conditions, $D_1$ and $D_2$, both generated using JAX-Fluids as detailed in [Section 3.1](https://arxiv.org/html/2606.13742#S3.SS1). Each scramjet case is parameterized by its design-parameter vector $\mathbf{p}_{i}\in\mathbb{R}^{d}$ (see Figure [3](https://arxiv.org/html/2606.13742#S4.F3)), where $\mathbf{p}_{i}$ concatenates the geometry and inflow parameters of case $i$. $D_1$ represents a high-fidelity dataset, whereas $D_2$ is target-free, i.e., it only comes with mesh and input parameters without ground-truth field data. For the pre-training stage of our models, we rely on $D_1$, whereas for fine-tuning we also leverage $D_2$. We list both datasets and their corresponding properties in [Table 2](https://arxiv.org/html/2606.13742#S4.T2).

Figure 3: Schematic of the parametrized scramjet geometry illustrating the design parameters $\mathbf{p}$. Here, $\lambda_{r1},\lambda_{r2},\lambda_{r3}$ are the intake ramp length fractions. $\lambda_{\mathrm{iso}}$ and $\lambda_{\mathrm{comb}}$ denote the isolator and combustor length fractions. $\theta_{r1},\theta_{r2},\theta_{r3}$ are the intake ramp angles, and $\theta_{\mathrm{iso}},\theta_{\mathrm{comb}},\theta_{\mathrm{noz}}$ the isolator, combustor, and nozzle angles, all measured relative to the $x$-axis. $\lambda_{\mathrm{fp}}$ is the flow path height fraction and $\lambda_{\mathrm{ci}}$ is the cowl intake length fraction. The length fractions are defined with respect to a fixed reference length and the cowl has a fixed height. In addition to the geometry parameters, the design-parameter vector $\mathbf{p}$ also contains the inflow Mach number $Ma$.

Table 2: Overview of the two datasets used for model training and evaluation. $D_{1}$ prioritizes numerical accuracy through minimized residuals, while $D_{2}$ comprises only meshes for new scramjet parameterizations.

To evaluate the different neural PEs, we construct different dataset splits. Stacking the design-parameter vectors of all cases considered yields the parameter matrix $\mathbf{P}\in\mathbb{R}^{N\times d}$. First, we fit an Isolation Forest (Liu et al., [2008](https://arxiv.org/html/2606.13742#bib.bib30)) with $T=100$ trees on the parameter matrix $\mathbf{P}$ and obtain an anomaly score $s_{i}=\operatorname{decision\_function}(\mathbf{p}_{i})$ for each case in the dataset, where lower $s_{i}$ corresponds to points that are isolated by shorter random partitioning paths and are therefore marked anomalous in design space. We select the 10% of cases with the lowest scores and assemble them into an out-of-distribution (OOD) set. The remaining in-distribution pool is then randomly partitioned into train/val/test according to an 80/10/10 split. We verify the construction post-hoc by visualizing the anomaly scores of the Isolation Forest per split (see [Figure 12](https://arxiv.org/html/2606.13742#A2.F12) in [Appendix B](https://arxiv.org/html/2606.13742#A2)), confirming that OOD samples lie on the periphery of the parameter manifold while train/val/test overlap in the interior.
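The split construction described above can be sketched as follows, assuming the anomaly scores have already been computed (for example with scikit-learn's `IsolationForest(n_estimators=100)` and its `decision_function`, which is a guess based on the notation in the text). The function name, the rounding of split sizes, and the seed are illustrative assumptions.

```python
import numpy as np

def split_by_anomaly_score(scores, ood_frac=0.10, seed=0):
    """Lowest-scoring cases become OOD; the rest is split 80/10/10 at random.

    scores: 1D array of anomaly scores, lower meaning more anomalous.
    Returns a dict of index arrays into the original case ordering.
    """
    scores = np.asarray(scores)
    n_ood = int(round(ood_frac * scores.shape[0]))
    order = np.argsort(scores)                    # ascending: most anomalous first
    ood = order[:n_ood]
    pool = np.random.default_rng(seed).permutation(order[n_ood:])
    n_train = int(round(0.8 * pool.size))
    n_val = int(round(0.1 * pool.size))
    return {
        "ood": ood,
        "train": pool[:n_train],
        "val": pool[n_train:n_train + n_val],
        "test": pool[n_train + n_val:],
    }
```

Because the OOD set is carved out before the random partition, every OOD case has a score no higher than any in-distribution case, which matches the post-hoc check described in the text.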

### 4.2 Evaluation Protocol

We evaluate each PE against the reference JAX-Fluids simulations on three engineering key performance indicators (KPIs) that quantify aspects of scramjet performance, supplemented by a qualitative comparison of the density field. The KPIs are derived from the full predicted flow and therefore expose the downstream consequences of the predictions: total-pressure recovery up to the combustor entrance, cumulative total-pressure loss across the scramjet from the inlet plane to the nozzle exit, and peak thermal load on the wetted geometry.

##### Total-pressure ratio $\pi_{d}$.

Following standard inlet-performance practice, we define

$$\pi_{d}=\frac{\left\langle p_{t}|_{x_{\mathrm{stn2}}}\right\rangle}{\left\langle p_{t}|_{x_{\mathrm{inlet}}}\right\rangle}, \tag{3}$$

i.e., the ratio of the mass-flux-weighted total pressure at station 2 to that at the inlet plane, with station 2 taken at the axial mid-point of the scramjet and the averages restricted to the fluid side of the embedded geometry by the signed-distance level set. An ideal isentropic compression delivers $\pi_{d}=1$, whereas losses (e.g., due to shocks) between the capture surface and the combustor entrance reduce $\pi_{d}$.

##### Total-pressure loss ratio $\Lambda$.

Whereas $\pi_{d}$ characterizes the inlet up to station 2, the total-pressure loss ratio

$$\Lambda=\frac{\left\langle p_{t}|_{x_{\mathrm{inlet}}}\right\rangle-\left\langle p_{t}|_{x_{\mathrm{exit}}}\right\rangle}{\left\langle p_{t}|_{x_{\mathrm{inlet}}}\right\rangle} \tag{4}$$

captures the cumulative loss across the entire duct, from the inlet plane to the nozzle exit. The two pressure-based KPIs are deliberately complementary.
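A minimal sketch of the two pressure-based KPIs in Eqs. (3) and (4) is given below. It assumes uniform cell heights along each axial station and a binary fluid mask derived from the level set; the function names and these simplifications are illustrative and not taken from the evaluation code.

```python
import numpy as np

def mass_flux_average(p_t, rho, u, fluid_mask):
    """Mass-flux-weighted average of total pressure over one axial station.

    All inputs are 1D arrays over the cells of the station; fluid_mask is 1
    in fluid cells and 0 inside the embedded geometry.
    """
    weight = rho * u * fluid_mask
    return float(np.sum(weight * p_t) / np.sum(weight))

def total_pressure_kpis(pt_inlet, pt_stn2, pt_exit):
    """Return (pi_d, Lambda) from station-averaged total pressures."""
    pi_d = pt_stn2 / pt_inlet
    lam = (pt_inlet - pt_exit) / pt_inlet
    return pi_d, lam
```

Applying `mass_flux_average` at the inlet, station 2, and nozzle exit of both the predicted and the reference flow field gives the inputs for `total_pressure_kpis` and hence the points of a parity plot.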

##### Peak surface temperature\.

The third KPI is a structural\-design figure of merit, namely the peak temperature on the scramjet wetted surface\. From the predicted temperature fields we extract the peak surface temperature and report the 95th percentile\. Peak wall temperature in a hypersonic intake drives the choice of thermal\-protection material and the cooling\-system budget for the vehicle\.

Table 3: Performance of different physics emulators on hypersonic flowfields. Relative L2 errors (%) for pressure $p$, density $\rho$, velocity $\bm{u}$, enthalpy $h$, total pressure $p_{t}$, kinetic energy $k$, temperature $T$, and Mach number $\operatorname{Ma}$ for AB-UPT, ViT, and Flow Matching (FM) across three random seeds.

### 4.3 Pre-Training Results

We present results for the three different instantiations of PEs and their performance in the following sections. All of the results in this section are exclusively based on training on $\mathcal{L}_{\text{data}}$ and cover two different architectures and modeling paradigms within a typical toolbox for providing PE solutions for engineering problems.

#### 4.3.1 AB-UPT

As illustrated in [Figure 4](https://arxiv.org/html/2606.13742#S4.F4), the model maintains high fidelity across the relevant metrics. Parity plots for $\pi_{d}$, $\Lambda$, and peak surface temperature ([Figure 4](https://arxiv.org/html/2606.13742#S4.F4)a-c) show tight clustering along the identity line. Furthermore, the density field predictions ([Figure 4](https://arxiv.org/html/2606.13742#S4.F4)e) closely mirror the ground-truth solver ([Figure 4](https://arxiv.org/html/2606.13742#S4.F4)d). The relative error map ([Figure 4](https://arxiv.org/html/2606.13742#S4.F4)f) reveals that the highest discrepancies are localized at shockwaves and the exhaust nozzle, with an average relative $L_{2}$ error around 3%. Finally, we report performance metrics in [Table 4](https://arxiv.org/html/2606.13742#S4.T4), showing that training and inference for AB-UPT are slower than for the other methods. Specifically, training sequences are longer due to the selection of anchor points, and inference is much slower because AB-UPT decodes field values for each grid position during a single forward pass. This also results in elevated GFLOPs and memory footprint.

![Refer to caption](https://arxiv.org/html/2606.13742v1/x7.png)

Figure 4: Prediction error of AB-UPT on hypersonic scramjet. Parity plots of predicted vs. ground-truth values on the OOD test data for (a) peak surface temperature, (b) total pressure ratio, and (c) total pressure loss. We also report the ground-truth (d) and predicted (e) density field along with the relative error in percentage points (f) for a random case in the OOD test data.

Additionally, we analyze data and model scaling as shown in [Figure 5](https://arxiv.org/html/2606.13742#S4.F5). For data scaling, we train a model with $\sim 25$M parameters on subsampled training splits. The different splits comprise $\{50, 150, 450, 1200, 2800, 4816\}$ samples, selected using farthest point sampling to maintain coverage of the design parameter space. For training details and hyperparameter searches we refer the reader to [Appendix D](https://arxiv.org/html/2606.13742#A4). For model scaling experiments, we scale the model from $\sim 9$M parameters to $\sim 100$M parameters. AB-UPT exhibits a steady trend of improvement for data scaling on validation, test, and OOD splits and achieves the lowest loss on all datasets compared to competitors. The trend for model scaling differs slightly. There is a noticeable improvement when scaling the model from $\sim 9$M to $\sim 25$M parameters. Beyond $25$M parameters the test error flattens and further increases in depth yield marginal improvements. We interpret this as a data-bottlenecked regime in which the model's representational capacity already exceeds what the training distribution can support.
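Farthest point sampling, used above to build the subsampled training splits, can be sketched in a few lines. The greedy variant below, the Euclidean metric, and the choice of starting index are assumptions; in practice the design parameters would likely need to be normalized to comparable scales first.

```python
import numpy as np

def farthest_point_sampling(P, k, start=0):
    """Greedily pick k rows of P that are mutually far apart.

    P: (n_cases, d) parameter matrix; returns the selected row indices.
    """
    P = np.asarray(P, dtype=float)
    selected = [start]
    # Distance of every case to its nearest already-selected case.
    min_dist = np.linalg.norm(P - P[start], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(P - P[nxt], axis=1))
    return np.array(selected)
```

Each step adds the case that is farthest from everything chosen so far, so even the smallest subsets span the extent of the design space rather than clustering in its densest region.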

![Refer to caption](https://arxiv.org/html/2606.13742v1/x8.png)

Figure 5: Scaling laws for the different modeling paradigms. Data (a) and model scaling laws (b) for AB-UPT, ViT, and Flow Matching on validation, test, and OOD extrapolation sets.

#### 4.3.2 ViT

We present the results for our trained ViT model in [Figure 6](https://arxiv.org/html/2606.13742#S4.F6). The KPIs are comparable to the ones achieved by AB-UPT. Further, the error distribution shows higher error around shockwaves, expansion fans, reflected shocks, and in the exhaust nozzle. Especially at the exhaust nozzle, the error pattern differs from the one obtained by AB-UPT. Specifically, the error pattern appears to be an artifact of patching. A potential remedy would be reducing the patch size or adding a smoothing operation (e.g., by convolution) in the decoder. We experimented with the latter and found no significant performance gains. Furthermore, decreasing the patch size leads to longer token sequences, which results in substantial additional cost when performing self-attention: the number of tokens grows with the inverse square of the patch side length, and the cost of self-attention grows quadratically with the number of tokens. The performance metrics in [Table 4](https://arxiv.org/html/2606.13742#S4.T4) clearly show that ViT is more efficient in terms of training and inference compared to AB-UPT while also reducing memory footprint and GFLOPs.
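The cost of shrinking the patch size can be made concrete with a short calculation. The grid dimensions below are hypothetical and only serve to show the scaling: on a two-dimensional grid, halving the patch side length quadruples the number of tokens and increases the number of pairwise attention interactions sixteenfold.

```python
def attention_cost(height, width, patch):
    """Token count and pairwise attention interactions for square patches."""
    tokens = (height // patch) * (width // patch)
    return tokens, tokens ** 2

# Hypothetical 64 x 64 block, comparing two patch sizes.
coarse_tokens, coarse_pairs = attention_cost(64, 64, 16)
fine_tokens, fine_pairs = attention_cost(64, 64, 8)
print(fine_tokens / coarse_tokens, fine_pairs / coarse_pairs)
```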

The data-scaling results in [Figure 5](https://arxiv.org/html/2606.13742#S4.F5) uncover an interesting finding. In the data-scarce regime ($50$ samples), ViT attains a lower average $L_{2}$ error across all fields than AB-UPT, whereas in larger data regimes AB-UPT recovers and outperforms ViT. In the data-scarce regime, ViT's regular-grid structure provides a strong spatial inductive bias that reduces the number of learnable degrees of freedom. AB-UPT must simultaneously infer both the spatial structure and the physical field mapping, a task for which $50$ training samples provide weak information. In terms of model scale we observe similar behavior as for AB-UPT, again with a consistent offset in error.

![Refer to caption](https://arxiv.org/html/2606.13742v1/x9.png)

Figure 6: Prediction error of ViT on hypersonic scramjet. Parity plots of predicted vs. ground-truth values on the OOD test data for (a) peak surface temperature, (b) total pressure ratio, and (c) total pressure loss. We also report the ground-truth (d) and predicted (e) density field along with the relative error in percentage points (f) for a random case in the OOD test data.

#### 4.3.3 Flow Matching with ViT backbone

We present the results for flow matching for an ensemble of 10 members in [Figure 7](https://arxiv.org/html/2606.13742#S4.F7). While we observe similar correlation coefficients for the different KPIs, the field error is significantly elevated for flow matching. The predictive uncertainty is highest around shockwaves. Compared to the ViT predictions, patching artifacts are visually less prevalent because we report the posterior mean over 10 ensemble members, which effectively acts as a smoothing operator. Considering the performance metrics in [Table 4](https://arxiv.org/html/2606.13742#S4.T4), flow matching is generally efficient during training, but requires more memory and inference time than ViT due to ensembling.

The data-scaling results in [Figure 5](https://arxiv.org/html/2606.13742#S4.F5) show that flow matching generally yields worse performance than AB-UPT or ViT. However, we make several interesting observations. First, the average gap between in-distribution and OOD evaluation splits is smaller for flow matching ($9.63\%\pm 3.69\%$) than for AB-UPT ($24.23\%\pm 5.72\%$) or ViT ($22.10\%\pm 7.11\%$). Second, elevated test errors indicate that there appears to be a fundamental trade-off between accuracy and robustness for flow matching. This might be traced back to the deterministic nature of the problem setup. As flow matching is a probabilistic method, it adds uncertainty estimation and additional robustness at the cost of accuracy. We observe similar model scaling behavior as for ViT, albeit with a consistent offset in error.

![Refer to caption](https://arxiv.org/html/2606.13742v1/x10.png)

Figure 7: Prediction error of Flow Matching on hypersonic scramjet. Parity plots of predicted vs. ground-truth values on the OOD test data for (a) peak surface temperature, (b) total pressure ratio, and (c) total pressure loss. We also report the ground-truth (d) and predicted (e) density field along with the ensemble standard deviation (f) and relative error in percentage points (g) for a random case in the OOD test data.

Table 4: Performance metrics for the different physics emulators. We report number of parameters, latency, FLOPs, and peak memory consumption over 10 inference passes. AB-UPT processes the entire point cloud grid as queries and flow matching ensembles over 10 samples.

Uncertainty Quantification. In practice, it is vital to obtain estimates of the predictive uncertainty of the emulator to inform decision making. Regions that exhibit high predictive uncertainty indicate that model predictions may not be trusted. We analyze the predictive uncertainty obtained by the different neural PEs. In particular, we compare the per-sample RMSE for cases in the OOD evaluation set with the per-sample mean of the ensemble variance and measure their correlation. To obtain an ensemble for the deterministic ViT and AB-UPT models, we train three model instances with different random seeds. For flow matching, we construct ensembles from 10 initial noise samples and average over three seeds. We report our findings on the predicted total pressure field for flow matching ([Figure 8a](https://arxiv.org/html/2606.13742#S4.F8.sf1)), ViT ([Figure 8b](https://arxiv.org/html/2606.13742#S4.F8.sf2)), and AB-UPT ([Figure 8c](https://arxiv.org/html/2606.13742#S4.F8.sf3)). Flow matching achieves the highest Pearson correlation ($0.80$) and coefficient of determination ($R^{2}=0.64$). This pattern is consistent across other predicted fields such as enthalpy, kinetic energy, Mach number, and temperature, as we show in [Figure 13](https://arxiv.org/html/2606.13742#A5.F13).
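The uncertainty analysis can be sketched as follows. The snippet assumes that the error is measured on the ensemble-mean prediction and that the coefficient of determination refers to a simple linear fit, in which case it equals the squared Pearson correlation (consistent with the reported pair $0.80$ and $0.64$). The function name and array layout are hypothetical.

```python
import numpy as np

def uncertainty_error_correlation(ensemble, target):
    """Correlate per-sample ensemble variance with per-sample RMSE.

    ensemble: (members, samples, points) predictions of one field.
    target:   (samples, points) reference field.
    Returns (pearson_r, r_squared).
    """
    mean_pred = ensemble.mean(axis=0)
    rmse = np.sqrt(np.mean((mean_pred - target) ** 2, axis=1))
    mean_var = ensemble.var(axis=0).mean(axis=1)
    r = float(np.corrcoef(mean_var, rmse)[0, 1])
    return r, r ** 2
```

A high correlation means that samples with large ensemble spread also tend to have large error, which is the property needed for the uncertainty estimate to flag untrustworthy predictions.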

![Refer to caption](https://arxiv.org/html/2606.13742v1/x11.png) (a) Flow matching
![Refer to caption](https://arxiv.org/html/2606.13742v1/x12.png) (b) ViT
![Refer to caption](https://arxiv.org/html/2606.13742v1/x13.png) (c) AB-UPT

Figure 8: Correlation between predictive uncertainty and error of the total-pressure field across the three emulators. Each panel compares the predicted distribution of $p_{t}$ against the JAX-Fluids reference on the out-of-distribution test split for the flow-matching ViT (a), the deterministic ViT (b), and the AB-UPT model (c). We show $p_{t}$ as it is the direct integrand of the total-pressure ratio $\pi_{d}$ used as performance KPI. Correlation of $p_{t}$ propagates one-to-one into $\pi_{d}$ at every probe location. The flow-matching surrogate is the most strongly calibrated on $p_{t}$.

### 4.4 Physics-aware Model Refinement

After pre-training, we perform physics-aware refinement aimed at improving physical consistency as well as predictive capabilities of the PE. Each refinement run starts from a pre-trained checkpoint and minimizes a weighted sum of up to three terms,

$$\mathcal{L}=\mathcal{L}_{\mathrm{data}}+w_{\mathrm{div}}\,\mathcal{L}_{\mathrm{div}}+\lambda\,\mathcal{L}_{\mathrm{PDE}}, \tag{5}$$

namely a supervised data-reconstruction loss $\mathcal{L}_{\mathrm{data}}$, a divergence term from the base model $\mathcal{L}_{\mathrm{div}}$, and a physics loss $\mathcal{L}_{\mathrm{PDE}}$. $\mathcal{L}_{\mathrm{div}}$ and $\mathcal{L}_{\mathrm{PDE}}$ are target-free, i.e., they require only the mesh and input parameters, with no ground-truth field data. The physics loss exploits the differentiability of the JAX-Fluids solver, evaluating the point-wise residual of the discretized equations via the solver's residual operator (see Appendix [C.4](https://arxiv.org/html/2606.13742#A3.SS4)). Since neither target-free term consumes labeled data, refinement requires no additional simulations beyond those already used for pre-training. We provide further implementation and training details in [Section C.4](https://arxiv.org/html/2606.13742#A3.SS4).

We present results for our proposed model refinement strategy showing that (i) model refinement improves performance after pre-training, (ii) the addition of the physics loss improves generalization, and (iii) refinement can be entirely target-free on simulation meshes without the need for solved flow-field data. For this line of experiments we take the ViT model from [Section 4.3.2](https://arxiv.org/html/2606.13742#S4.SS3.SSS2) as a base model and conduct different fine-tuning experiments. The residual operator acts naturally on the ViT's regular grid, whereas recovering a full grid from AB-UPT is expensive (see [Table 4](https://arxiv.org/html/2606.13742#S4.T4)). Furthermore, ViT is the cheapest method in terms of FLOPs and inference speed compared to AB-UPT and flow matching.

To demonstrate the effectiveness of fine-tuning based on differentiating through the residual operator of JAX-Fluids, we perform the following experiments. First, as a baseline we simply fine-tune the Base model for the same number of steps as we use for the other fine-tuning strategies (Base + Data). Second, we add a loss term on the residual operator (Base + Data + Residual). Finally, to show that no ground-truth data are required for the fine-tuning procedure, we remove the data loss term and train only on the residual loss, plus the divergence loss for regularization. We include $\mathcal{L}_{\mathrm{div}}$ because training only on $\mathcal{L}_{\mathrm{PDE}}$ collapses to degenerate fields with near-zero residuals; the divergence term anchors the prediction to the pre-trained prior, supplying the missing constraint to select physically meaningful solutions.
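The three fine-tuning variants differ only in which terms of Eq. (5) are active, which the following sketch makes explicit. The form of $\mathcal{L}_{\mathrm{div}}$ used here, a mean squared deviation from the frozen base model's prediction, is an assumption made for illustration (the actual definition is given in Section C.4), and the residual loss is passed in as a precomputed number rather than evaluated by a solver.

```python
import numpy as np

def refinement_loss(pred, base_pred, pde_residual_loss,
                    target=None, w_div=0.0, lam=0.0):
    """Weighted sum of the terms in Eq. (5); inactive terms are skipped.

    pred, base_pred, target: arrays of shape (N, C).
    pde_residual_loss: scalar value of L_PDE for the current prediction.
    """
    loss = 0.0
    if target is not None:
        # Supervised data term, Eq. (1).
        loss += np.mean(np.sum((pred - target) ** 2, axis=-1))
    # ASSUMED form of L_div: squared deviation from the frozen base model.
    loss += w_div * np.mean(np.sum((pred - base_pred) ** 2, axis=-1))
    loss += lam * pde_residual_loss
    return float(loss)
```

With this signature, Base + Data corresponds to passing `target` with `w_div = lam = 0`, Base + Data + Residual additionally sets `lam > 0`, and the target-free variant omits `target` while keeping both `w_div > 0` and `lam > 0`.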

We report the results on the different conservation terms in $\mathcal{L}_{\text{PDE}}$ in [Table 5](https://arxiv.org/html/2606.13742#S4.T5). Our main findings are threefold: (i) the addition of $\mathcal{L}_{\text{PDE}}$ significantly reduces residuals and in turn improves physical consistency, (ii) target-free refinement consistently leads to the lowest residuals, and (iii) additional gains can be obtained by extending the coverage of the parameter space by augmenting $D_1$ with $D_2$. In [Figure 10](https://arxiv.org/html/2606.13742#S4.F10) we show the distribution of residuals for the different conservation terms to highlight the significant improvement for target-free refinement. In addition, we show in [Table 9](https://arxiv.org/html/2606.13742#A6.T9) in [Appendix F](https://arxiv.org/html/2606.13742#A6) that the model after refinement is more thermodynamically consistent in the sense that derived quantities align better with the ground-truth steady state. Moreover, in [Figure 9](https://arxiv.org/html/2606.13742#S4.F9) we visualize the improvement on the different residual terms compared to the base model, which highlights that our physics-aware model refinement significantly reduces residuals, especially in regions around the shockwaves, indicated by the highlighted blue region.

Finally, we investigate the effect of refinement on the predicted flowfields. [Table 6](https://arxiv.org/html/2606.13742#S4.T6) reports the relative improvement in field-level L2 error for the different refinement variants. Generally we observe minor fluctuations ($<1\%$) on the derived quantities ($h$, $p_{t}$, $k$, $T$, $Ma$). The difference is more pronounced for the primitive fields $p$, $\rho$, and $\mathbf{u}$, where we observe a slight degradation of field errors of up to $\sim 6\%$ for target-free refinement. However, this degradation is less pronounced on the OOD set ($\sim 1\%$) and comparable to refinement using $\mathcal{L}_{\text{data}}$. Our interpretation is that residual-based refinement substantially improves local conservation properties but yields only minor changes in aggregate field accuracy, suggesting that the quality of the pre-trained prior remains the dominant factor in prediction accuracy of the field values.
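For reference, the sign convention of the relative change (positive means lower error than Base) can be written out as a one-line helper. The exact formula is an assumption consistent with that convention, and the error values in the usage line are made up for illustration rather than taken from the tables.

```python
def relative_change_percent(err_base, err_variant):
    """Relative change over Base in percent; positive means lower error."""
    return 100.0 * (err_base - err_variant) / err_base

# Illustrative numbers only: an error rising from 3.00 to 3.18
# corresponds to a 6% degradation.
change = relative_change_percent(3.00, 3.18)
```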

Table 5: Mean per-sample RMS of the conservation-law residuals for mass $r_{\rho}$, $x$-momentum $r_{\rho u}$, $y$-momentum $r_{\rho v}$, and energy $r_{E}$, each normalized by its characteristic flux scale, for different models fine-tuned on different loss terms. Lower is more physically consistent. Averaged over three random seeds. $\ast$ denotes extension of training data with additional target-free cases while matching the number of update steps.

Table 6: Relative change (%) over the Base model in the relative L2 error of static pressure $p$, density $\rho$, velocity $\bm{u}$, enthalpy $h$, total pressure $p_{t}$, kinetic energy $k$, temperature $T$, and Mach number $Ma$, for the fine-tuned variants (positive = lower error than Base). Computed from the mean errors over three random seeds. $\ast$ denotes extension of training data with additional target-free cases while matching the number of update steps.

(a) ![Refer to caption](https://arxiv.org/html/2606.13742v1/x14.png)

(b) ![Refer to caption](https://arxiv.org/html/2606.13742v1/x15.png)

(c) ![Refer to caption](https://arxiv.org/html/2606.13742v1/x16.png)

(d) ![Refer to caption](https://arxiv.org/html/2606.13742v1/x17.png)

Figure 9: Improvement of residuals after target-free refinement. We report the change in normalized residuals, where blue indicates improvement over the base model and red indicates worse residuals, for (a) energy conservation, (b) mass conservation, (c) x-component of momentum conservation, and (d) y-component of momentum conservation.

![Refer to caption](https://arxiv.org/html/2606.13742v1/x18.png)

Figure 10: Distribution of residuals on the OOD test set. We report residuals for the conservation terms mass (a), momentum in x and y direction (b, c), and energy (d). Base refers to the base model prior to refinement. Target-free refinement significantly reduces residuals and hence improves physical consistency.

## 5 Discussion

No absolute hierarchy in model choice. We observe different trade-offs between the architectures and training paradigms we investigated. In the data-scarce regime the ViT's regular-grid inductive bias makes it more sample-efficient than AB-UPT's point-cloud representation. Flow matching, while trailing the deterministic models on accuracy, exhibits a substantially smaller in-distribution-to-OOD performance gap and the strongest correlation between predictive uncertainty and per-sample error on derived quantities. Rather than seeking a single best architecture, practitioners should select the model class that matches the data budget, the downstream decision (point estimate versus uncertainty-aware), and the inference-time computational constraint.

Physics-aware refinement improves physical consistency. Model refinement based on the physics loss consistently improves residuals, while field errors only exhibit slight variations. Since residuals are computed via derivatives and non-linear transformations of the state variables, even small modifications to the predicted fields can produce disproportionately large changes in the residuals. Most importantly, target-free refinement only requires meshes and design parameters and never observes a ground-truth field. This recovers the physical consistency benefits of PINN-style residual minimization (Karniadakis et al., [2021](https://arxiv.org/html/2606.13742#bib.bib43)) while avoiding their well-known optimization difficulties (Wang et al., [2022](https://arxiv.org/html/2606.13742#bib.bib63)), as the physics-aware refinement starts from a trained prior.

Implications for design optimization. Because the PE is fully differentiable and the geometric and inflow conditioning is explicit, the trained emulator can be inserted directly into gradient-based design loops as in (Bezgin et al., [2025b](https://arxiv.org/html/2606.13742#bib.bib10); Paischer et al., [2025](https://arxiv.org/html/2606.13742#bib.bib62)), returning sensitivities of integrated KPIs with respect to the $15$ geometric parameters and the freestream Mach number in milliseconds. The error numbers on the OOD set are comparable to typical engineering tolerances in early-stage design, suggesting that the PE could already be deployed for design-space exploration with full-solver verification reserved for selected candidates. However, we note that we have not explored our PE in such a setting yet.

Practical recommendations. Based on our findings, we propose a practical guide for developing physics emulators in hypersonic regimes. (1) Establish model requirements. Determine whether the downstream task demands point estimates or uncertainty quantification, what inference latency is acceptable, and how much training data are available. In data-scarce settings, architectures with strong inductive biases such as the regular-grid ViT are preferable. When data are abundant and accuracy is paramount, more expressive pointwise architectures such as AB-UPT should be favored. If predictive uncertainty is needed, probabilistic formulations such as flow matching offer built-in distributional estimates. (2) Establish scaling behavior. Train at progressively larger dataset sizes and monitor the validation error. As long as the scaling curve has not saturated and the compute budget permits, generating additional data is the most impactful investment, as global prediction quality is determined by the pre-training stage. (3) Apply target-free refinement. Apply residual-based refinement to improve local physical consistency, ideally on top of a well-converged base model. This step requires only meshes and design parameters, no reference flow fields, making it applicable to new regions of the design space at minimal cost. In our experiments, refinement primarily improves local conservation residuals with limited effect on aggregate field accuracy, suggesting it complements rather than replaces pre-training data.
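One simple way to monitor the scaling behavior recommended in step (2) is to fit a power law to the validation error as a function of dataset size and to inspect the local slope between consecutive sizes. The sketch below assumes an error model of the form $a N^{-\alpha}$, which is a common choice but not one stated in this work; the function names are hypothetical.

```python
import numpy as np

def fit_power_law(n_samples, errors):
    """Least-squares fit of error = a * N**(-alpha) in log-log space."""
    slope, intercept = np.polyfit(np.log(n_samples), np.log(errors), 1)
    return -float(slope), float(np.exp(intercept))   # (alpha, a)

def local_exponents(n_samples, errors):
    """Decay exponent between consecutive dataset sizes."""
    return -np.diff(np.log(errors)) / np.diff(np.log(n_samples))
```

Local exponents that shrink toward zero at the largest dataset sizes would indicate a saturating curve, at which point additional data is less likely to pay off than architectural changes or refinement.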

## 6 Limitations

Scope of the present study and outlook. All scramjet configurations in this work are two-dimensional. While two-dimensional configurations omit certain physical effects, they already contain sharp discontinuities and multi-scale flow structures that remain highly challenging for neural emulators due to spectral bias (Rahaman et al., [2019](https://arxiv.org/html/2606.13742#bib.bib58); Xu et al., [2019](https://arxiv.org/html/2606.13742#bib.bib59)). Extension to 3D, including turbulence, shock-boundary-layer interaction, and finite-rate chemistry, is a natural next step which we have not explored here. The pre-training dataset comprises roughly $7{,}000$ simulations, which our model-scaling analysis reveals to be the binding constraint beyond ${\sim}25$M parameters. A promising direction to overcome this bottleneck is pre-training on cheaper, intentionally under-converged data and recovering solution quality through the target-free refinement stage, further reducing the cost of data generation.

The differentiability requirement can be a barrier. Our workflow requires the solver to be differentiable and the residual to be evaluated with the same discretization used to produce the training data. JAX-Fluids meets both requirements, but the wider CFD ecosystem remains dominated by CPU-based, non-differentiable codes. The gap is closing through native GPU rewrites in JAX and PyTorch and through emerging tooling that lowers the porting cost, and we expect target-free refinement to become applicable across an increasing fraction of the engineering simulation stack. In the interim, compatible differentiable approximations of legacy solvers may offer a useful bridge.

## 7 Conclusions

We introduce a fully GPU\-based workflow for neural physics emulators specifically designed for the stringent requirements of hypersonic flow simulation\. Our workflow ranges from data generation via JAX\-Fluids to pre\-training and physics\-aware refinement leveraging the differentiable code\.

To make regular-grid architectures applicable to the adaptive block-structured octree meshes produced by modern GPU-native solvers, we combine absolute and relative positional encodings in physical space, rendering the model agnostic to block count and ordering. This enables a direct comparison of regular-grid and pointwise paradigms on the same data, where we uncover various trade-offs. In addition to spatial representation, we compare deterministic and probabilistic training paradigms and conduct scaling studies with respect to model and data size. Probabilistic modeling trades accuracy for uncertainty estimates and generally exhibits a narrower gap between in-distribution and OOD performance. Furthermore, regular-grid methods outperform pointwise methods in data-scarce regimes. While pointwise methods yield the highest accuracy in data-abundant regimes, they are the slowest during inference, followed by probabilistic and regular-grid methods. Therefore, the appropriate choice depends on the data budget, downstream requirements, and inference-time constraints.

Our physics-aware refinement depends only on the mesh and design parameters and can hence be carried out in a fully target-free manner without reference flowfields. This is significant for two reasons. First, new regions of the design space can be reached through meshing alone, at a small fraction of the cost of data generation. Second, the physics loss encourages the surrogate to produce solutions that better satisfy the discretized conservation laws than the base model, which is particularly valuable in extrapolatory regimes where additional training data are unavailable.

Our results demonstrate that a differentiable solver can serve as a refinement engine for neural PE\. As differentiable GPU\-native solvers mature and pre\-training datasets grow, this paradigm could extend the practical reach of neural PE to increasingly complex flow regimes\. Our workflow is a step toward a broader role for solvers in CFD: not only as primary simulation tools, but as differentiable engines for training and refining physics emulators\.

## Acknowledgement

NAA acknowledges funding through ERC Advanced Grant Project No. 101094463. DAB, ABB, and NAA gratefully acknowledge the Gauss Centre for Supercomputing e.V. (www.gauss-centre.eu) for funding this project by providing computing time on the GCS Supercomputer JUWELS (Juelich Supercomputing Centre, [2021](https://arxiv.org/html/2606.13742#bib.bib66)) at Jülich Supercomputing Centre (JSC).

## References

- \[1\]R\. Abaidi and N\.A\. Adams\(2025\)Exploring denoising diffusion models for compressible fluid field prediction\.Computers & Fluids298,pp\. 106665\.External Links:ISSN 0045\-7930,[Document](https://dx.doi.org/10.1016/j.compfluid.2025.106665)Cited by:[§2](https://arxiv.org/html/2606.13742#S2.p4.1)\.
- \[2\]J\. Achiam, S\. Adler, S\. Agarwal, L\. Ahmad, I\. Akkaya, F\. L\. Aleman, D\. Almeida, J\. Altenschmidt, S\. Altman, S\. Anadkat,et al\.\(2023\)Gpt\-4 technical report\.arXiv preprint arXiv:2303\.08774\.Cited by:[§3](https://arxiv.org/html/2606.13742#S3.p1.1)\.
- \[3\]B\. Alkin, M\. Bleeker, R\. Kurle, T\. Kronlachner, R\. Sonnleitner, M\. Dorfer, and J\. Brandstetter\(2025\)AB\-upt: scaling neural cfd surrogates for high\-fidelity automotive aerodynamics simulations via anchored\-branched universal physics transformers\.arXiv preprint arXiv:2502\.09692\.Cited by:[§1](https://arxiv.org/html/2606.13742#S1.p3.1),[§2](https://arxiv.org/html/2606.13742#S2.p2.1),[item \(i\)](https://arxiv.org/html/2606.13742#S3.I3.i1.1),[Table 1](https://arxiv.org/html/2606.13742#S3.T1.6.2.2.1.1.1)\.
- \[4\]B\. Alkin, A\. Fürst, S\. Schmid, L\. Gruber, M\. Holzleitner, and J\. Brandstetter\(2024\)Universal physics transformers: a framework for efficiently scaling neural operators\.Advances in Neural Information Processing Systems37,pp\. 25152–25194\.Cited by:[§2](https://arxiv.org/html/2606.13742#S2.p2.1)\.
- \[5\]K\. Azizzadenesheli, N\. Kovachki, Z\. Li, M\. Liu\-Schiaffini, J\. Kossaifi, and A\. Anandkumar\(2024\)Neural operators for accelerating scientific simulations and design\.Nature Reviews Physics6\(5\),pp\. 320 – 328\.External Links:[Document](https://dx.doi.org/10.1038/s42254-024-00712-5)Cited by:[§2](https://arxiv.org/html/2606.13742#S2.p2.1)\.
- \[6\]D\. A\. Bezgin, A\. B\. Buhendwa, and N\. A\. Adams\(2023\)JAX\-fluids: a fully\-differentiable high\-order computational fluid dynamics solver for compressible two\-phase flows\.Computer Physics Communications282,pp\. 108527\.External Links:ISSN 0010\-4655,[Document](https://dx.doi.org/10.1016/j.cpc.2022.108527)Cited by:[§C\.4](https://arxiv.org/html/2606.13742#A3.SS4.p1.1),[§1](https://arxiv.org/html/2606.13742#S1.p3.1),[§2](https://arxiv.org/html/2606.13742#S2.p1.1),[§3\.1](https://arxiv.org/html/2606.13742#S3.SS1.p1.1)\.
- \[7\]D\. A\. Bezgin, A\. B\. Buhendwa, and N\. A\. Adams\(2025\)JAX\-fluids 2\.0: towards hpc for differentiable cfd of compressible two\-phase flows\.Computer Physics Communications308,pp\. 109433\.External Links:ISSN 0010\-4655,[Document](https://dx.doi.org/10.1016/j.cpc.2024.109433)Cited by:[§C\.4](https://arxiv.org/html/2606.13742#A3.SS4.p1.1),[§1](https://arxiv.org/html/2606.13742#S1.p3.1),[§2](https://arxiv.org/html/2606.13742#S2.p1.1),[§3\.1](https://arxiv.org/html/2606.13742#S3.SS1.p1.1)\.
- \[8\]D\. A\. Bezgin, A\. B\. Buhendwa, S\. J\. Schmidt, and N\. A\. Adams\(2025\)ML\-iles: end\-to\-end optimization of data\-driven high\-order godunov\-type finite\-volume schemes for compressible homogeneous isotropic turbulence\.Journal of Computational Physics522,pp\. 113560\.External Links:ISSN 0021\-9991,[Document](https://dx.doi.org/10.1016/j.jcp.2024.113560)Cited by:[§2](https://arxiv.org/html/2606.13742#S2.p1.1),[§5](https://arxiv.org/html/2606.13742#S5.p3.1)\.
- \[9\]R\. Borges, M\. Carmona, B\. Costa, and W\. S\. Don\(2008\)An improved weighted essentially non\-oscillatory scheme for hyperbolic conservation laws\.Journal of Computational Physics227\(6\),pp\. 3191–3211\.Cited by:[Appendix A](https://arxiv.org/html/2606.13742#A1.p4.1)\.
- \[10\]M\. P\. Brenner, J\. D\. Eldredge, and J\. B\. Freund\(2019\)Perspective on machine learning for advancing fluid mechanics\.Physical Review Fluids4\(10\),pp\. 100501\.Cited by:[§1](https://arxiv.org/html/2606.13742#S1.p2.1)\.
- \[11\]S\. L\. Brunton, B\. R\. Noack, and P\. Koumoutsakos\(2020\)Machine learning for fluid mechanics\.Annual Review of Fluid Mechanics52\(Volume 52, 2020\),pp\. 477–508\.External Links:[Document](https://dx.doi.org/10.1146/annurev-fluid-010719-060214),ISSN 1545\-4479Cited by:[§1](https://arxiv.org/html/2606.13742#S1.p2.1)\.
- \[12\]A\. B\. Buhendwa, D\. A\. Bezgin, P\. Karnakov, N\. A\. Adams, and P\. Koumoutsakos\(2025\-08\)Data\-driven shape inference in three\-dimensional steady\-state supersonic flows: optimizing a discrete loss with jax\-fluids\.Phys\. Rev\. Fluids10,pp\. 084902\.External Links:[Document](https://dx.doi.org/10.1103/9wj9-nmr8)Cited by:[§2](https://arxiv.org/html/2606.13742#S2.p3.1)\.
- \[13\]A\. Carreon, J\. Singh, S\. Sharma, S\. Zhang, and V\. Raman\(2025\)A gpu\-based compressible combustion solver for applications exhibiting disparate space and time scales\.External Links:2510\.23993,[Link](https://arxiv.org/abs/2510.23993)Cited by:[§3\.2](https://arxiv.org/html/2606.13742#S3.SS2.p1.1)\.
- \[14\]X\. Chen, C\. Liang, D\. Huang, E\. Real, K\. Wang, H\. Pham, X\. Dong, T\. Luong, C\. Hsieh, Y\. Lu,et al\.\(2023\)Symbolic discovery of optimization algorithms\.Advances in neural information processing systems36,pp\. 49205–49233\.Cited by:[§C\.1](https://arxiv.org/html/2606.13742#A3.SS1.p1.13)\.
- \[15\]A\. Dosovitskiy, L\. Beyer, A\. Kolesnikov, D\. Weissenborn, X\. Zhai, T\. Unterthiner, M\. Dehghani, M\. Minderer, G\. Heigold, S\. Gelly, J\. Uszkoreit, and N\. Houlsby\(2021\)An image is worth 16x16 words: transformers for image recognition at scale\.InICLR,Cited by:[§C\.2](https://arxiv.org/html/2606.13742#A3.SS2.p1.7),[§1](https://arxiv.org/html/2606.13742#S1.p3.1),[item \(ii\)](https://arxiv.org/html/2606.13742#S3.I3.i2.1),[Table 1](https://arxiv.org/html/2606.13742#S3.T1.6.2.2.2.1.1)\.
- \[16\]S\. Elfwing, E\. Uchibe, and K\. Doya\(2018\)Sigmoid\-weighted linear units for neural network function approximation in reinforcement learning\.Neural networks107,pp\. 3–11\.Cited by:[§C\.2](https://arxiv.org/html/2606.13742#A3.SS2.p1.7)\.
- \[17\]E\. F\. Toro\(2009\)Riemann solvers and numerical methods for fluid dynamics: a practical introduction\.3rd edition,Springer\-Verlag,Berlin Heidelberg\.External Links:ISBN 978\-3\-540\-25202\-3Cited by:[Appendix A](https://arxiv.org/html/2606.13742#A1.p3.6),[Appendix A](https://arxiv.org/html/2606.13742#A1.p4.1),[§2](https://arxiv.org/html/2606.13742#S2.p3.1)\.
- \[18\]P\. Fischer, S\. Kaltenbach, S\. Litvinov, S\. Succi, and P\. Koumoutsakos\(2025\)Optimal lattice boltzmann closures through multi\-agent reinforcement learning\.arXiv preprint arXiv:2504\.14422\.Cited by:[§2](https://arxiv.org/html/2606.13742#S2.p1.1)\.
- \[19\]H\. Gao, S\. Kaltenbach, and P\. Koumoutsakos\(2024\)Generative learning for forecasting the dynamics of high\-dimensional complex systems\.Nature Communications15\(1\),pp\. 8904\.Cited by:[§2](https://arxiv.org/html/2606.13742#S2.p4.1)\.
- \[20\]N\. Gupta and K\. Duraisamy\(2026\)Computational and physical considerations for the development of machine learning augmented turbulence models\.International Journal of Heat and Fluid Flow117,pp\. 110089\.External Links:ISSN 0142\-727X,[Document](https://dx.doi.org/10.1016/j.ijheatfluidflow.2025.110089)Cited by:[§2](https://arxiv.org/html/2606.13742#S2.p2.1)\.
- \[21\]W\. H\. Heiser and D\. T\. Pratt\(1994\)Hypersonic airbreathing propulsion\.Aiaa\.Cited by:[§3\.1](https://arxiv.org/html/2606.13742#S3.SS1.p2.1)\.
- \[22\]D\. Hendrycks and K\. Gimpel\(2016\)Gaussian error linear units \(gelus\)\.arXiv preprint arXiv:1606\.08415\.Cited by:[§C\.1](https://arxiv.org/html/2606.13742#A3.SS1.p1.13)\.
- \[23\]J\. Ho, A\. Jain, and P\. Abbeel\(2020\)Denoising diffusion probabilistic models\.Advances in neural information processing systems33,pp\. 6840–6851\.Cited by:[§2](https://arxiv.org/html/2606.13742#S2.p4.1)\.
- \[24\]X\. Y\. Hu, B\. C\. Khoo, N\. A\. Adams, and F\. L\. Huang\(2006\)A conservative interface method for compressible flows\.Journal of Computational Physics219,pp\. 553–578\.External Links:[Document](https://dx.doi.org/10.1016/j.jcp.2006.04.001),ISSN 00219991Cited by:[Appendix A](https://arxiv.org/html/2606.13742#A1.p5.3)\.
- \[25\]K\. Jaber, E\. Essel, and P\. Sullivan\(2026\-04\)GPU\-native embedding of complex geometries in adaptive octree grids applied to the lattice boltzmann method\.Computer Physics Communications324,pp\. 110155\.External Links:[Document](https://dx.doi.org/10.1016/j.cpc.2026.110155)Cited by:[§3\.2](https://arxiv.org/html/2606.13742#S3.SS2.p1.1)\.
- \[26\]A\. D\. Jagtap, Z\. Mao, N\. Adams, and G\. E\. Karniadakis\(2022\)Physics\-informed neural networks for inverse problems in supersonic flows\.Journal of Computational Physics466,pp\. 111402\.External Links:ISSN 0021\-9991,[Document](https://dx.doi.org/10.1016/j.jcp.2022.111402)Cited by:[§2](https://arxiv.org/html/2606.13742#S2.p3.1)\.
- \[27\]Juelich Supercomputing Centre\(2021\)JUWELS Cluster and Booster: Exascale Pathfinder with Modular Supercomputing Architecture at Juelich Supercomputing Centre\.Journal of large\-scale research facilities7\(A183\)\.External Links:[Document](https://dx.doi.org/10.17815/jlsrf-7-183),[Link](http://dx.doi.org/10.17815/jlsrf-7-183)Cited by:[Acknowledgement](https://arxiv.org/html/2606.13742#Sx1.p1.1)\.
- \[28\]G\. E\. Karniadakis, I\. G\. Kevrekidis, L\. Lu, P\. Perdikaris, S\. Wang, and L\. Yang\(2021\)Physics\-informed machine learning\.Nature Reviews Physics3\(6\),pp\. 422–440\.Cited by:[§1](https://arxiv.org/html/2606.13742#S1.p2.1),[§2](https://arxiv.org/html/2606.13742#S2.p3.1),[§5](https://arxiv.org/html/2606.13742#S5.p2.1)\.
- \[29\]M\. C\. Kennedy and A\. O’Hagan\(2001\)Bayesian calibration of computer models\.Journal of the Royal Statistical Society: Series B \(Statistical Methodology\)63\(3\),pp\. 425–464\.External Links:[Document](https://dx.doi.org/10.1111/1467-9868.00294)Cited by:[§3](https://arxiv.org/html/2606.13742#S3.p2.1)\.
- \[30\]D\. Kochkov, J\. A\. Smith, A\. Alieva, Q\. Wang, M\. P\. Brenner, and S\. Hoyer\(2021\)Machine learning–accelerated computational fluid dynamics\.Proceedings of the National Academy of Sciences118\(21\),pp\. e2101784118\.Cited by:[§2](https://arxiv.org/html/2606.13742#S2.p1.1)\.
- \[31\]R\.J\. LeVeque\(1992\)Numerical methods for conservation laws\.Lectures in Mathematics ETH Zürich, Department of Mathematics Research Institute of Mathematics,Springer Basel AG\.External Links:ISBN 9783764327231,LCCN lc92003400Cited by:[§2](https://arxiv.org/html/2606.13742#S2.p3.1)\.
- \[32\]T\. Li and K\. He\(2025\)Back to basics: let denoising generative models denoise\.arXiv preprint arXiv:2511\.13720\.Cited by:[§C\.3](https://arxiv.org/html/2606.13742#A3.SS3.p1.11)\.
- \[33\]Z\. Li, N\. Kovachki, K\. Azizzadenesheli, B\. Liu, K\. Bhattacharya, A\. Stuart, and A\. Anandkumar\(2020\)Fourier neural operator for parametric partial differential equations\.arXiv preprint arXiv:2010\.08895\.Cited by:[§2](https://arxiv.org/html/2606.13742#S2.p2.1)\.
- \[34\]M\. Lienen, D\. Lüdke, J\. Hansen\-Palmus, and S\. Günnemann\(2024\)From zero to turbulence: generative modeling for 3d flow simulation\.InInternational Conference on Learning Representations,Vol\.2024,pp\. 5203–5220\.Cited by:[§2](https://arxiv.org/html/2606.13742#S2.p4.1)\.
- \[35\]Y\. Lipman, R\. T\. Chen, H\. Ben\-Hamu, M\. Nickel, and M\. Le\(2022\)Flow matching for generative modeling\.arXiv preprint arXiv:2210\.02747\.Cited by:[§C\.3](https://arxiv.org/html/2606.13742#A3.SS3.p1.11),[item \(iii\)](https://arxiv.org/html/2606.13742#S3.I3.i3.1),[Table 1](https://arxiv.org/html/2606.13742#S3.T1.6.2.2.3.1.1)\.
- \[36\]F\. T\. Liu, K\. M\. Ting, and Z\. Zhou\(2008\)Isolation forest\.In2008 Eighth IEEE International Conference on Data Mining,Vol\.,pp\. 413–422\.External Links:[Document](https://dx.doi.org/10.1109/ICDM.2008.17)Cited by:[Appendix B](https://arxiv.org/html/2606.13742#A2.p1.6),[§4\.1](https://arxiv.org/html/2606.13742#S4.SS1.p2.5)\.
- \[37\]Q\. Liu and N\. Thuerey\(2024\)Uncertainty\-aware surrogate models for airfoil flow simulations with denoising diffusion probabilistic models\.AIAA Journal62\(8\),pp\. 2912–2933\.Cited by:[§2](https://arxiv.org/html/2606.13742#S2.p4.1)\.
- \[38\]L\. Lu, P\. Jin, G\. Pang, Z\. Zhang, and G\. E\. Karniadakis\(2021\)Learning nonlinear operators via deeponet based on the universal approximation theorem of operators\.Nature machine intelligence3\(3\),pp\. 218–229\.Cited by:[§2](https://arxiv.org/html/2606.13742#S2.p2.1)\.
- \[39\]Z\. Mao, A\. D\. Jagtap, and G\. E\. Karniadakis\(2020\)Physics\-informed neural networks for high\-speed flows\.Computer Methods in Applied Mechanics and Engineering360,pp\. 112789\.Cited by:[§2](https://arxiv.org/html/2606.13742#S2.p3.1)\.
- \[40\]C\. P\. McKee and D\. J\. Hollenbach\(1980\)Interstellar shock waves\.Annual Review of Astronomy and Astrophysics18\(Volume 18, 1980\),pp\. 219–262\.External Links:[Document](https://dx.doi.org/10.1146/annurev.aa.18.090180.001251),ISSN 1545\-4282Cited by:[§1](https://arxiv.org/html/2606.13742#S1.p1.3)\.
- \[41\]R\. Molinaro, S\. Lanthaler, B\. Raonić, T\. Rohner, V\. Armegioiu, S\. Simonis, D\. Grund, Y\. Ramic, Z\. Y\. Wan, F\. Sha,et al\.\(2024\)Generative ai for fast and accurate statistical computation of fluids\.arXiv preprint arXiv:2409\.18359\.Cited by:[§2](https://arxiv.org/html/2606.13742#S2.p4.1)\.
- \[42\]G\. Novati, H\. L\. De Laroussilhe, and P\. Koumoutsakos\(2021\)Automating turbulence modelling by multi\-agent reinforcement learning\.Nature Machine Intelligence3\(1\),pp\. 87–96\.Cited by:[§2](https://arxiv.org/html/2606.13742#S2.p1.1)\.
- \[43\]F\. Paischer, L\. Cotteleer, Y\. Dreze, R\. Kurle, D\. Rubini, M\. Bleeker, T\. Kronlachner, and J\. Brandstetter\(2025\)Going with the speed of sound: pushing neural surrogates into highly\-turbulent transonic regimes\.arXiv preprint arXiv:2511\.21474\.Cited by:[§2](https://arxiv.org/html/2606.13742#S2.p3.1),[§5](https://arxiv.org/html/2606.13742#S5.p3.1)\.
- \[44\]W\. Peebles and S\. Xie\(2023\)Scalable diffusion models with transformers\.InProceedings of the IEEE/CVF international conference on computer vision,pp\. 4195–4205\.Cited by:[§C\.1](https://arxiv.org/html/2606.13742#A3.SS1.p1.13)\.
- \[45\]T\. Pfaff, M\. Fortunato, A\. Sanchez\-Gonzalez, and P\. W\. Battaglia\(2020\)Learning mesh\-based simulation with graph networks\.arXiv preprint arXiv:2010\.03409\.Cited by:[§2](https://arxiv.org/html/2606.13742#S2.p2.1)\.
- \[46\]N\. Rahaman, A\. Baratin, D\. Arpit, F\. Draxler, M\. Lin, F\. A\. Hamprecht, Y\. Bengio, and A\. C\. Courville\(2019\)On the spectral bias of neural networks\.InProceedings of the 36th International Conference on Machine Learning, ICML 2019, 9\-15 June 2019, Long Beach, California, USA,K\. Chaudhuri and R\. Salakhutdinov \(Eds\.\),Proceedings of Machine Learning Research,pp\. 5301–5310\.Cited by:[§6](https://arxiv.org/html/2606.13742#S6.p1.2)\.
- \[47\]D\. Ray and J\. S\. Hesthaven\(2018\)An artificial neural network as a troubled\-cell indicator\.Journal of computational physics367,pp\. 166–191\.Cited by:[§2](https://arxiv.org/html/2606.13742#S2.p1.1)\.
- \[48\]D\. Rossinelli, B\. Hejazialhosseini, P\. Hadjidoukas, C\. Bekas, A\. Curioni, A\. Bertsch, S\. Futral, S\. J\. Schmidt, N\. A\. Adams, and P\. Koumoutsakos\(2013\)11 pflop/s simulations of cloud cavitation collapse\.InProceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis,SC ’13,New York, NY, USA\.External Links:ISBN 9781450323789,[Document](https://dx.doi.org/10.1145/2503210.2504565)Cited by:[§1](https://arxiv.org/html/2606.13742#S1.p2.1)\.
- \[49\]J\. Sacks, W\. J\. Welch, T\. J\. Mitchell, and H\. P\. Wynn\(1989\)Design and analysis of computer experiments\.Statistical Science4\(4\),pp\. 409–423\.External Links:[Document](https://dx.doi.org/10.1214/ss/1177012413)Cited by:[§3](https://arxiv.org/html/2606.13742#S3.p2.1)\.
- \[50\]N\. Shazeer\(2020\)Glu variants improve transformer\.arXiv preprint arXiv:2002\.05202\.Cited by:[§C\.2](https://arxiv.org/html/2606.13742#A3.SS2.p2.5)\.
- \[51\]J\. P\. Slotnick, A\. Khodadoust, J\. Alonso, D\. Darmofal, W\. Gropp, E\. Lurie, and D\. J\. Mavriplis\(2014\)CFD vision 2030 study: a path to revolutionary computational aerosciences\.Technical reportCited by:[§1](https://arxiv.org/html/2606.13742#S1.p1.3)\.
- \[52\]J\. Su, M\. Ahmed, Y\. Lu, S\. Pan, W\. Bo, and Y\. Liu\(2024\)Roformer: enhanced transformer with rotary position embedding\.Neurocomputing568,pp\. 127063\.Cited by:[§C\.1](https://arxiv.org/html/2606.13742#A3.SS1.p1.13)\.
- \[53\]G\. Team, R\. Anil, S\. Borgeaud, J\. Alayrac, J\. Yu, R\. Soricut, J\. Schalkwyk, A\. M\. Dai, A\. Hauth, K\. Millican,et al\.\(2023\)Gemini: a family of highly capable multimodal models\.arXiv preprint arXiv:2312\.11805\.Cited by:[§3](https://arxiv.org/html/2606.13742#S3.p1.1)\.
- \[54\]N\. Thuerey, K\. Weißenow, L\. Prantl, and X\. Hu\(2020\)Deep learning methods for reynolds\-averaged navier–stokes simulations of airfoil flows\.AIAA journal58\(1\),pp\. 25–36\.Cited by:[§2](https://arxiv.org/html/2606.13742#S2.p1.1)\.
- \[55\]J\. Urzay\(2018\)Supersonic combustion in air\-breathing propulsion systems for hypersonic flight\.Annual Review of Fluid Mechanics50\(Volume 50, 2018\),pp\. 593–627\.External Links:[Document](https://dx.doi.org/10.1146/annurev-fluid-122316-045217),ISSN 1545\-4479Cited by:[§1](https://arxiv.org/html/2606.13742#S1.p1.3)\.
- \[56\]A\. Vaswani, N\. Shazeer, N\. Parmar, J\. Uszkoreit, L\. Jones, A\. N\. Gomez, Ł\. Kaiser, and I\. Polosukhin\(2017\)Attention is all you need\.Advances in neural information processing systems30\.Cited by:[§2](https://arxiv.org/html/2606.13742#S2.p2.1),[§3](https://arxiv.org/html/2606.13742#S3.p1.1)\.
- \[57\]R\. Vinuesa and S\. L\. Brunton\(2022\)Enhancing computational fluid dynamics with machine learning\.Nature Computational Science2\(6\),pp\. 358 – 366\.External Links:[Document](https://dx.doi.org/10.1038/s43588-022-00264-7)Cited by:[§2](https://arxiv.org/html/2606.13742#S2.p2.1)\.
- \[58\]S\. Wang, X\. Yu, and P\. Perdikaris\(2022\)When and why pinns fail to train: a neural tangent kernel perspective\.Journal of Computational Physics449,pp\. 110768\.Cited by:[§5](https://arxiv.org/html/2606.13742#S5.p2.1)\.
- \[59\]B\. Wilfong, A\. Radhakrishnan, H\. Le Berre, D\. Vickers, T\. Prathi, N\. Tselepidis, B\. Dorschner, R\. Budiardja, B\. Cornille, S\. Abbott, F\. Schäfer, and S\. Bryngelson\(2025\)Simulating many\-engine spacecraft: exceeding 1 quadrillion degrees of freedom via information geometric regularization\.InProceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis,SC ’25,New York, NY, USA,pp\. 14–24\.External Links:ISBN 9798400714665,[Document](https://dx.doi.org/10.1145/3712285.3771783)Cited by:[§1](https://arxiv.org/html/2606.13742#S1.p2.1)\.
- \[60\]Z\. J\. Xu, Y\. Zhang, T\. Luo, Y\. Xiao, and Z\. Ma\(2019\)Frequency principle: fourier analysis sheds light on deep neural networks\.CoRRabs/1901\.06523\.External Links:1901\.06523Cited by:[§6](https://arxiv.org/html/2606.13742#S6.p1.2)\.
- \[61\]B\. Zhang and R\. Sennrich\(2019\)Root mean square layer normalization\.Advances in neural information processing systems32\.Cited by:[§C\.2](https://arxiv.org/html/2606.13742#A3.SS2.p2.5)\.

## Appendix A Physical and Numerical Model

The fluid state at position $\mathbf{x}=[x,y]^{T}$ and time $t$ is described either by the vector of primitive variables $\mathbf{W}=[\rho,u,v,p]^{T}$ or by the vector of conservative variables $\mathbf{U}=[\rho,\rho u,\rho v,E]^{T}$. Here, $\rho$ denotes the density, $u$ and $v$ are the velocity components in the $x$- and $y$-directions, respectively, and $p$ is the pressure. The velocity vector is denoted by $\mathbf{u}=[u,v]^{T}$. The total energy per unit volume is given by

$$E=\rho e+\frac{1}{2}\rho\,\mathbf{u}\cdot\mathbf{u}, \tag{6}$$

where $e$ is the specific internal energy.

The present study considers inviscid compressible flows governed by the two\-dimensional Euler equations,

$$\frac{\partial\mathbf{U}}{\partial t}+\frac{\partial\mathbf{F}(\mathbf{U})}{\partial x}+\frac{\partial\mathbf{G}(\mathbf{U})}{\partial y}=0, \tag{7}$$

where $\mathbf{F}$ and $\mathbf{G}$ are the convective flux vectors in the $x$- and $y$-directions, respectively,

$$\mathbf{F}(\mathbf{U})=\begin{bmatrix}\rho u\\ \rho u^{2}+p\\ \rho uv\\ u(E+p)\end{bmatrix},\qquad\mathbf{G}(\mathbf{U})=\begin{bmatrix}\rho v\\ \rho uv\\ \rho v^{2}+p\\ v(E+p)\end{bmatrix}. \tag{8}$$

The system is closed using the ideal-gas equation of state,

$$p=(\gamma-1)\rho e,\qquad c=\sqrt{\gamma\frac{p}{\rho}}, \tag{9}$$

where $\gamma$ is the ratio of specific heats and $c$ is the speed of sound. In this work, we use $\gamma=1.4$. Equivalently, the pressure can be recovered from the conservative variables as

$$p=(\gamma-1)\left(E-\frac{1}{2}\rho(u^{2}+v^{2})\right). \tag{10}$$
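The relations in Eqs. (6) to (10) translate directly into code. The following is a minimal NumPy sketch of the primitive/conservative conversion and the convective fluxes; the function names are illustrative and do not correspond to the JAX-Fluids API.

```python
import numpy as np

GAMMA = 1.4  # ratio of specific heats used in this work

def primitive_to_conservative(W, gamma=GAMMA):
    """W = [rho, u, v, p] -> U = [rho, rho*u, rho*v, E]."""
    rho, u, v, p = W
    rho_e = p / (gamma - 1.0)              # from p = (gamma - 1) * rho * e, Eq. (9)
    E = rho_e + 0.5 * rho * (u**2 + v**2)  # Eq. (6)
    return np.array([rho, rho * u, rho * v, E])

def conservative_to_primitive(U, gamma=GAMMA):
    """U = [rho, rho*u, rho*v, E] -> W = [rho, u, v, p]."""
    rho, mx, my, E = U
    u, v = mx / rho, my / rho
    p = (gamma - 1.0) * (E - 0.5 * rho * (u**2 + v**2))  # Eq. (10)
    return np.array([rho, u, v, p])

def euler_fluxes(U, gamma=GAMMA):
    """Convective fluxes F(U) and G(U) of Eq. (8)."""
    rho, u, v, p = conservative_to_primitive(U, gamma)
    E = U[3]
    F = np.array([rho * u, rho * u**2 + p, rho * u * v, u * (E + p)])
    G = np.array([rho * v, rho * u * v, rho * v**2 + p, v * (E + p)])
    return F, G

def speed_of_sound(rho, p, gamma=GAMMA):
    """Speed of sound c of Eq. (9)."""
    return np.sqrt(gamma * p / rho)

# Example state: rho = 1, u = 2, v = 0, p = 1 (nondimensional, illustration only)
W = np.array([1.0, 2.0, 0.0, 1.0])
U = primitive_to_conservative(W)
F, G = euler_fluxes(U)
mach = np.hypot(W[1], W[2]) / speed_of_sound(W[0], W[3])
print(U, F, G, mach)
```

Because the functions unpack along the first axis, they also accept arrays of shape `(4, ...)` holding whole fields.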
Figure 11: Schematic of a cut cell. [Figure labels: cell $(i,j)$ and its eight neighbors, the interface segment $\Delta\Gamma_{i,j}$, the face apertures $A_{i-\frac{1}{2},j}=1.0$, $A_{i,j-\frac{1}{2}}=1.0$, $A_{i,j+\frac{1}{2}}$, and $A_{i+\frac{1}{2},j}$, and the coordinate axes $x$ and $y$.]

The compressible Euler equations are discretized on a Cartesian grid using a high-order Godunov-type finite-volume formulation [[17](https://arxiv.org/html/2606.13742#bib.bib5)]. The computational domain is divided into rectangular control volumes with uniform cell sizes $\Delta x$ and $\Delta y$. Cell centers are indexed by $(i,j)$, and $\mathbf{U}_{i,j}$ denotes the cell-averaged conservative state. Numerical fluxes through the vertical and horizontal cell faces are denoted by $\mathbf{F}_{i\pm\frac{1}{2},j}$ and $\mathbf{G}_{i,j\pm\frac{1}{2}}$, respectively.

The numerical fluxes are computed using a two-step high-order Godunov procedure. First, left and right states are reconstructed at each cell face from the neighboring cell averages using a shock-capturing reconstruction scheme. In this work, we employ the fifth-order WENO-Z reconstruction [[9](https://arxiv.org/html/2606.13742#bib.bib29)]. Second, the reconstructed states are used as input to an approximate Riemann solver. Here, we use the HLLC Riemann solver [[17](https://arxiv.org/html/2606.13742#bib.bib5)] to compute the numerical fluxes across the cell faces.
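To illustrate the first step, the sketch below reconstructs the left-biased face value at $i+\frac{1}{2}$ from five cell averages with the standard fifth-order WENO-Z weights. The regularization `eps` and the exponent of one on the weight ratio are assumptions made here for illustration; the settings used in JAX-Fluids may differ, and the function name is hypothetical.

```python
import numpy as np

def weno5_z_left(v, eps=1e-12):
    """Left-biased WENO-Z value at face i+1/2.

    v holds the five cell averages [v_{i-2}, v_{i-1}, v_i, v_{i+1}, v_{i+2}].
    eps is an assumed regularization constant.
    """
    vm2, vm1, v0, vp1, vp2 = v

    # Third-order candidate reconstructions on the three sub-stencils
    q0 = (2.0 * vm2 - 7.0 * vm1 + 11.0 * v0) / 6.0
    q1 = (-vm1 + 5.0 * v0 + 2.0 * vp1) / 6.0
    q2 = (2.0 * v0 + 5.0 * vp1 - vp2) / 6.0

    # Smoothness indicators of the sub-stencils
    b0 = 13.0 / 12.0 * (vm2 - 2.0 * vm1 + v0) ** 2 + 0.25 * (vm2 - 4.0 * vm1 + 3.0 * v0) ** 2
    b1 = 13.0 / 12.0 * (vm1 - 2.0 * v0 + vp1) ** 2 + 0.25 * (vm1 - vp1) ** 2
    b2 = 13.0 / 12.0 * (v0 - 2.0 * vp1 + vp2) ** 2 + 0.25 * (3.0 * v0 - 4.0 * vp1 + vp2) ** 2

    # WENO-Z: global indicator tau5 rescales the ideal weights (1/10, 6/10, 3/10)
    tau5 = np.abs(b0 - b2)
    a0 = 0.1 * (1.0 + tau5 / (b0 + eps))
    a1 = 0.6 * (1.0 + tau5 / (b1 + eps))
    a2 = 0.3 * (1.0 + tau5 / (b2 + eps))

    return (a0 * q0 + a1 * q1 + a2 * q2) / (a0 + a1 + a2)

smooth = weno5_z_left([1.0, 2.0, 3.0, 4.0, 5.0])  # linear data
step = weno5_z_left([0.0, 0.0, 0.0, 1.0, 1.0])    # discontinuity right of the face
print(smooth, step)
```

On linear data the scheme returns the exact face value, while next to the jump nearly all weight moves to the smooth upwind sub-stencil, which is the shock-capturing behavior the reconstruction is chosen for.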

Immersed solid boundaries are represented using a conservative sharp-interface cut-cell method [[24](https://arxiv.org/html/2606.13742#bib.bib27)]. The solid-fluid interface is described implicitly by a level-set function $\phi(\mathbf{x})$, where $\phi$ satisfies the signed-distance property $\|\nabla\phi\|=1$. The interface location is given by the zero level set

$$\Gamma=\left\{\mathbf{x}\mid\phi(\mathbf{x})=0\right\}. \tag{11}$$

Cells intersected by the interface are referred to as cut cells, while cells that are entirely filled with fluid are referred to as full cells. A schematic illustration of a cut cell, including the fluid volume fraction, face apertures, and interface segment, is shown in [Figure 11](https://arxiv.org/html/2606.13742#A1.F11).

For each cell, we define the fluid volume fraction $\alpha_{i,j}\in[0,1]$, which denotes the fraction of the cell area occupied by fluid. We also define the face apertures $A_{i\pm\frac{1}{2},j}$ and $A_{i,j\pm\frac{1}{2}}$, which denote the fractions of the corresponding cell faces open to the fluid phase. For full fluid cells, all apertures and the volume fraction are equal to one. The standard finite-volume discretization is therefore recovered as a special case of the cut-cell formulation.

The semi\-discrete conservative update for both full cells and cut cells is written as

$$\begin{aligned}\frac{d}{dt}\left(\alpha_{i,j}\mathbf{U}_{i,j}\right)={}&\frac{1}{\Delta x}\left(A_{i-\frac{1}{2},j}\mathbf{F}_{i-\frac{1}{2},j}-A_{i+\frac{1}{2},j}\mathbf{F}_{i+\frac{1}{2},j}\right)\\&+\frac{1}{\Delta y}\left(A_{i,j-\frac{1}{2}}\mathbf{G}_{i,j-\frac{1}{2}}-A_{i,j+\frac{1}{2}}\mathbf{G}_{i,j+\frac{1}{2}}\right)+\frac{1}{\Delta x\Delta y}\mathbf{X}_{i,j}.\end{aligned} \tag{12}$$

Here, the cell face fluxes are weighted by the corresponding apertures, and $\mathbf{X}_{i,j}$ denotes the interface flux contribution inside cell $(i,j)$. For full cells, no solid-fluid interface is present and therefore $\mathbf{X}_{i,j}=\mathbf{0}$. Together with $\alpha_{i,j}=1$ and unit apertures, Eq. ([12](https://arxiv.org/html/2606.13742#A1.E12)) reduces to the standard semi-discrete finite-volume update on a Cartesian grid.

The interface flux represents the force exerted by the immersed boundary on the fluid\. In two dimensions, it is given by

$$\mathbf{X}_{i,j}=\begin{bmatrix}0\\ p_{\Gamma}\Delta\Gamma_{x}\\ p_{\Gamma}\Delta\Gamma_{y}\\ p_{\Gamma}\Delta\bm{\Gamma}_{i,j}\cdot\mathbf{v}_{\Gamma}\end{bmatrix},\qquad\Delta\bm{\Gamma}_{i,j}=\begin{bmatrix}\Delta\Gamma_{x}\\ \Delta\Gamma_{y}\end{bmatrix}=\begin{bmatrix}\left(A_{i+\frac{1}{2},j}-A_{i-\frac{1}{2},j}\right)\Delta y\\ \left(A_{i,j+\frac{1}{2}}-A_{i,j-\frac{1}{2}}\right)\Delta x\end{bmatrix}. \tag{13}$$

Here, $p_{\Gamma}$ is the pressure at the interface, $\mathbf{v}_{\Gamma}$ is the interface velocity, and $\Delta\bm{\Gamma}_{i,j}$ is the oriented interface length vector associated with the cut cell. The components of this vector correspond to the projections of the interface segment onto the coordinate directions and are obtained from the differences of opposite face apertures. The interface pressure is approximated by the cell-center pressure of the corresponding cut cell. Since only static solid bodies are considered in this work, the interface velocity is zero, $\mathbf{v}_{\Gamma}=\mathbf{0}$, and the energy contribution of the interface flux vanishes. The interface flux therefore contributes only to the momentum equations through the pressure force exerted by the solid boundary.
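A useful sanity check on Eqs. (12) and (13) is that a fluid at rest with uniform pressure produces a zero right-hand side in a cut cell: the pressure force from the interface flux must cancel the imbalance of the aperture-weighted face fluxes. The sketch below verifies this; the function names and the compass-style aperture names (`A_w`, `A_e`, `A_s`, `A_n`) are illustrative shorthand, and the numerical values are arbitrary.

```python
import numpy as np

def interface_flux(p_gamma, A_w, A_e, A_s, A_n, dx, dy, v_gamma=(0.0, 0.0)):
    """Interface flux X_{i,j} of Eq. (13).

    A_w = A_{i-1/2,j}, A_e = A_{i+1/2,j}, A_s = A_{i,j-1/2}, A_n = A_{i,j+1/2}.
    """
    dgamma = np.array([(A_e - A_w) * dy, (A_n - A_s) * dx])
    return np.array([
        0.0,
        p_gamma * dgamma[0],
        p_gamma * dgamma[1],
        p_gamma * np.dot(dgamma, v_gamma),  # vanishes for static bodies
    ])

def cut_cell_rhs(F_w, F_e, G_s, G_n, X, A_w, A_e, A_s, A_n, dx, dy):
    """Right-hand side of Eq. (12), i.e. d/dt (alpha_{i,j} U_{i,j})."""
    return ((A_w * F_w - A_e * F_e) / dx
            + (A_s * G_s - A_n * G_n) / dy
            + X / (dx * dy))

# Fluid at rest with uniform pressure p: F = [0, p, 0, 0], G = [0, 0, p, 0]
p, dx, dy = 2.0, 0.1, 0.2
F_face = np.array([0.0, p, 0.0, 0.0])
G_face = np.array([0.0, 0.0, p, 0.0])
A_w, A_e, A_s, A_n = 1.0, 0.4, 1.0, 0.7  # arbitrary cut-cell apertures

X = interface_flux(p, A_w, A_e, A_s, A_n, dx, dy)
rhs = cut_cell_rhs(F_face, F_face, G_face, G_face, X, A_w, A_e, A_s, A_n, dx, dy)
print(X, rhs)
```

The momentum components cancel exactly, which confirms the sign convention of $\Delta\bm{\Gamma}_{i,j}$ relative to the aperture-weighted flux differences.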

The volume fraction $\alpha_{i,j}$, the face apertures $A_{i\pm\frac{1}{2},j}$ and $A_{i,j\pm\frac{1}{2}}$, and the oriented interface length vector $\Delta\bm{\Gamma}_{i,j}$ are computed from the level-set representation of the geometry using a marching-squares reconstruction. This yields a conservative cut-cell discretization that preserves the structured Cartesian layout while allowing immersed solid boundaries to be represented sharply.
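As a rough illustration of the marching-squares idea, a face aperture can be estimated from the level-set values at the two end points of a cell face by linear interpolation. The sketch below assumes the sign convention $\phi>0$ in the fluid, which is not stated in the text, and covers only the apertures (not the volume fraction); the function name is hypothetical.

```python
def face_aperture(phi_a, phi_b):
    """Fraction of a cell face that lies in the fluid.

    phi_a and phi_b are level-set values at the two end points of the face.
    Assumes phi > 0 in the fluid and a linear variation of phi along the face.
    """
    if phi_a >= 0.0 and phi_b >= 0.0:
        return 1.0  # face fully open to the fluid
    if phi_a < 0.0 and phi_b < 0.0:
        return 0.0  # face fully inside the solid
    phi_pos, phi_neg = max(phi_a, phi_b), min(phi_a, phi_b)
    return phi_pos / (phi_pos - phi_neg)  # position of the zero crossing

# Unit cell [0, 1]^2 with a vertical wall at x = 0.6, i.e. phi(x, y) = 0.6 - x
phi = {"sw": 0.6, "se": -0.4, "nw": 0.6, "ne": -0.4}
A_w = face_aperture(phi["sw"], phi["nw"])  # west face, x = 0
A_e = face_aperture(phi["se"], phi["ne"])  # east face, x = 1
A_s = face_aperture(phi["sw"], phi["se"])  # south face, y = 0
A_n = face_aperture(phi["nw"], phi["ne"])  # north face, y = 1
print(A_w, A_e, A_s, A_n)
```

With these apertures, Eq. (13) gives $\Delta\Gamma_{x}=-\Delta y$ and $\Delta\Gamma_{y}=0$, consistent with a vertical interface segment spanning the cell height.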

## Appendix B Data Generation

To split the dataset into in-distribution and out-of-distribution (OOD) subsets without imposing an arbitrary cutoff on any single design variable, we use an unsupervised Isolation Forest [[36](https://arxiv.org/html/2606.13742#bib.bib30)] over the $d$-dimensional vector of varying scramjet design parameters. The forest comprises 100 isolation trees fit with a fixed random seed. Each tree is a random binary tree that recursively splits the design space, and each sample is assigned a score $s_{i}$ equal to the average number of splits needed to isolate it. Samples in sparse regions are isolated in few splits and therefore receive low scores, which we use to flag out-of-distribution candidates. We designate as out-of-distribution the $k$ samples with the lowest scores, where $k$ is fixed by the requested OOD fraction. [Figure 12](https://arxiv.org/html/2606.13742#A2.F12) shows the distribution of the anomaly scores over all cases and the different datasets ($D1$, $D2$) to illustrate that the selected OOD split generally corresponds to the tail of the distribution, i.e., to edge cases of the generated data. While there is minimal overlap in the anomaly scores between $D2$ and the selected OOD samples, we stress that overlapping scores do not correspond to identical parameter settings. In fact, none of the cases with overlapping anomaly scores match exactly in geometry and inflow parameters, so the regions of parameter space they cover differ.
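The selection step can be sketched as follows, given one anomaly score per sample. The scores could come, for example, from scikit-learn's `IsolationForest` (with `n_estimators=100` and a fixed `random_state`) via `score_samples`, where lower values are more anomalous; whether that library was used here is an assumption. The function name and the rounding rule for $k$ are likewise hypothetical.

```python
import numpy as np

def split_by_anomaly_score(scores, ood_fraction):
    """Return (in_distribution_idx, ood_idx) for the k lowest-scoring samples.

    k = round(ood_fraction * N) is an assumed rounding rule.
    """
    scores = np.asarray(scores, dtype=float)
    k = int(round(ood_fraction * scores.size))
    order = np.argsort(scores, kind="stable")  # ascending: lowest scores first
    ood_idx = np.sort(order[:k])
    id_idx = np.sort(order[k:])
    return id_idx, ood_idx

# Toy scores for five samples (illustration only)
toy_scores = [0.5, 0.1, 0.9, 0.2, 0.7]
id_idx, ood_idx = split_by_anomaly_score(toy_scores, ood_fraction=0.4)
print("in-distribution:", id_idx, "OOD:", ood_idx)
```

Selecting by rank rather than by a score threshold keeps the OOD set size fixed regardless of how the scores are scaled.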

![Refer to caption](https://arxiv.org/html/2606.13742v1/x19.png)

Figure 12: Isolation-Forest diagnostic for the out-of-distribution split. We report the distribution of anomaly scores obtained via an Isolation Forest fitted on the geometry and inflow parameters. Samples at the lower end of the distribution are selected to be OOD. Importantly, there is no overlap in the parameter vectors between $D_2$ and the OOD test set, so no information is leaked to the OOD set. The other splits are drawn at random from the remaining samples.

## Appendix C Implementation Details

### C.1 AB-UPT

We instantiate AB-UPT as a volume-only configuration, which discards the surface branch and operates solely on a 3D volumetric anchor set, since the scramjet task requires field predictions inside the flow domain rather than on a surface. From every simulation we draw $N_{\mathrm{vol}}=16{,}384$ volume anchor points uniformly at random from the mesh. Volumetric coordinates are linearly rescaled into $[0,1000]^{2}$ before being fed to the network, and we predict all eight target fields, namely static pressure, density, two-component velocity, specific enthalpy, total pressure, kinetic energy, temperature and Mach number. Each field is standardized per channel using statistics computed once over the training split.

Each anchor position $\mathbf{x}_{i}\in\mathbb{R}^{2}$ is first lifted to a token of width $d=384$ via a continuous sin–cos Fourier embedding followed by a two-layer GELU MLP \[[22](https://arxiv.org/html/2606.13742#bib.bib49)\]. We additionally apply rotary positional encodings (RoPE) \[[52](https://arxiv.org/html/2606.13742#bib.bib48)\] inside attention. The trunk is a stack of ten self-attention blocks followed by four additional volume-decoder blocks. During training, the anchors therefore serve as both queries and keys/values. We select $h=3$ heads, an MLP expansion factor of 4, and truncated-normal initialization ($\sigma=0.02$). A final linear head decodes each anchor token to the eight-channel field prediction. The model is conditioned on a 16-dimensional design vector (15 geometry parameters and the inflow Mach number) that is injected into every transformer block via the AdaLN-style scale/shift pathway \[[44](https://arxiv.org/html/2606.13742#bib.bib50)\] with a per-condition dimension of 16.

Training minimizes an unweighted sum of channel-wise MSE losses on the standardized targets, optimized with Lion \[[14](https://arxiv.org/html/2606.13742#bib.bib54)\] at a peak learning rate of $1\times 10^{-5}$, weight decay $0.05$, gradient-norm clipping at $1.0$, and a linear warm-up over $5\%$ of training followed by a cosine decay to $1\times 10^{-6}$. We train for 250 epochs in float16 mixed precision with an effective batch size of one simulation per step, where each step performs attention over $16{,}384$ anchor tokens. We select the final checkpoint by the best average relative $L_{2}$ loss on the validation set.
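The learning-rate schedule described above can be written as a small function. This is a minimal sketch under assumptions the text does not state: the schedule is evaluated per optimizer step, and the warm-up ramps linearly from near zero to the peak. The function name `warmup_cosine_lr` is hypothetical.

```python
import math

def warmup_cosine_lr(step, total_steps, peak_lr=1e-5, final_lr=1e-6,
                     warmup_frac=0.05):
    """Linear warm-up to peak_lr, then cosine decay to final_lr."""
    warmup_steps = max(1, int(warmup_frac * total_steps))
    if step < warmup_steps:
        # Assumed: ramp starts close to zero and reaches the peak
        # at the last warm-up step.
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return final_lr + (peak_lr - final_lr) * cosine
```

The same function covers the other models in this appendix by changing `peak_lr`, since they share the warm-up fraction and the final learning rate.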

For scaling experiments we vary the depth of the model, covering a range from $\approx 10$M to $\approx 100$M parameters. Specifically, we vary the number of transformer blocks in $\{1,10,25,50\}$, since scaling in depth allows higher learning rates for larger models than scaling in width. All variants keep the same hidden dimension $d=384$.

### C.2 ViT

We compare against a grid-native Vision Transformer (ViT) \[[15](https://arxiv.org/html/2606.13742#bib.bib15)\] tailored to the block-structured, adaptively refined meshes produced by the JAX-Fluids solver. Each simulation is exposed to the network as a stack of $N_{B}$ Cartesian blocks of shape $H\times W$ (in our case $H=W=64$), with a per-block isotropic spatial scale and a Boolean fluid mask flagging cells outside the wetted domain. Coordinates are min–max rescaled to $[0,1000]^{2}$ and per-block scales are min–max rescaled to $[0.2,1000]$ before being passed to the network. All eight target fields are channel-wise z-score normalized using the same precomputed statistics as for the AB-UPT model. Patchification is performed with a non-overlapping patch size of $p=16$ via 2D average pooling on the coordinate grid. Blocks with zero fluid fraction are pruned and the remaining blocks are right-padded across the batch so that token sequences are concatenated over all blocks of a sample. Each patch centroid is then encoded with a continuous sin–cos Fourier embedding followed by a SiLU MLP \[[16](https://arxiv.org/html/2606.13742#bib.bib53)\] into a token of width $d=384$, and RoPE, derived from the same average-pooled centers, is applied inside every attention head.

The backbone is a stack of $L=8$ pre-norm transformer blocks with $h=3$ heads, RMSNorm ($\varepsilon=10^{-6}$) \[[61](https://arxiv.org/html/2606.13742#bib.bib51)\], a SwiGLU FFN \[[50](https://arxiv.org/html/2606.13742#bib.bib52)\] with expansion factor 4, and AdaLN-Zero conditioning on the 16-dimensional design vector. The design vector is first embedded by a shared SiLU MLP and then drives a per-block linear head that produces six modulation parameters (shift, scale, gate for MSA and MLP) per channel. All linear layers are initialized with Xavier-uniform weights, except for the AdaLN modulation projection and the final linear layer, which are zero-initialized. The MSA/MLP gate biases are reset to $1.0$ so that blocks are fully active at step zero and the network does not collapse to the identity (a failure mode we observed under the standard AdaLN-Zero initialization in this regression setting). The output head projects each patch token to $p^{2}\cdot C_{\text{out}}$ channels, which are unpatchified back to the original cell grid and sliced into the eight field-specific predictions.

The training objective is identical to that of AB-UPT: an unweighted sum of eight per-field MSE losses on the standardized targets, optimized with Lion at a peak learning rate of $1\times 10^{-4}$, weight decay $0.05$, gradient-norm clipping at $1.0$, and a linear warm-up over $5\%$ of training followed by cosine decay to $1\times 10^{-6}$. We train for 250 epochs in float16 mixed precision with an effective batch size of one simulation per step and select the final checkpoint based on the validation loss.

### C.3 Flow Matching

The third method replaces the deterministic regression objective with a flow-matching generative model on the same multi-block grid representation. We adopt the linear stochastic-interpolant formulation \[[35](https://arxiv.org/html/2606.13742#bib.bib56),[32](https://arxiv.org/html/2606.13742#bib.bib55)\]. A clean sample $x_{0}\in\mathbb{R}^{B\times N_{B}\times H\times W\times C}$ collects all eight target fields (velocity, density, pressure, enthalpy, total pressure, kinetic energy, temperature, Mach number), and $\varepsilon\sim\mathcal{N}(0,I)$ denotes Gaussian noise. We sample a timestep $t\sim\mathcal{U}[0,1]$ uniformly per sample and form the straight-line interpolant $z_{t}=t\,x_{0}+(1-t)\,\varepsilon$. Given a conditioning vector $c$, the network is parameterized to predict the clean state $x_{0}^{\text{pred}}=f_{\theta}(z_{t},t,c)$ rather than the velocity or the noise, and we train under the analytic velocity-matching loss $\mathcal{L}=\frac{1}{|\Omega|}\sum_{\Omega}\big|(x_{0}-z_{t})/(1-t)-(x_{0}^{\text{pred}}-z_{t})/(1-t)\big|^{2}$. Training is done only on the fluid cells $\Omega$ defined by the AMR fluid mask, with $(1-t)$ clamped from below by $t_{\varepsilon}=5\times 10^{-2}$ for numerical safety near $t=1$. The corruption is i.i.d. standard Gaussian across cells, blocks and channels.
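As an illustration of the loss above, the following sketch evaluates it for one sample stored as a flat list of values. It assumes the timestep `t` and noise `eps` have already been drawn, and it treats the network as an arbitrary callable; `flow_matching_loss` and `x0_pred_fn` are hypothetical names, and conditioning on the design vector is omitted.

```python
def flow_matching_loss(x0, x0_pred_fn, t, eps, fluid_mask, t_eps=5e-2):
    """Velocity-matching loss for an x0-predicting network (one sample)."""
    # Straight-line interpolant z_t = t * x0 + (1 - t) * eps.
    z_t = [t * a + (1.0 - t) * e for a, e in zip(x0, eps)]
    x0_pred = x0_pred_fn(z_t, t)
    # Clamp (1 - t) from below for numerical safety near t = 1.
    denom = max(1.0 - t, t_eps)
    squared = [
        ((a - z) / denom - (p - z) / denom) ** 2
        for a, p, z, is_fluid in zip(x0, x0_pred, z_t, fluid_mask)
        if is_fluid  # only fluid cells contribute
    ]
    return sum(squared) / len(squared)
```

Because both velocities share the same denominator, the loss is a reweighted error between `x0` and `x0_pred`, with a weight that grows as `t` approaches 1 until the clamp takes over.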

The denoiser shares the same backbone as the deterministic ViT baseline ($d=384$, 8 transformer blocks, 3 heads, MLP ratio 4, RMSNorm + SwiGLU + AdaLN-Zero, 2D RoPE on average-pooled patch centers) and differs only in the input and conditioning pipelines. The noisy field $z_{t}$ is patchified by a two-stage bottleneck patch embedding with stride $p=16$ that projects the eight target channels to a PCA-like bottleneck of width 64, followed by a $1\times 1$ convolution back to width $d=384$. The scalar timestep is mapped through a sinusoidal embedding of width 256 and a SiLU MLP into a vector of width $d$. The 16-dimensional design vector (15 geometry parameters concatenated with the inflow Mach number) is independently embedded by a SiLU MLP of width $d$, and the two are summed to form the AdaLN-Zero modulation token fed into every transformer block. The final layer predicts $p^{2}\cdot C_{\text{out}}$ patch channels, which are unpatchified back to the original cell grid.

We train for 250 epochs in bfloat16 mixed precision with an effective batch size of one simulation per step, using Lion at a peak learning rate of $3\times 10^{-6}$, weight decay $0.05$, gradient-norm clipping at $1.0$, a $5\%$ linear warm-up and a cosine decay to $1\times 10^{-6}$. At inference we draw $x_{T}\sim\mathcal{N}(0,I)$ and integrate the learned probability flow ODE $\mathrm{d}x/\mathrm{d}t=(x_{0}^{\text{pred}}(x_{t},t)-x_{t})/(1-t)$ from $t=0$ to $t=1$ with explicit Euler for $N-1$ steps and a closed-form last step $x\leftarrow\alpha x+(1-\alpha)x_{0}^{\text{pred}}$ with $\alpha=(1-t_{\text{next}})/(1-t_{\text{cur}})$, which lands the trajectory exactly on the model's terminal $x_{0}$ estimate. In practice we sweep over a range of integration step counts and report the one that yields the best average relative $L_{2}$ error across all field predictions on the validation set.
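The sampling procedure can be sketched as follows, assuming a uniform time grid (the text does not specify the spacing) and an arbitrary callable in place of the trained network. The names `sample` and `x0_pred_fn` are hypothetical.

```python
def sample(x0_pred_fn, x_init, n_steps):
    """Integrate dx/dt = (x0_pred - x) / (1 - t) from t = 0 to t = 1."""
    x = list(x_init)  # initial Gaussian noise
    for n in range(n_steps):
        t_cur = n / n_steps
        t_next = (n + 1) / n_steps
        x0_pred = x0_pred_fn(x, t_cur)
        if n < n_steps - 1:
            # Explicit Euler step on the probability flow ODE.
            dt = t_next - t_cur
            x = [xi + dt * (pi - xi) / (1.0 - t_cur)
                 for xi, pi in zip(x, x0_pred)]
        else:
            # Closed-form last step: alpha = 0 at t_next = 1, so the
            # state lands exactly on the terminal x0 estimate.
            alpha = (1.0 - t_next) / (1.0 - t_cur)
            x = [alpha * xi + (1.0 - alpha) * pi
                 for xi, pi in zip(x, x0_pred)]
    return x
```

Algebraically the closed-form update equals an Euler step of size `t_next - t_cur`; writing it in terms of `alpha` makes the final landing on `x0_pred` explicit.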

### C.4 Physics-aware Refinement

This appendix details how the differentiable JAX-Fluids solver \[[6](https://arxiv.org/html/2606.13742#bib.bib12),[7](https://arxiv.org/html/2606.13742#bib.bib11)\] is used for physics-aware refinement of neural emulators. The solver, its discretization, the multi-block mesh, and the level-set-based immersed boundary method are described in [Section 3.1](https://arxiv.org/html/2606.13742#S3.SS1).

##### Residual loss\.

A steady-state solution of the compressible Euler equations, denoted by $\mathbf{U}^{*}$, satisfies

$$
\mathbf{R}\left(\mathbf{U}^{*}\right)=\frac{\partial\mathbf{F}(\mathbf{U}^{*})}{\partial x}+\frac{\partial\mathbf{G}(\mathbf{U}^{*})}{\partial y}=0,
$$

where the convective flux vectors $\mathbf{F}$ and $\mathbf{G}$ are defined in [Appendix A](https://arxiv.org/html/2606.13742#A1) (see [Equation 8](https://arxiv.org/html/2606.13742#A1.E8)). Thus, at steady state, the divergence of the convective fluxes vanishes. For a flow field $\mathbf{U}$ that does not satisfy the steady-state Euler equations exactly,

$$
\mathbf{R}\left(\mathbf{U}\right)\neq 0.
$$

##### Point\-wise residual calculation\.

In general, predictions of the physics emulator do not exactly satisfy the governing equations\. We exploit this property to define a physics\-based loss for fine\-tuning\. Given a surrogate prediction, we evaluate the discrete PDE residual using JAX\-Fluids and minimize it by backpropagating through both JAX\-Fluids and the physics emulator\.

The discrete residual vector in cell $(i,j)$ is given by

$$
\mathbf{R}_{i,j}\;=\;\frac{\mathbf{F}_{i+\frac{1}{2},j}-\mathbf{F}_{i-\frac{1}{2},j}}{\Delta x}\;+\;\frac{\mathbf{G}_{i,j+\frac{1}{2}}-\mathbf{G}_{i,j-\frac{1}{2}}}{\Delta y}, \tag{14}
$$

where the cell-face fluxes $\mathbf{F}_{i\pm\frac{1}{2},j}$ and $\mathbf{G}_{i,j\pm\frac{1}{2}}$ are computed with the same fifth-order WENO-Z reconstruction and HLLC Riemann solver used during data generation, ensuring consistency of the residual operator with the discretization used for generating training data. Cut cells contribute through their volume fraction and face apertures as outlined in [Appendix A](https://arxiv.org/html/2606.13742#A1).

The point-wise residuals are aggregated into a scalar physics loss $\mathcal{L}_{\mathrm{PDE}}$ by squaring each residual component, applying a per-equation weight $w_{k}$, and summing over all fluid cells,

$$
\mathcal{L}_{\mathrm{PDE}}=\sum_{(i,j)\in\Omega}\sum_{k=1}^{4}w_{k}\left(R^{k}_{i,j}\right)^{2}\Delta x\,\Delta y, \tag{15}
$$

where $R^{k}_{i,j}$ denotes the $k$-th component of $\mathbf{R}_{i,j}$, corresponding to mass, $x$-momentum, $y$-momentum, and total energy. We found that the $y$-momentum had higher normalized residuals than the other conserved quantities and therefore use $w_{k}=(1,\,1,\,0.1,\,1)$, which down-weights its balance to ensure an equal representation of all components.
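To make Equations (14) and (15) concrete, the sketch below evaluates the residual loss on a small uniform Cartesian grid. It is not the JAX-Fluids operator: face fluxes are taken as the arithmetic mean of the two neighboring cell fluxes instead of WENO-Z reconstruction with an HLLC Riemann solver, cut cells are ignored, and only interior cells are summed. The names `euler_fluxes` and `pde_loss` are hypothetical.

```python
GAMMA = 1.4  # ratio of specific heats

def euler_fluxes(rho, u, v, p):
    """Convective fluxes F (x-direction) and G (y-direction)."""
    energy = p / (GAMMA - 1.0) + 0.5 * rho * (u * u + v * v)
    f = (rho * u, rho * u * u + p, rho * u * v, u * (energy + p))
    g = (rho * v, rho * u * v, rho * v * v + p, v * (energy + p))
    return f, g

def pde_loss(prim, dx, dy, weights=(1.0, 1.0, 0.1, 1.0)):
    """L_PDE over interior cells; prim[i][j] = (rho, u, v, p)."""
    nx, ny = len(prim), len(prim[0])
    flux = [[euler_fluxes(*prim[i][j]) for j in range(ny)]
            for i in range(nx)]
    loss = 0.0
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            for k in range(4):
                # Averaged face fluxes: F_{i+1/2} - F_{i-1/2}
                # reduces to (F_{i+1} - F_{i-1}) / 2.
                d_f = 0.5 * (flux[i + 1][j][0][k] - flux[i - 1][j][0][k])
                d_g = 0.5 * (flux[i][j + 1][1][k] - flux[i][j - 1][1][k])
                residual = d_f / dx + d_g / dy
                loss += weights[k] * residual ** 2 * dx * dy
    return loss
```

A uniform flow has identical fluxes in every cell, so its residual and loss vanish, while any local perturbation of the field produces a positive loss.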

##### Differentiable coupling\.

Since the reference data were generated with JAX\-Fluids, the refinement residual must be evaluated with the same solver: only then is the discrete residual consistent with the numerical discretization underlying the training data, so that minimizing it drives the prediction toward the solver solution rather than toward the fixed point of some other discretization\. The surrogate, however, trains in PyTorch, so its predictions and the resulting gradients must be passed between PyTorch and the JAX\-based solver\. We bridge the two with zero\-copy DLPack sharing wrapped in a custom autograd function, giving exact gradients consistent with the discretization\. Floating\-point inputs are upcast to double precision for the solver and the gradients are downcast back to the surrogate’s working precision\.

##### Refinement objective\.

Every refinement run starts from the same pre-trained ViT checkpoint and optimizes a weighted sum of up to three terms,

$$
\mathcal{L}=\mathcal{L}_{\mathrm{data}}+w_{\mathrm{div}}\,\mathcal{L}_{\mathrm{div}}+\lambda\,\mathcal{L}_{\mathrm{PDE}},
$$

namely a supervised data-reconstruction loss $\mathcal{L}_{\mathrm{data}}$ against the reference fields, a *divergence* loss $\mathcal{L}_{\mathrm{div}}$ that penalizes the mean-squared deviation of the prediction from a frozen copy of the pre-trained model, and the physics loss $\mathcal{L}_{\mathrm{PDE}}$. Both $\mathcal{L}_{\mathrm{PDE}}$ and $\mathcal{L}_{\mathrm{div}}$ are target-free, so refinement can be applied to new samples without running a full simulation for each: only the mesh is needed.

A key advantage of the physics and divergence terms is that neither requires ground\-truth fields\. Both are evaluated from the model’s own prediction and the mesh, so we can refine the surrogate on additional samples for which a simulation was never run and only a mesh and its design parameters are available\. For these samples the supervised data term cannot be formed and is dropped\.

The physics loss alone is not sufficient and leads to degenerate solutions\. The divergence term prevents this by penalising deviation from the frozen pre\-trained model, acting as a target\-free replacement for the reconstruction loss, while the physics loss drives the field toward conservation consistency\. This extends the surrogate to new regions of the design space at the cost of meshing rather than a full steady\-state simulation\.
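The way the three terms combine, including the case without reference fields, can be sketched as below. The weights `w_div` and `lam` are placeholders because their values are not given in this section, the physics loss is passed in as a precomputed number, and the names `mse` and `refinement_loss` are hypothetical.

```python
def mse(a, b):
    """Mean-squared difference between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def refinement_loss(pred, frozen_pred, pde_loss, target=None,
                    w_div=1.0, lam=1.0):
    """L = L_data + w_div * L_div + lam * L_PDE.

    pred        : prediction of the model being refined
    frozen_pred : prediction of the frozen pre-trained copy
    pde_loss    : physics loss evaluated on pred
    target      : reference field, or None for simulation-free samples
    """
    loss = w_div * mse(pred, frozen_pred) + lam * pde_loss
    if target is not None:
        # The data term only exists where a simulation was run.
        loss += mse(pred, target)
    return loss
```

For simulation-free samples the call simply omits `target`, which leaves the divergence term as the only anchor that keeps the physics loss from pulling the prediction toward a degenerate solution.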

Fine-tuning uses Lion (learning rate $10^{-6}$, weight decay $0.05$, gradient-norm clipping $1.0$) with a linear-warmup cosine-decay schedule in mixed (fp16) precision. The variants we evaluate, which differ only in the active loss terms and the training data, are summarized in [Table 7](https://arxiv.org/html/2606.13742#A3.T7).

Table 7: Refinement variants. All start from the same pre-trained ViT and share the optimizer and physics-loss settings above, differing only in the active loss terms and the training data ($D_{1}$ simulation data, $D_{2}$ simulation-free; see [Table 2](https://arxiv.org/html/2606.13742#S4.T2)). A dash denotes an inactive term. An asterisk ($\ast$) denotes fine-tuning on cases without existing field data, for which no data loss can be computed.

## Appendix D Scaling Experiments

##### Nested OOD splits with farthest\-point sampling\.

For data-scaling studies we additionally require training subsets of sizes $n_{1}<n_{2}<\dots<n_{K}$ that (i) share a single fixed (val, test, OOD) evaluation set and (ii) are strictly nested: $\mathcal{T}_{n_{1}}\subset\mathcal{T}_{n_{2}}\subset\dots\subset\mathcal{T}_{n_{K}}$. After fixing the evaluation splits as above, we apply greedy farthest-point sampling (FPS) on the training pool. Parameters are min–max normalized to $[0,1]^{d}$ so that all dimensions contribute equally to the Euclidean distance. FPS is initialized at the point closest to the pool centroid (to avoid boundary bias) and at every step appends $\arg\max_{i}\min_{j\in\mathcal{S}}\lVert\mathbf{p}_{i}-\mathbf{p}_{j}\rVert_{2}$ to the selected set $\mathcal{S}$. Because each iteration grows $\mathcal{S}$ by exactly one point, every prefix of length $n_{k}$ is a valid $k$-center cover of the training pool, yielding the required nesting property. The training subset of size $n_{k}$ used in our scaling experiments consists of the first $n_{k}$ points of this FPS ordering. Validation, test, and OOD sets are identical across all $n_{k}$, isolating the effect of training-set size from variation in the evaluation distribution. Every split is verified to (i) contain no duplicate indices within any subset, (ii) have empty pairwise intersection across subsets, and (iii) cover the full set of discovered run indices.
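The greedy ordering can be sketched in a few lines of plain Python. The function name `farthest_point_order` is hypothetical, and ties are broken by the lowest index, which the text does not specify.

```python
import math

def farthest_point_order(points):
    """Greedy FPS ordering on min-max normalized parameter vectors."""
    dim = len(points[0])
    lo = [min(p[k] for p in points) for k in range(dim)]
    hi = [max(p[k] for p in points) for k in range(dim)]
    norm = [[(p[k] - lo[k]) / (hi[k] - lo[k]) if hi[k] > lo[k] else 0.0
             for k in range(dim)] for p in points]
    # Start at the point closest to the pool centroid.
    centroid = [sum(p[k] for p in norm) / len(norm) for k in range(dim)]
    first = min(range(len(norm)),
                key=lambda i: math.dist(norm[i], centroid))
    order = [first]
    # min_dist[i]: distance from point i to the selected set
    # (-1 marks points that are already selected).
    min_dist = [math.dist(p, norm[first]) for p in norm]
    min_dist[first] = -1.0
    while len(order) < len(norm):
        nxt = max(range(len(norm)), key=lambda i: min_dist[i])
        order.append(nxt)
        min_dist = [min(m, math.dist(p, norm[nxt]))
                    for m, p in zip(min_dist, norm)]
        min_dist[nxt] = -1.0
    return order
```

The training subset of size `n` is then `order[:n]`, so any two subset sizes are nested by construction.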

##### Hyperparameter searches\.

The scaling curves we report compare best-tuned configurations at every coordinate of the (model size, dataset size) grid rather than evaluating a single fixed recipe, to avoid systematically favoring any one capacity. For both the deterministic AB-UPT and ViT surrogates we ran a dedicated learning-rate sweep at every (model size, dataset size) cell of the scaling grid, bracketing the value reported in the implementation-details section by one order of magnitude on either side, and selected the run minimizing the in-distribution validation loss for inclusion in the reported curves. For the flow-matching ViT we additionally ran a grid over integration steps $N_{\text{int}}\in\{1,2,3,5,7,9,10,20,50,70\}$ and report the one that performs best on the validation set.

## Appendix E Predictive Uncertainty

We provide additional results for the correlation between predictive uncertainty and error across the three methods on the remaining field predictions. In [Figure 13](https://arxiv.org/html/2606.13742#A5.F13) we show the correlation on the temperature, Mach number, kinetic energy, and enthalpy fields. Flow matching consistently attains the highest correlation and coefficient of determination compared to AB-UPT and ViT. On the remaining density and velocity fields (see [Figure 14](https://arxiv.org/html/2606.13742#A5.F14)), the picture changes slightly. On those quantities AB-UPT exhibits the highest correlation between predictive error and uncertainty. We believe the reason is that density and velocity contain sharp localized structures such as shocks and wall-aligned viscous gradients, whereas the remaining fields are nonlinear algebraic combinations of those primitives and therefore smoother. AB-UPT is less affected by these artifacts because it learns a position-wise mapping from coordinates to field values and can therefore disentangle predictions. ViT, on the other hand, uses patching with an initial average pooling layer, such that the average coordinate per patch is mapped to field values. This design choice is deliberate to reduce computational cost, as smaller patch sizes yield longer token sequences that amplify the quadratic cost of self-attention. Therefore, we believe that these results are artifacts of design choices and do not reflect a direct disadvantage of regular-grid-based methods.

![Refer to caption](https://arxiv.org/html/2606.13742v1/x20.png) (a) Temperature, flow matching
![Refer to caption](https://arxiv.org/html/2606.13742v1/x21.png) (b) Temperature, ViT
![Refer to caption](https://arxiv.org/html/2606.13742v1/x22.png) (c) Temperature, AB-UPT
![Refer to caption](https://arxiv.org/html/2606.13742v1/x23.png) (d) Mach number, flow matching
![Refer to caption](https://arxiv.org/html/2606.13742v1/x24.png) (e) Mach number, ViT
![Refer to caption](https://arxiv.org/html/2606.13742v1/x25.png) (f) Mach number, AB-UPT
![Refer to caption](https://arxiv.org/html/2606.13742v1/x26.png) (g) Kinetic energy, flow matching
![Refer to caption](https://arxiv.org/html/2606.13742v1/x27.png) (h) Kinetic energy, ViT
![Refer to caption](https://arxiv.org/html/2606.13742v1/x28.png) (i) Kinetic energy, AB-UPT
![Refer to caption](https://arxiv.org/html/2606.13742v1/x29.png) (j) Enthalpy, flow matching
![Refer to caption](https://arxiv.org/html/2606.13742v1/x30.png) (k) Enthalpy, ViT
![Refer to caption](https://arxiv.org/html/2606.13742v1/x31.png) (l) Enthalpy, AB-UPT

Figure 13: Correlation of predictive uncertainty and error of emulator predictions for derived fields. Rows correspond to temperature (a–c), Mach number (d–f), kinetic energy (g–i) and enthalpy (j–l). Columns correspond, left to right, to flow matching, ViT and AB-UPT. All of these fields are nonlinear algebraic combinations of the primitive variables and exhibit a correlation ranking that is consistent across the three surrogates, with flow matching exhibiting the best correlation between predictive error and uncertainty.

![Refer to caption](https://arxiv.org/html/2606.13742v1/x32.png) (a) Density, flow matching
![Refer to caption](https://arxiv.org/html/2606.13742v1/x33.png) (b) Density, ViT
![Refer to caption](https://arxiv.org/html/2606.13742v1/x34.png) (c) Density, AB-UPT
![Refer to caption](https://arxiv.org/html/2606.13742v1/x35.png) (d) Streamwise velocity, flow matching
![Refer to caption](https://arxiv.org/html/2606.13742v1/x36.png) (e) Streamwise velocity, ViT
![Refer to caption](https://arxiv.org/html/2606.13742v1/x37.png) (f) Streamwise velocity, AB-UPT

Figure 14: Correlation of predictive uncertainty and error for density and velocity predictions for the three emulators. Rows correspond to density (a–c) and streamwise velocity (d–f). Columns correspond, left to right, to flow matching, ViT and AB-UPT. AB-UPT exhibits the best correlation and coefficient of determination between per-sample RMSE and per-sample average standard deviation, as its inductive bias allows capturing position-wise shock and boundary-layer artifacts.

## Appendix F Thermodynamic Self-consistency of Derived Quantities

In addition to the thermodynamically independent primitives pressure $p$, density $\rho$, and velocity $\mathbf{u}$, we train the physics emulators to predict a redundant set of derived fields. These fields follow in closed form from the primitives through their respective relations: the static temperature $T=p/(\rho R)$, the specific static enthalpy $h=\tfrac{\gamma}{\gamma-1}\,p/\rho$, the Mach number $\mathrm{Ma}=|\mathbf{u}|/\sqrt{\gamma p/\rho}$, the total pressure $p_{t}=p\left(1+\tfrac{\gamma-1}{2}\mathrm{Ma}^{2}\right)^{\gamma/(\gamma-1)}$, and the kinetic-energy density $\mathrm{KE}=\tfrac{1}{2}\rho|\mathbf{u}|^{2}$, with $\gamma=1.4$ and $R=287.05~\mathrm{J\,kg^{-1}\,K^{-1}}$.
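The closed-form relations above translate directly into code. The sketch below evaluates them for a single cell and adds a relative $L_2$ error helper. The names `derived_fields` and `relative_l2` are hypothetical, and the field layout used by the emulators is not implied.

```python
import math

GAMMA = 1.4     # ratio of specific heats
R_GAS = 287.05  # specific gas constant in J / (kg K)

def derived_fields(p, rho, u, v):
    """Derived quantities from pressure, density and 2D velocity."""
    speed_sq = u * u + v * v
    mach = math.sqrt(speed_sq) / math.sqrt(GAMMA * p / rho)
    return {
        "T": p / (rho * R_GAS),
        "h": GAMMA / (GAMMA - 1.0) * p / rho,
        "Ma": mach,
        "p_t": p * (1.0 + 0.5 * (GAMMA - 1.0) * mach ** 2)
               ** (GAMMA / (GAMMA - 1.0)),
        "KE": 0.5 * rho * speed_sq,
    }

def relative_l2(pred, ref):
    """Relative L2 error of pred against ref over a list of cells."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(pred, ref)))
    den = math.sqrt(sum(b * b for b in ref))
    return num / den
```

Since $h$ and $T$ differ only by a constant factor, `relative_l2` returns the same value for both fields whenever they are derived from the same primitives.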

We evaluate all emulators on the different splits. For every case the model's full inference path yields the predicted fields on all fluid cells, with the flow-matching estimate taken as the ensemble mean over 10 independent stochastic samples. For each derived quantity we report two errors against the ground truth, both as the relative $L_{2}$ norm taken over the fluid cells: the error of the model's *predicted* field, and the error of the *derived* field, obtained by applying the relations above to the model's own predicted $p$, $\rho$, and $\mathbf{u}$. Each entry of [Table 8](https://arxiv.org/html/2606.13742#A6.T8) lists predicted / derived (%). The agreement between the two measures the thermodynamic self-consistency of an emulator's predictions, while their difference quantifies whether emitting a redundant channel directly is preferable to deriving it from primitives.

Three patterns emerge. First, the temperature and enthalpy columns are identical, because $h$ and $T$ differ only by the constant factor $c_{p}=\gamma R/(\gamma-1)$. As the relative $L_{2}$ error is scale invariant, the errors for both are equal. Second, direct prediction is consistently more accurate, often substantially, as it sidesteps error accumulation through a nonlinear transform. The gap is mild for temperature, enthalpy and Mach number and negligible for the kinetic energy, indicating that the emulators' base predictions are very nearly thermodynamically self-consistent. However, the gap is dramatic for the total pressure. Since $p_{t}$ depends on $\mathrm{Ma}$ through the steep isentropic power law, small velocity and Mach errors are strongly amplified at supersonic Mach numbers, inflating the derived total-pressure error to ${\sim}10^{4}\%$ for the ViT and ${\sim}10^{10}\%$ for flow matching, while the corresponding direct heads remain well behaved at $2.5$–$3.9\%$. Finally, the ranking across architectures is preserved on every split, and all three degrade only modestly on the OOD split without changing order. Together these results show that the different emulators are thermodynamically coherent and argue for directly predicting derived quantities whose reconstruction from the base primitives is nonlinear, most critically the total pressure, but also the Mach number.

Table 8: Predicting vs. deriving different physical quantities after pre-training. Relative $L_{2}$ errors (%) for the derived quantities enthalpy $h$, total pressure $p_{t}$, kinetic energy $k$, temperature $T$, and Mach number $\mathrm{Ma}$ for different models, reported as predicted / derived. Predicting derived quantities directly generally results in smaller error.

Finally, we also report the difference between predicting and deriving the different quantities after physics-aware model refinement in [Table 9](https://arxiv.org/html/2606.13742#A6.T9). Here we compare ViT before and after physics-aware model refinement. Physics-aware refinement results in marginal improvements if the different field quantities are predicted directly. However, there is a significant reduction in error after refinement if the different quantities are derived. This difference is particularly pronounced in derived fields that exhibit a nonlinear relation to the primitives, like total pressure and Mach number. For the former we sometimes even observe an improvement of around an order of magnitude. This finding indicates that physics-aware model refinement facilitates thermodynamic consistency of the physics emulator.

Table 9: Predicting vs. deriving different physical quantities after physics-aware refinement. Relative $L_{2}$ errors (%) for the derived quantities enthalpy $h$, total pressure $p_{t}$, kinetic energy $k$, temperature $T$, and Mach number $\mathrm{Ma}$ for ViT before and after model refinement, reported as predicted / derived.