
# AeroJEPA: Learning Semantic Latent Representations for Scalable 3D Aerodynamic Field Modeling
Source: [https://arxiv.org/html/2605.05586](https://arxiv.org/html/2605.05586)
Francisco Giral (Universidad Politécnica de Madrid), Abhijeet Vishwasrao (University of Michigan), Andrea Arroyo Ramo (Universitat Politècnica de València), Mahmoud Golestanian (Purdue University), Federica Tonti (University of Michigan), Adrian Lozano-Durán (Caltech), Steven L. Brunton (University of Washington), Sergio Hoyas (Universitat Politècnica de València), Hector Gomez (Purdue University), Soledad Le Clainche (Universidad Politécnica de Madrid), Ricardo Vinuesa (University of Michigan)

###### Abstract

Aerodynamic surrogate models are increasingly used to replace repeated high-fidelity CFD evaluations in many-query design settings, but current approaches still face two important limitations: they often scale poorly to the very large fields arising in realistic 3D aerodynamics, and they rarely produce latent representations that are directly useful for analysis and design. We introduce AeroJEPA, a Joint-Embedding Predictive Architecture for aerodynamic field modeling that addresses both issues. Rather than predicting the full flow field directly from geometry, AeroJEPA predicts a target latent representation of the flow from a context latent representation of the geometry and operating conditions, and optionally reconstructs the field through a continuous implicit decoder. This formulation decouples latent prediction from field resolution while encouraging the latent space to organize semantically. We evaluate AeroJEPA on two complementary datasets: HiLiftAeroML, which stresses the method in a high-fidelity regime with extremely large boundary-layer fields, and SuperWing, which tests large-scale generalization and latent-space optimization over a broad family of transonic wings. Across these benchmarks, AeroJEPA is competitive as a continuous surrogate for aerodynamic fields, scales naturally to high-resolution outputs, and learns context and predicted latents that encode geometry and aerodynamic quantities not used directly as supervision. We further show that the resulting latent space supports controlled interpolation, linear probing, concept-vector arithmetic, and a constrained design latent-optimization experiment. These results suggest that predictive latent learning is a promising direction for scalable and design-meaningful aerodynamic surrogate modeling.

## 1 Introduction

Aerodynamic design increasingly operates in a many-query regime: engineers must repeatedly evaluate high-dimensional flow fields over large spaces of geometries and operating conditions in order to screen concepts, compare design variants, and refine promising candidates. High-fidelity CFD remains the gold standard for these tasks, but its cost makes direct optimization, large-scale exploration, and rapid iteration prohibitively expensive. This tension has long motivated surrogate modeling for aerodynamic analysis (Forrester et al., [2006](https://arxiv.org/html/2605.05586#bib.bib19), [2008](https://arxiv.org/html/2605.05586#bib.bib18); Yondo et al., [2018](https://arxiv.org/html/2605.05586#bib.bib43)), and more recently has driven the development of neural surrogates, neural operators, and continuous field representations for aerodynamic prediction (Azizzadenesheli et al., [2024](https://arxiv.org/html/2605.05586#bib.bib4); Catalani et al., [2024](https://arxiv.org/html/2605.05586#bib.bib9); Duvall and Duraisamy, [2025](https://arxiv.org/html/2605.05586#bib.bib15)).

Despite this progress, two limitations remain central in realistic 3D aerodynamics. First, many existing surrogates are optimized for direct field prediction on a fixed discretization, which makes them difficult to scale to the very large outputs that arise in high-fidelity settings. Second, even when they are accurate, they often provide little structure in the representation space itself: they predict fields, but do not yield latent variables that are clearly aligned with geometry, operating conditions, underlying physics, or downstream aerodynamic performance. Yet this structure is precisely what would make a learned surrogate more useful for scientific understanding and design (Vishwasrao et al., [2026](https://arxiv.org/html/2605.05586#bib.bib50); Vinuesa et al., [2026](https://arxiv.org/html/2605.05586#bib.bib51)). A semantically organized latent space could support probing, interpolation, and even gradient-based optimization without repeatedly manipulating meshes or decoding full fields at every inner-loop step.

![Refer to caption](https://arxiv.org/html/2605.05586v1/figures/JEPA_neurips.png)

Figure 1: Overview of the AeroJEPA framework. The context encoder maps the geometry point cloud to context tokens $Z_c$, while the target encoder maps the ground-truth flow point cloud to target tokens $Z_t$. The predictor conditions $Z_c$ on the operating variables $c$ (e.g., $\alpha$, $Re$, Mach) and outputs predicted target tokens $\hat{Z}_t$. Training always includes a latent matching loss $\mathcal{L}_{\mathrm{lat}}$ and a collapse-avoidance regularizer $\mathcal{L}_{\mathrm{sig}}$. When a decoder is included, the predicted tokens are also decoded to the physical field and masked-supervised with a reconstruction loss $\mathcal{L}_{\mathrm{rec}}$. At inference, the target encoder is discarded; the decoder is used only if full-field reconstruction is required.

This challenge sits at the intersection of two active lines of work. On the aerodynamic side, recent methods have shown the promise of compact latent representations, transformer-based surrogates, and implicit decoders for high-dimensional CFD data (Solera-Rico et al., [2024](https://arxiv.org/html/2605.05586#bib.bib30); Francés-Belda et al., [2024](https://arxiv.org/html/2605.05586#bib.bib29); Eivazi et al., [2022](https://arxiv.org/html/2605.05586#bib.bib52); Choy et al., [2025](https://arxiv.org/html/2605.05586#bib.bib39); Wu et al., [2024a](https://arxiv.org/html/2605.05586#bib.bib37); Adams et al., [2025](https://arxiv.org/html/2605.05586#bib.bib36); Zou et al., [2026](https://arxiv.org/html/2605.05586#bib.bib38)). These approaches demonstrate the value of compressed and continuous representations, but they are still typically trained through direct geometry-to-field prediction or reconstruction.
On the representation-learning side, Joint-Embedding Predictive Architectures (JEPAs) learn by predicting target embeddings from context embeddings rather than reconstructing raw inputs (Assran et al., [2023](https://arxiv.org/html/2605.05586#bib.bib2); Sobal et al., [2022](https://arxiv.org/html/2605.05586#bib.bib28)). Recent developments such as LeJEPA and LeWorldModel show that JEPA-style training can be stabilized through explicit latent regularization, and related work has extended the paradigm to 3D, multimodal, and scientific settings (Balestriero and LeCun, [2025](https://arxiv.org/html/2605.05586#bib.bib5); Maes et al., [2026](https://arxiv.org/html/2605.05586#bib.bib6); Hu et al., [2024](https://arxiv.org/html/2605.05586#bib.bib33); Perera et al., [2025](https://arxiv.org/html/2605.05586#bib.bib31); Chen et al., [2025](https://arxiv.org/html/2605.05586#bib.bib32); Qu et al., [2026](https://arxiv.org/html/2605.05586#bib.bib35); Yee and Koh, [2026](https://arxiv.org/html/2605.05586#bib.bib34)). However, this line of work has not yet been fully developed for aerodynamic surrogate modeling, where the context is geometry plus operating condition and the target is the latent representation of a full flow field.

In this paper, we introduce AeroJEPA, a JEPA-style predictive latent architecture tailored to 3D aerodynamic problems. Rather than learning to reconstruct the flow field directly from geometry, AeroJEPA predicts a target latent representation of the flow from a context latent representation of the geometry and the operating conditions. A continuous implicit decoder can then reconstruct the field at arbitrary query locations from the predicted latent state. This design decouples the expensive prediction problem from the spatial resolution of the output field, while simultaneously encouraging the model to organize information in a latent space that can be analyzed and exploited beyond pure reconstruction accuracy.

We evaluate AeroJEPA on two complementary datasets\. HiLiftAeroML stresses the method in a high\-fidelity regime with extremely large boundary\-layer fields on realistic high\-lift aircraft geometries, where scalability and continuous decoding are essential\. SuperWing instead emphasizes broad generalization over a large parametric family of transonic wings and allows us to test whether the learned latent space can support a proof\-of\-concept optimization workflow\. Across these datasets, we show that AeroJEPA is competitive as a continuous surrogate for aerodynamic fields, that its context and predicted latents capture geometry and aerodynamic information not used directly as supervision, and that the resulting latent space is smooth enough to support controlled interpolation and constrained design\-space search\.

Our contributions are threefold\. First, we formulate a JEPA\-style predictive latent architecture for aerodynamic surrogate modeling, combining geometry\-conditioned latent prediction with a continuous Implicit Neural Representation \(INR\) decoder\. Second, we show that this formulation yields semantically meaningful latent spaces in which design variables and aerodynamic quantities can be recovered or manipulated despite not being used as direct training targets\. Third, we demonstrate on large\-scale aerodynamic benchmarks that the approach is practically useful: it remains competitive for field prediction, scales naturally to high\-resolution outputs, and enables a lightweight proof\-of\-concept optimization procedure in latent space\.

## 2 Method

In this section, we present the formulation of AeroJEPA, a novel framework adapted from the Joint-Embedding Predictive Architecture (JEPA) paradigm and tailored specifically to 3D aerodynamic problems. Unlike traditional surrogate models that attempt to map physical geometries directly to high-dimensional flow fields (or standard autoencoders that focus on pixel- or voxel-level reconstruction), AeroJEPA operates entirely within a learned, physically meaningful latent space. This approach enables rapid prediction of flow fields and explicitly structures the latent dimensions to be highly correlated with geometry design variables and physical properties, permitting extremely fast optimization and design cycles.

### 2.1 Problem formulation

We consider the problem of predicting steady-state 3D fluid fields around complex aerodynamic geometries. Let a given geometric design be defined by its boundary surface $\partial\Omega$, which interacts with a fluid over a spatial domain $\Omega\subset\mathbb{R}^{3}$. The geometry is discretized and represented as an unstructured point cloud of boundary condition (BC) points, $\mathcal{P}=\{x_i\}_{i=1}^{N_c}$, where each $x_i\in\partial\Omega$.

The flow physics are governed by a set of operating or freestream conditions, denoted as $c$. In the context of aerodynamics, this conditioning vector includes variables such as the Reynolds number ($Re$) and the angle of attack ($\alpha$).

The objective is to map the geometric representation $\mathcal{P}$ and the physical conditions $c$ to the corresponding continuous fluid field (e.g., pressure and velocity vector fields), denoted as $\mathcal{F}$. Instead of predicting a heavily discretized grid, we aim to learn an INR that can output the fluid state at any arbitrary spatial query point $q=(x,y,z)\in\Omega$.

### 2.2 The AeroJEPA framework

The core philosophy of AeroJEPA is to separate geometry encoding, latent prediction, and field decoding through a compact tokenized interface. Figure [1](https://arxiv.org/html/2605.05586#S1.F1) highlights which modules are active in each regime. The context encoder and predictor are always used at inference. The target encoder is used only during training. The decoder is optional: it can be optimized jointly with the latent JEPA objective, or trained later on frozen predicted latents in a decoupled second stage.

This architecture supports two training workflows. In the *coupled* workflow, the context encoder, target encoder, predictor, and decoder are trained end-to-end with both latent and reconstruction masked supervision. In the *decoupled* workflow, the latent JEPA is trained first using only the encoders, predictor, and latent-space losses; afterwards, the encoders and predictor are frozen and a decoder is trained separately on the predicted latents. Both variants are compatible with the same framework. In this work we use the coupled variant because, empirically, it better preserves the physical validity of predicted latents without sacrificing semantic alignment.

At inference time, the target encoder is discarded. For latent analysis or latent-space optimization, the model uses the context encoder and predictor alone. For field prediction, the predicted tokens are additionally passed through the INR decoder to reconstruct the fluid field at arbitrary query points.

### 2.3 Encoders: transforming physical space to latent space

To handle the highly irregular and often massive point sets arising in 3D aerodynamics, AeroJEPA first converts both geometry and flow data into manageable point\-cloud inputs and then compresses them into fixed\-size token sets\.

#### Context Encoder

The context encoder, $\mathcal{E}_c$, starts from the original CFD mesh but discards mesh connectivity, retaining only a point-cloud representation of the geometry. For training efficiency, each case (which may originally contain tens of millions of points) is randomly subsampled to a manageable number of points using farthest point sampling (FPS), typically on the order of $8\times 10^{3}$–$131\times 10^{3}$ points depending on the dataset resolution. For surface-only settings, the encoder uses point coordinates as input features. For volumetric settings, we additionally provide the signed-distance function (SDF) value at each point. Other geometry-derived attributes, such as surface normals, could also be incorporated, but were not used during the development of this work.
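Since FPS is what makes these point counts tractable, a minimal pure-Python sketch may help fix ideas. This is illustrative only and not the authors' implementation: at the scales above one would use a vectorized or GPU variant, but the greedy selection rule is the same.

```python
import math

def farthest_point_sampling(points, k, seed_idx=0):
    """Greedy FPS: repeatedly add the point farthest from the chosen set.

    points: list of (x, y, z) tuples; k: number of samples to keep.
    O(k * N) pure-Python version, for illustration only.
    """
    n = len(points)
    k = min(k, n)
    chosen = [seed_idx]
    # min_dist[i] = squared distance from point i to its nearest chosen point
    min_dist = [math.inf] * n
    for _ in range(k - 1):
        last = points[chosen[-1]]
        for i, p in enumerate(points):
            d = sum((a - b) ** 2 for a, b in zip(p, last))
            if d < min_dist[i]:
                min_dist[i] = d
        # next sample: the point with the largest distance to the chosen set
        chosen.append(max(range(n), key=lambda i: min_dist[i]))
    return chosen

pts = [(0, 0, 0), (0.1, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
idx = farthest_point_sampling(pts, 3)
```

Note how the near-duplicate point (0.1, 0, 0) is never selected into a 3-point subset, which is exactly why FPS preserves geometric coverage under aggressive subsampling.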

Given the subsampled geometry cloud𝒫\\mathcal\{P\}, the encoder produces a fixed number of context tokens,

Zc=ℰc​\(𝒫\)∈ℝM×dZ\_\{c\}=\\mathcal\{E\}\_\{c\}\(\\mathcal\{P\}\)\\in\\mathbb\{R\}^\{M\\times d\}\(1\)whereMMis the number of spatial tokens andddis the token width\. In the main experiments, HiLiftAeroML uses30723072tokens of width6464, while SuperWing uses512512tokens of width128128\.

#### Target Encoder

Used only during training, the target encoder, $\mathcal{E}_t$, operates on an independently subsampled point cloud of the ground-truth fluid field. This target cloud may cover only surface quantities (e.g., $C_p$, $C_f$, boundary-layer velocity) or a volumetric field (e.g., velocity and pressure throughout the domain), depending on the dataset. Importantly, the target points are sampled independently from the geometry points, so the model does not rely on one-to-one correspondence between context and target samples.

The target encoder maps the subsampled flow field $\mathcal{F}$ to target tokens,

$$Z_t = \mathcal{E}_t(\mathcal{F}) \in \mathbb{R}^{M\times d}. \quad (2)$$

$Z_t$ serves as the target in latent space. In both encoders, token embeddings are obtained by clustering the subsampled point cloud into learned centroids, aggregating local neighborhoods around each centroid with a lightweight message-passing network (Gilmer et al., [2017](https://arxiv.org/html/2605.05586#bib.bib48)), and refining the resulting token set with a point-transformer backbone. This turns irregular point clouds of varying size into fixed-size token sets that can be matched directly.
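The tokenization step can be illustrated in miniature. The sketch below is a hand-rolled stand-in for the learned pipeline: it replaces the message-passing aggregation and point-transformer refinement with a plain nearest-centroid mean-pool, keeping only the essential property that a point cloud of any size maps to a fixed number of tokens. The centroids here are fixed constants, whereas in AeroJEPA they are learned.

```python
def tokenize(points, feats, centroids):
    """Map a variable-size point cloud to exactly len(centroids) tokens.

    Each point is assigned to its nearest centroid and its features are
    mean-pooled per centroid; empty clusters yield zero tokens.
    (Illustrative stand-in for the learned message-passing + transformer.)
    """
    d = len(feats[0])
    sums = [[0.0] * d for _ in centroids]
    counts = [0] * len(centroids)
    for p, f in zip(points, feats):
        # index of the nearest centroid (squared Euclidean distance)
        j = min(range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
        counts[j] += 1
        sums[j] = [s + x for s, x in zip(sums[j], f)]
    return [[s / c if c else 0.0 for s in row] for row, c in zip(sums, counts)]

centroids = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
tokens = tokenize([(0, 0, 0), (0.1, 0, 0), (1, 1, 1)],
                  [[1.0, 0.0], [3.0, 0.0], [5.0, 2.0]], centroids)
```

Regardless of how many points the input cloud contains, the output always has `len(centroids)` tokens, which is what allows the latent matching loss to compare context and target token sets directly.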

### 2.4 Latent predictor network

The predictive core of AeroJEPA is the predictor network, $f_\theta^{\text{pred}}$, parameterized by $\theta$. Instead of predicting the high-dimensional field directly, the predictor operates entirely in tokenized latent space. Given the context tokens extracted from the geometry, $Z_c$, and the physical operating conditions, $c$, the network predicts the target flow tokens:

$$\hat{Z}_t = f_\theta^{\text{pred}}(Z_c, c), \quad (3)$$

where $c$ contains the relevant flow variables for the problem at hand, such as angle of attack, Reynolds number, or Mach number. The conditions are injected inside the predictor so that a single geometry encoding can support multiple operating points. The predictor therefore learns the map from geometry-conditioned tokens to flow-conditioned tokens, which is far cheaper than direct dense field prediction while preserving a structured latent interface for downstream probing and optimization.
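The paper states only that the operating conditions are injected inside the predictor; the mechanism below (FiLM-style per-channel scale and shift computed from $c$) is one standard choice, shown purely as an assumed illustration of how one geometry encoding can be re-conditioned for multiple operating points.

```python
def condition_tokens(tokens, cond, w_scale, w_shift):
    """FiLM-style conditioning (assumed mechanism, for illustration only).

    A linear map of the operating-condition vector `cond`
    (e.g. [alpha, Re, Mach]) produces one scale and one shift per latent
    channel, applied uniformly to every token.
    tokens: M x d; w_scale, w_shift: d x len(cond) weight matrices.
    """
    d = len(tokens[0])
    scale = [1.0 + sum(w_scale[i][j] * cond[j] for j in range(len(cond)))
             for i in range(d)]
    shift = [sum(w_shift[i][j] * cond[j] for j in range(len(cond)))
             for i in range(d)]
    return [[t[i] * scale[i] + shift[i] for i in range(d)] for t in tokens]

# One geometry encoding, two operating points: only `cond` changes.
geom_tokens = [[1.0, 2.0], [0.5, -1.0]]
out_a = condition_tokens(geom_tokens, [0.5], [[2.0], [0.0]], [[0.0], [1.0]])
out_b = condition_tokens(geom_tokens, [1.0], [[2.0], [0.0]], [[0.0], [1.0]])
```

The geometry tokens are computed once and reused; sweeping the operating point only changes the cheap conditioning step, which is what makes many-query evaluation inexpensive.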

### 2.5 Implicit neural representation (INR) decoder

To map the fluid latent states back into the physical domain, we employ an INR decoder, $f_\phi^{\text{dec}}$, parameterized by $\phi$. Unlike voxel grids or fixed meshes, INRs provide a continuous, resolution-independent mapping by querying exact spatial coordinates.

The decoder takes as input a spatial query point $q=(x,y,z)\in\Omega$ alongside the predicted latents $\hat{Z}_t$. When volumetric information is used, the decoder also receives the local SDF value at the query point. During training, the reconstruction query points are also subsampled from the original field and are chosen independently from both the context and target-encoder point sets. This preserves the decoupled setting and effectively trains the model in a masked fashion, encouraging the latent tokens to encode the underlying continuous field rather than a fixed subset of sampled locations. The decoder outputs the local fluid state variables, such as the velocity vector $\mathbf{u}$ and pressure $p$:

$$[\mathbf{u}(q), p(q)] = f_\phi^{\text{dec}}(\hat{Z}_t, q). \quad (4)$$

By conditioning a multi-layer perceptron (MLP) on the latent vector $\hat{Z}_t$, the INR decoder acts as a continuous basis function for the aerodynamic field. This architecture is particularly advantageous for complex aircraft geometries, as it completely bypasses the need for volumetric meshing and allows for arbitrary physical resolution, bounded purely by the query points provided.
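A toy sketch of this decoder interface, under simplifying assumptions of our own (mean-pooled tokens, a two-layer MLP, no SDF input): the point it demonstrates is that the same latent state can be queried at any continuous location $q$, so output resolution is set by the queries rather than by the architecture.

```python
import math

def inr_decode(z_tokens, q, weights):
    """Toy INR decoder sketch: pool the predicted tokens into one latent
    vector, concatenate it with the query coordinates, and apply a
    2-layer tanh MLP. The real decoder conditions on the full token set
    (and on the SDF at q); this stand-in only shows the interface.
    """
    m = len(z_tokens)
    # mean-pool the token set into a single latent vector
    z = [sum(tok[i] for tok in z_tokens) / m for i in range(len(z_tokens[0]))]
    x = z + list(q)                      # [pooled latent | query point]
    w1, b1, w2, b2 = weights
    h = [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + b)
         for row, b in zip(w1, b1)]
    # one output per field channel, e.g. (u, v, w, p)
    return [sum(wi * hi for wi, hi in zip(row, h)) + b
            for row, b in zip(w2, b2)]

# Tiny hypothetical weights: 1 latent channel + 3 coords -> 1 hidden -> 4 outputs.
w1, b1 = [[0.0, 0.0, 0.0, 0.0]], [0.0]
w2, b2 = [[0.0], [0.0], [0.0], [0.0]], [1.0, 2.0, 3.0, 4.0]
out = inr_decode([[0.0]], (0.25, 0.5, 0.75), (w1, b1, w2, b2))
```

Because the decoder is evaluated pointwise, decoding a 15M-point surface and decoding a coarse preview differ only in how many queries are issued.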

### 2.6 Training objectives

A central challenge in JEPA training is avoiding representation collapse while still learning a predictive latent space that is useful for downstream tasks. AeroJEPA follows recent JEPA formulations that replace EMA teachers and stop-gradient heuristics with an explicit regularizer on the latent distribution, namely SIGReg (Balestriero and LeCun, [2025](https://arxiv.org/html/2605.05586#bib.bib5); Maes et al., [2026](https://arxiv.org/html/2605.05586#bib.bib6)). This yields two closely related training objectives depending on whether the decoder is optimized jointly or separately.

For coupled end-to-end training, we optimize

$$\mathcal{L}_{\mathrm{total}} = \lambda_\ell\,\mathcal{L}_{\mathrm{lat}} + \lambda_r\,\mathcal{L}_{\mathrm{rec}} + \lambda_s\,\mathcal{L}_{\mathrm{sig}}, \quad (5)$$

whereas the decoupled latent-only stage uses

$$\mathcal{L}_{\mathrm{latent\text{-}only}} = \lambda_\ell\,\mathcal{L}_{\mathrm{lat}} + \lambda_s\,\mathcal{L}_{\mathrm{sig}}. \quad (6)$$

The latent matching term is

$$\mathcal{L}_{\mathrm{lat}} = \left\|\hat{Z}_t - Z_t\right\|_2^2, \quad (7)$$

which aligns the predictor output with the target encoder output token by token. When the decoder is trained jointly, the reconstruction term is

$$\mathcal{L}_{\mathrm{rec}} = \mathbb{E}_{q\in\Omega}\left[\left\|f_\phi^{\mathrm{dec}}(\hat{Z}_t, q) - \mathcal{F}(q)\right\|_2^2\right], \quad (8)$$

with the supervised channels depending on whether the task is surface-only or volumetric. The regularization term $\mathcal{L}_{\mathrm{sig}}$ applies SIGReg to the latent tokens in order to keep the embedding distribution well spread and predictive without relying on EMA or stop-gradient training heuristics. SIGReg regularizes the latent distribution through random low-dimensional projections toward an isotropic Gaussian prior.

In all experiments of this paper we use the coupled training objective. Empirically, we found that including the reconstruction pathway during AeroJEPA training helps maintain physical validity in the predicted latents while preserving their semantic alignment with design and aerodynamic variables. The loss weights are set to $\lambda_\ell = 1.0$, $\lambda_r = 1.0$, and $\lambda_s = 0.01$. These values were chosen empirically: assigning comparable weight to the latent-matching and reconstruction terms, while using a lighter regularization weight, gave the best trade-off between field accuracy and a latent space that remained interpretable and useful for probing and optimization.
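To make the objective concrete, here is a toy version of the coupled loss of Eq. (5). The `toy_sigreg` term is only in the spirit of SIGReg: it penalizes deviation of random 1-D projections of the latents from zero mean and unit variance, i.e., an isotropic Gaussian matched up to second moments, whereas the published SIGReg statistic is more sophisticated (see the cited papers).

```python
import random

def toy_sigreg(latents, n_proj=8, rng=None):
    """Toy collapse-avoidance penalty: project latents onto random unit
    directions and penalize each 1-D projection's deviation from zero
    mean / unit variance. A moment-matching caricature of SIGReg.
    """
    rng = rng or random.Random(0)
    n, d = len(latents), len(latents[0])
    penalty = 0.0
    for _ in range(n_proj):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]                      # random unit direction
        proj = [sum(a * b for a, b in zip(z, v)) for z in latents]
        mean = sum(proj) / n
        var = sum((p - mean) ** 2 for p in proj) / n
        penalty += mean ** 2 + (var - 1.0) ** 2
    return penalty / n_proj

def total_loss(l_lat, l_rec, l_sig, lam_l=1.0, lam_r=1.0, lam_s=0.01):
    """Coupled objective of Eq. (5) with the weights quoted in the text."""
    return lam_l * l_lat + lam_r * l_rec + lam_s * l_sig
```

A fully collapsed latent set (all tokens identical) has zero variance along every projection and is penalized, while a well-spread set is not, which is the behavior the regularizer is meant to enforce.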

## 3 Computational experiments

Our experiments are designed to answer three questions: (i) whether AeroJEPA yields accurate full-field surrogates on challenging 3D aerodynamic benchmarks, (ii) whether the learned latent space is semantically aligned with geometry and flow variables, and (iii) whether that latent structure could be useful for downstream design optimization. We therefore evaluate the method from three complementary perspectives (surrogate accuracy, latent-space structure, and optimization utility) across two datasets that stress different aerodynamic regimes.

### 3.1 Datasets

We evaluate AeroJEPA on two complementary aerodynamic datasets. The first is the HiLiftAeroML dataset introduced by Ashton et al. ([2026](https://arxiv.org/html/2605.05586#bib.bib45)), which provides high-fidelity CFD data generated with solution-adapted WMLES for high-lift configurations. This dataset is particularly relevant for assessing whether the proposed latent-space prediction strategy can capture complex separated-flow phenomena on realistic aircraft geometries. The second is SuperWing (Yang et al., [2025](https://arxiv.org/html/2605.05586#bib.bib46)), a large-scale transonic swept-wing dataset designed for data-driven aerodynamic design. SuperWing contains diverse parameterized wing geometries simulated over a broad range of operating conditions, making it suitable for evaluating generalization across geometry and flow-condition variations. Additional information on the geometry families, flow regimes, and the role of each dataset in our evaluation protocol is provided in Appendix [D](https://arxiv.org/html/2605.05586#A4).

### 3.2 HiLiftAeroML: latent structure and full-field surrogate accuracy

#### Surrogate comparison

HiLiftAeroML is the most demanding benchmark in our study because it combines realistic high-lift aircraft geometries with high-resolution boundary-layer fields defined on approximately 15M surface points and 50M volume points. This regime is especially challenging for baseline architectures that are designed to predict all target points simultaneously. In practice, direct full-field prediction at this resolution is not feasible for these models, so we train them on chunks of $131\times 10^{3}$ target points and reconstruct the full field by repeated chunk-wise inference at test time. This degrades both accuracy and efficiency. AeroJEPA avoids this bottleneck: the subsampled geometry is encoded only once, the predictor produces a latent representation of the full continuous field, and the INR decoder is then queried at arbitrary locations. In this work, we evaluate the velocity and pressure fields on the boundary-layer surface, where the flow exhibits the strongest variation and the prediction task is most demanding. However, the same framework can be applied to the full volumetric domain without modification. As shown in Table [1](https://arxiv.org/html/2605.05586#S3.T1), this design yields the best accuracy across all field components while also requiring the lowest inference cost among the compared models.
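The chunk-wise baseline protocol can be summarized in a few lines. This helper is a sketch of the evaluation procedure described above, not the benchmark code: the model is repeatedly applied to fixed-size batches of target points and the outputs are concatenated.

```python
def chunked_predict(model, query_points, chunk=131_000):
    """Assemble a full-field prediction from fixed-size batches of query
    points, as in the baseline evaluation protocol. `model` maps a list
    of points to a list of per-point predictions.
    """
    out = []
    for i in range(0, len(query_points), chunk):
        out.extend(model(query_points[i:i + chunk]))
    return out
```

AeroJEPA sidesteps this loop on the encoding side: the geometry is encoded once, and only the lightweight INR decoder is evaluated per query point.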

Table 1: Performance metrics and inference computational cost on HiLiftAeroML, reported as mean ± standard deviation across test cases. Baseline models are trained on chunks of $131\times 10^{3}$ target points and evaluated over the full field by chunk-wise inference, whereas AeroJEPA predicts a continuous latent field and decodes it at arbitrary query locations.

Beyond aggregate errors, the qualitative reconstructions confirm that the model captures the relevant high-lift flow structures on unseen cases. Figure [2](https://arxiv.org/html/2605.05586#S3.F2) shows the decoded velocity field for test geometry LHC013 at $18^{\circ}$ angle of attack. The prediction remains consistent with the reference CFD solution in the most sensitive surface regions, including areas affected by strong gradients and flow separation. This qualitative agreement is important because the benchmark evaluates the full resolved boundary-layer field rather than a reduced set of probes or integrated outputs; additional pressure views are provided in Appendix [F.1](https://arxiv.org/html/2605.05586#A6.SS1).

![Refer to caption](https://arxiv.org/html/2605.05586v1/figures/case13_18aoa_velocity_view3.png)

Figure 2: HiLiftAeroML reconstruction for test geometry LHC013 at $18^{\circ}$ angle of attack. AeroJEPA captures the main boundary-layer velocity patterns and high-gradient regions on an unseen configuration while decoding the field continuously over the full aircraft surface. The relative error reported in the figure is 1.18% in MAE / max-GT-mean for the velocity-magnitude field. Additional views are provided in Appendix [F.1](https://arxiv.org/html/2605.05586#A6.SS1).
#### Latent analysis

We next examine whether the learned latent spaces encode physically meaningful structure. Figure [3](https://arxiv.org/html/2605.05586#S3.F3) compares a PCA projection of the context latents from AeroJEPA against a VAE baseline. This baseline preserves a context branch and mixes context and target information through cross-attention before decoding, keeping the same information paths as AeroJEPA, but its training objective is different. AeroJEPA organizes the geometry manifold more coherently, suggesting that the predictive objective induces a more structured representation of the design space. This qualitative separation is supported quantitatively in Appendix [F.1](https://arxiv.org/html/2605.05586#A6.SS1): ridge probes recover the four dominant high-lift control-surface deflections from the context latent with $R^2$ values between 0.965 and 0.988, even though these parameters are never used during training.
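A ridge probe of the kind referenced here fits a regularized linear map from frozen latents to a design or aerodynamic variable and reports $R^2$ on the predictions. The following self-contained sketch solves the normal equations directly; in practice one would use a standard library routine, and the data below are synthetic.

```python
def ridge_probe(X, y, alpha=1e-3):
    """Fit w = (X^T X + alpha I)^(-1) X^T y by Gaussian elimination and
    return (w, r2) on the given data. X: n x d latent features, y: n targets.
    """
    n, d = len(X), len(X[0])
    # Normal equations A w = b with Tikhonov damping on the diagonal.
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) + (alpha if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(X[k][i] * y[k] for k in range(n)) for i in range(d)]
    # Gaussian elimination with partial pivoting.
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, d):
            f = A[r][col] / A[col][col]
            for c2 in range(col, d):
                A[r][c2] -= f * A[col][c2]
            b[r] -= f * b[col]
    w = [0.0] * d
    for i in range(d - 1, -1, -1):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, d))) / A[i][i]
    # Coefficient of determination R^2.
    pred = [sum(wi * xi for wi, xi in zip(w, row)) for row in X]
    ybar = sum(y) / n
    ss_res = sum((p - t) ** 2 for p, t in zip(pred, y))
    ss_tot = sum((t - ybar) ** 2 for t in y)
    return w, 1.0 - ss_res / ss_tot

# Synthetic check: latents that linearly encode a "deflection" variable.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.0, 3.0, 5.0, 7.0]          # y = 2*x0 + 3*x1
w, r2 = ridge_probe(X, y)
```

An $R^2$ near 1 on a held-out variable that was never a training target is what supports the claim that the latent space encodes it linearly.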

The predicted latents after the predictor also encode nontrivial aerodynamic information, despite the fact that the model is trained only on the primitive fields $(u,v,w,p)$ and never on integrated coefficients such as $C_L$ or $C_D$. As shown in Figure [3](https://arxiv.org/html/2605.05586#S3.F3), the predicted-latent manifold varies smoothly with aerodynamic quantities, and the corresponding linear probes recover $C_L$ and $C_D$ with $R^2 = 0.930$ and $0.996$, respectively. This indicates that the predictive objective induces a latent representation aligned not only with geometry, but also with downstream aerodynamic performance.

We also find that the context latent is interpretable beyond linear recovery. The concept-vector arithmetic analysis in Appendix [F.4](https://arxiv.org/html/2605.05586#A6.SS4) shows that the main flap and slat directions act as nearly disentangled latent controls: flap-slat cross-talk is weak, while the remaining couplings occur primarily between inboard and outboard elements of the same surface, which is physically consistent with how these controls co-vary in the dataset. Taken together, these experiments show that on HiLiftAeroML, AeroJEPA is not only a more accurate and scalable surrogate for very large fields, but also a representation learner whose latent variables remain useful for analysis, probing, and controlled traversal.

![Refer to caption](https://arxiv.org/html/2605.05586v1/figures/fig1b_context_pca_geometry_jepa_vs_ae_context.png)

![Refer to caption](https://arxiv.org/html/2605.05586v1/figures/fig4_predicted_pca.png)

Figure 3: Main HiLift latent-space results. Top: PCA projection of the context latents, comparing AeroJEPA with the VAE baseline and showing that AeroJEPA organizes the geometry manifold more coherently. Bottom: projection of the predicted latents after conditioning on the flow state, showing that the predicted manifold arranges smoothly with aerodynamic quantities even though the model is trained only on the primitive fields $(u,v,w,p)$. Additional quantitative probe recoveries, interpolation results, and qualitative reconstructions are provided in Appendix [F.1](https://arxiv.org/html/2605.05586#A6.SS1).

### 3.3 SuperWing: large-scale generalization and latent-space optimization

We use SuperWing to evaluate a complementary capability: broad generalization across a large family of transonic swept wings and practical usefulness for aerodynamic design. Unlike HiLiftAeroML, where the main difficulty is the extreme size of the target field, SuperWing allows us to study both surrogate accuracy and downstream optimization on a dataset with fixed-resolution surface fields of $32 \times 10^3$ points. This setting is useful because it lets us compare AeroJEPA against strong baselines under two inference regimes: a chunked mode that mimics the realistic deployment setting used in HiLiftAeroML, and a one-pass mode in which all target points are predicted at once.

Table [2](https://arxiv.org/html/2605.05586#S3.T2) summarizes this comparison. For the baselines, the one-pass setting is favorable because the full SuperWing field fits into a single forward pass, whereas the chunked setting better reflects the many-query scenario in which predictions must be assembled from local batches of $8 \times 10^3$ points, matching the training resolution. AeroJEPA is invariant to this distinction because the geometry is encoded once and the field is decoded continuously from the predicted latent representation; as a result, its chunked and one-pass results coincide. The benchmark shows two complementary effects. First, AeroJEPA clearly outperforms the chunked baseline evaluations across $C_{f,\tau}$, $C_{f,z}$, and $C_p$, while remaining computationally efficient. Second, when the baselines are granted the more favorable one-pass regime, Transolver and GeoTransolver become very competitive numerically, which is expected in a dataset whose full target size is still manageable in memory. We therefore view SuperWing not as a case where AeroJEPA dominates every setting, but as evidence that the latent-field formulation remains competitive while retaining capabilities that pointwise surrogates do not naturally provide.

Table 2: Performance metrics and inference computational cost on SuperWing, reported as mean $\pm$ standard deviation across test cases (TFLOPs are deterministic per setting since every case shares the same point count), comparing chunked and one-pass baseline decoding. The chunked setting uses batches of $8 \times 10^3$ points, matching the training resolution, whereas one-pass inference predicts the full $32 \times 10^3$-point field at once. AeroJEPA yields the same result in both regimes because the geometry is encoded once and the field is decoded continuously from the latent representation. Additional decoded-field comparisons and force parity plots for SuperWing are reported in Appendix [F.2](https://arxiv.org/html/2605.05586#A6.SS2).
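The equivalence of chunked and one-pass inference for a continuous decoder follows from the decoder being pointwise in the query coordinates once the latent is fixed. The toy sketch below makes this concrete; a fixed random projection stands in for the implicit decoder network, which is an assumption for illustration only:

```python
import numpy as np

def decode_field(latent, query_xyz):
    """Toy continuous decoder: a pointwise function of (latent, query point).

    Stands in for AeroJEPA's implicit decoder; any decoder that is
    pointwise in the query gives identical chunked and one-pass results.
    """
    rng = np.random.default_rng(42)
    # Fixed random weights act as the "network" for illustration.
    W = rng.normal(size=(latent.size + 3, 4))  # 4 output channels, e.g. u, v, w, p
    feats = np.concatenate([np.tile(latent, (len(query_xyz), 1)), query_xyz], axis=1)
    return np.tanh(feats @ W)

def decode_chunked(latent, query_xyz, chunk=8_000):
    """Assemble the field from chunks matching the training resolution."""
    outs = [decode_field(latent, query_xyz[i:i + chunk])
            for i in range(0, len(query_xyz), chunk)]
    return np.concatenate(outs, axis=0)
```

Because each output row depends only on the shared latent and its own query point, assembling the field from $8 \times 10^3$-point chunks reproduces the one-pass result exactly.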

Finally, SuperWing lets us test whether the semantically structured latent space is useful for design-space search. We optimize aerodynamic efficiency $C_L/C_D$ at a fixed cruise condition by searching directly in the 128-dimensional context latent space, using a differentiable chain formed by the frozen AeroJEPA predictor and linear probes for $C_L$ and $C_D$. The optimization is constrained by a latent trust region, bounds on the subset of design parameters that are reliably recoverable from the context latents, aerodynamic floor and ceiling constraints, and a ceiling on achievable $L/D$ derived from the dataset envelope. After optimization, the latent optimum is mapped back to design space through the design probe, and the closest real wing in the dataset is retrieved. Figure [4](https://arxiv.org/html/2605.05586#S3.F4) summarizes this procedure and the associated aerodynamic envelope.
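The core of such a search can be sketched as projected gradient ascent on the probe-predicted ratio inside a latent trust region. The probe weights, radius, and step size below are placeholder assumptions, and the real pipeline additionally routes latents through the frozen predictor and enforces design-parameter bounds and aerodynamic floor/ceiling constraints:

```python
import numpy as np

def optimize_latent(z0, wL, bL, wD, bD, radius=1.0, lr=0.05, steps=200):
    """Maximize probe-predicted L/D = (wL.z + bL) / (wD.z + bD) near z0.

    Projected gradient ascent: step along the analytic gradient of the
    ratio, then project back onto the ball ||z - z0|| <= radius.
    """
    z = z0.copy()
    for _ in range(steps):
        L = wL @ z + bL
        D = wD @ z + bD
        grad = (wL * D - L * wD) / D**2  # d(L/D)/dz via the quotient rule
        z = z + lr * grad
        # Trust-region projection keeps the search in the calibrated regime.
        delta = z - z0
        n = np.linalg.norm(delta)
        if n > radius:
            z = z0 + delta * (radius / n)
    return z
```

Because the probes are linear and the trust region is a simple ball, each iteration is cheap: no field is decoded and no CAD geometry is modified inside the loop, matching the lightweight workflow described above.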

The purpose of this experiment is not to claim that AeroJEPA can solve arbitrary wing optimization problems without further validation\. Rather, it shows that the learned latent space is smooth enough to support gradient\-based search within a self\-consistent trust region, and that this search can be carried out without repeatedly decoding fields or modifying CAD representations during the optimization loop\. In this sense, SuperWing provides a proof of concept that semantically meaningful latent spaces can serve as practical surrogates for rapid preliminary design exploration\.

![Refer to caption](https://arxiv.org/html/2605.05586v1/figures/fig2_aero_envelope.png)

![Refer to caption](https://arxiv.org/html/2605.05586v1/figures/fig3_design_recipe.png)

Figure 4: Proof-of-concept latent-space optimization on SuperWing. *Left:* in the $(C_D, C_L)$ scatter of the full dataset, the surrogate optimum sits at the upper-left corner of the achievable envelope, on the high-efficiency frontier traced by the iso-$L/D$ contours. The accompanying histogram shows the optimum landing at the right tail of the dataset $L/D$ distribution, well separated from the initial geometry while remaining inside the calibrated regime. *Right:* parallel-coordinates view of the nine reliably encoded design parameters, showing that the optimized design corresponds to a recognizable high-efficiency wing recipe: large reference area and aspect ratio, high sweep, aggressive taper, and root-biased washout twist. The retrieved nearest-neighbour wing follows the same polyline up to small offsets, indicating that the optimum corresponds to a wing family already present in the dataset.

Taken together, the SuperWing results show that AeroJEPA remains competitive as a surrogate in a broad transonic wing-design regime while also enabling a lightweight, differentiable optimization workflow in latent space. This complements the HiLiftAeroML results: the latter emphasize scalability and semantic structure under extreme field sizes, whereas SuperWing highlights design-space usability under a realistic aerodynamic optimization setting.

## 4 Conclusions

AeroJEPA introduces a JEPA\-style predictive latent formulation for 3D aerodynamic surrogate modeling, combining geometry\-conditioned latent prediction with an optional continuous INR decoder\. The central idea is to move the expensive part of the learning problem from dense field prediction to structured latent prediction, while still retaining the ability to decode the flow continuously at arbitrary query locations when needed\.

Across HiLiftAeroML and SuperWing, the experiments show three main outcomes\. First, AeroJEPA remains competitive as a surrogate model and is especially advantageous in regimes where the target field is too large for direct one\-shot prediction\. Second, the learned context and predicted latents are semantically meaningful: they linearly recover design and aerodynamic quantities not used as direct supervision, exhibit smooth organization under interpolation, and expose interpretable concept directions\. Third, this latent structure is not merely descriptive; it is usable for downstream tasks, as shown by the proof\-of\-concept latent\-space optimization on SuperWing\.

These results suggest that predictive latent learning is a promising direction for aerodynamic surrogate modeling: it offers a path toward surrogates that are not only scalable and accurate, but also analyzable and useful for design-space exploration. At the same time, the present study is limited to two datasets and primarily evaluates steady settings; the optimization and latent analyses are also proof-of-concept rather than a full design pipeline. Future work will therefore study broader flow regimes, tighter integration with inverse design and shape parameterization pipelines, and extensions to unsteady or multi-fidelity aerodynamic settings; additional discussion appears in Appendix [A](https://arxiv.org/html/2605.05586#A1).

## References

- C. Adams, R. Ranade, R. Cherukuri, and S. Choudhry (2025). GeoTransolver: learning physics on irregular domains using multi-scale geometry aware physics attention transformer. arXiv preprint arXiv:2512.20399.
- S. Akbari, P. H. Dabaghian, and O. San (2023). Blending machine learning and sequential data assimilation over latent spaces for surrogate modeling of Boussinesq systems. Physica D: Nonlinear Phenomena 448, pp. 133711.
- N. Ashton, A. M. Clark, C. Ivey, L. Heidt, S. Bose, R. Ranade, R. Agrawal, and K. Goc (2026). High-fidelity CFD data generation for HiLiftAeroML using solution-adapted WMLES. In AIAA SCITECH 2026 Forum, pp. 0042.
- M. Assran, Q. Duval, I. Misra, P. Bojanowski, P. Vincent, M. Rabbat, Y. LeCun, and N. Ballas (2023). Self-supervised learning from images with a joint-embedding predictive architecture. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15619–15629.
- H. Aubeelack, S. Segonds, C. Bes, T. Druot, J. Brezillon, A. Bérard, M. Duffau, and G. Gallant (2023). Surrogate model development for optimized blended-wing-body aerodynamics. Journal of Aircraft 60(2), pp. 437–448.
- K. Azizzadenesheli, N. Kovachki, Z. Li, M. Liu-Schiaffini, J. Kossaifi, and A. Anandkumar (2024). Neural operators for accelerating scientific simulations and design. Nature Reviews Physics 6(5), pp. 320–328.
- R. Balestriero and Y. LeCun (2025). LeJEPA: provable and scalable self-supervised learning without the heuristics. arXiv preprint arXiv:2511.08544.
- J. Bardhan, R. Agrawal, A. Tilak, C. Neeraj, and S. Mitra (2025). HEP-JEPA: a foundation model for collider physics using joint embedding predictive architecture. arXiv preprint arXiv:2502.03933.
- G. D. Bird, S. E. Gorrell, and J. L. Salmon (2021). Dimensionality-reduction-based surrogate models for real-time design space exploration of a jet engine compressor blade. Aerospace Science and Technology 118, pp. 107077.
- G. Catalani, S. Agarwal, X. Bertrand, F. Tost, M. Bauerheim, and J. Morlier (2024). Neural fields for rapid aircraft aerodynamics simulations. Scientific Reports 14(1), pp. 25496.
- D. Chen, M. Shukor, T. Moutakanni, W. Chung, J. Yu, T. Kasarla, Y. Bang, A. Bolourchi, Y. LeCun, and P. Fung (2025). VL-JEPA: joint embedding predictive architecture for vision-language. arXiv preprint arXiv:2512.10942.
- S. Cheng, J. Chen, C. Anastasiou, P. Angeli, O. K. Matar, Y. Guo, C. C. Pain, and R. Arcucci (2023). Generalised latent assimilation in heterogeneous reduced spaces with machine learning surrogate models. Journal of Scientific Computing 94(1), pp. 11.
- C. Choy, A. Kamenev, J. Kossaifi, M. Rietmann, J. Kautz, and K. Azizzadenesheli (2025). Factorized implicit global convolution for automotive computational fluid dynamics prediction. arXiv preprint arXiv:2502.04317.
- V. Chu, O. Mashaal, and H. Abou-Zeid (2026). WirelessJEPA: a multi-antenna foundation model using spatio-temporal wireless latent predictions. arXiv preprint arXiv:2601.20190.
- M. Destrade, O. Bounou, Q. L. Lidec, J. Ponce, and Y. LeCun (2025). Value-guided action planning with JEPA world models. arXiv preprint arXiv:2601.00844.
- K. Drozdov, R. Shwartz-Ziv, and Y. LeCun (2024). Video representation learning with joint-embedding predictive architectures. arXiv preprint arXiv:2412.10925.
- X. Du, P. He, and J. R. Martins (2021). Rapid airfoil design optimization via neural networks-based parameterization and surrogate modeling. Aerospace Science and Technology 113, pp. 106701.
- J. Duvall and K. Duraisamy (2025). Discretization-independent surrogate modeling of physical fields around variable geometries using coordinate-based networks. Data-Centric Engineering 6, pp. e5.
- H. Eivazi, S. Le Clainche, S. Hoyas, and R. Vinuesa (2022). Towards extraction of orthogonal and parsimonious non-linear modes from turbulent flows. Expert Systems with Applications 202, pp. 117038.
- A. ElSheikh, R. Wang, W. Wu, Y. Wen, P. Dibaeinia, J. Y. Zhang, J. Y. Hu, M. Knudson, S. Babu, S. Sun, et al. (2026). Cell-JEPA: latent representation learning for single-cell transcriptomics. arXiv preprint arXiv:2602.02093.
- Z. Fei, M. Fan, and J. Huang (2023). A-JEPA: joint-embedding predictive architecture can listen. arXiv preprint arXiv:2311.15830.
- A. I. Forrester, N. W. Bressloff, and A. J. Keane (2006). Optimization using surrogate models and partially converged computational fluid dynamics simulations. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 462(2071), pp. 2177–2204.
- A. Forrester, A. Sobester, and A. Keane (2008). Engineering Design via Surrogate Modelling: A Practical Guide. John Wiley & Sons.
- V. Francés-Belda, A. Solera-Rico, J. Nieto-Centenero, E. Andrés, C. Sanmiguel Vila, and R. Castellanos (2024). Toward aerodynamic surrogate modeling based on $\beta$-variational autoencoders. Physics of Fluids 36(11).
- Q. Garrido, M. Assran, N. Ballas, A. Bardes, L. Najman, and Y. LeCun (2024). Learning and leveraging world models in visual representation learning. arXiv preprint arXiv:2403.00504.
- Y. Ghazi, N. Alhazmi, R. Tezaur, and C. Farhat (2022). Training a neural-network-based surrogate model for aerodynamic optimisation using a Gaussian process. International Journal of Computational Fluid Dynamics 36(7), pp. 538–554.
- J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl (2017). Neural message passing for quantum chemistry. In International Conference on Machine Learning, pp. 1263–1272.
- C. Hu, S. Martin, and R. Dingreville (2022). Accelerating phase-field predictions via recurrent neural networks learning the microstructure evolution in latent space. Computer Methods in Applied Mechanics and Engineering 397, pp. 115128.
- N. Hu, H. Cheng, Y. Xie, S. Li, and J. Zhu (2024). 3D-JEPA: a joint embedding predictive architecture for 3D self-supervised representation learning. arXiv preprint arXiv:2409.15803.
- S. Katel, H. Li, Z. Zhao, R. Kansal, F. Mokhtar, and J. Duarte (2024). Learning symmetry-independent jet representations via jet-based joint embedding predictive architecture. arXiv preprint arXiv:2412.05333.
- K. Kontolati, S. Goswami, G. Em Karniadakis, and M. D. Shields (2024). Learning nonlinear operators in latent spaces for real-time predictions of complex dynamics in physical systems. Nature Communications 15(1), pp. 5101.
- L. Maes, Q. L. Lidec, D. Scieur, Y. LeCun, and R. Balestriero (2026). LeWorldModel: stable end-to-end joint-embedding predictive architecture from pixels. arXiv preprint arXiv:2603.19312.
- S. Mo and S. Tong (2024). Connecting joint-embedding predictive architecture with contrastive self-supervised learning. Advances in Neural Information Processing Systems 37, pp. 2348–2377.
- S. Mo and S. Yun (2024). DMT-JEPA: discriminative masked targets for joint-embedding predictive architecture. arXiv preprint arXiv:2405.17995.
- W. Peebles and S. Xie (2023). Scalable diffusion models with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4195–4205.
- A. Perera, K. Hewagamage, S. Nazar, K. Abeywardana, H. Gallella, R. Rodrigo, and M. Afham (2025). CrossJEPA: cross-modal joint-embedding predictive architecture for efficient 3D representation learning from 2D images. arXiv preprint arXiv:2511.18424.
- H. Qu, R. Morel, M. McCabe, A. Bietti, F. Lanusse, S. Ho, and Y. LeCun (2026). Representation learning for spatiotemporal physical systems. arXiv preprint arXiv:2603.13227.
- G. Skenderi, H. Li, J. Tang, and M. Cristani (2023). Graph-level representation learning with joint-embedding predictive architectures. arXiv preprint arXiv:2309.16014.
- V. Sobal, J. SV, S. Jalagam, N. Carion, K. Cho, and Y. LeCun (2022). Joint embedding predictive architectures focus on slow features. arXiv preprint arXiv:2211.10831.
- A. Solera-Rico, C. Sanmiguel Vila, M. Gómez-López, Y. Wang, A. Almashjary, S. T. Dawson, and R. Vinuesa (2024). $\beta$-Variational autoencoders and transformers for reduced-order modelling of fluid flows. Nature Communications 15(1), pp. 1361.
- H. Thimonier, J. L. D. M. Costa, F. Popineau, A. Rimmel, and B. Doan (2024). T-JEPA: augmentation-free self-supervised learning for tabular data. arXiv preprint arXiv:2410.05016.
- J. Ulmen, G. Sundaram, and D. Görges (2025). Learning state-space models of dynamic systems from arbitrary data using joint embedding predictive architectures. IFAC-PapersOnLine 59(18), pp. 19–24.
- S. Verdenius, A. Zerio, and R. L. Wang (2024). LaT-PFN: a joint embedding predictive architecture for in-context time-series forecasting. arXiv preprint arXiv:2405.10093.
- R. Vinuesa, S. L. Brunton, and G. Mengaldo (2026). Explainable AI: learning from the learners. arXiv preprint arXiv:2601.05525.
- A. Vishwasrao, F. Giral, M. Golestanian, F. Tonti, A. A. Ramo, A. Lozano-Duran, S. L. Brunton, S. Hoyas, S. L. Clainche, H. Gomez, et al. (2026). Agentic exploration of PDE spaces using latent foundation models for parameterized simulations. arXiv preprint arXiv:2604.09584.
- H. Wu, H. Luo, H. Wang, J. Wang, and M. Long (2024a). Transolver: a fast transformer solver for PDEs on general geometries. arXiv preprint arXiv:2402.02366.
- X. Wu, L. Jiang, P. Wang, Z. Liu, X. Liu, Y. Qiao, W. Ouyang, T. He, and H. Zhao (2024b). Point Transformer V3: simpler, faster, stronger. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4840–4851.
- Y. Yang, W. Tang, M. Liu, N. Thuerey, Y. Zhang, and H. Chen (2025). SuperWing: a comprehensive transonic wing dataset for data-driven aerodynamic design. arXiv preprint arXiv:2512.14397.
- B. Yee and P. Koh (2026). PI-JEPA: label-free surrogate pretraining for coupled multiphysics simulation via operator-split latent prediction. arXiv preprint arXiv:2604.01349.
- R. Yondo, E. Andrés, and E. Valero (2018). A review on design of experiments and surrogate models in aircraft real-time and many-query aerodynamic analyses. Progress in Aerospace Sciences 96, pp. 23–61.
- W. Zhang, P. Xuhao, K. Jiaqing, and W. Xu (2024). Heterogeneous data-driven aerodynamic modeling based on physical feature embedding. Vol. 37, Elsevier.
- J. Zou, W. Qiu, Z. Sun, X. Zhang, Z. Zhang, and X. Zhu (2026). AdaField: generalizable surface pressure modeling with physics-informed pre-training and flow-conditioned adaptation. arXiv preprint arXiv:2601.07139.

## Appendix

## Appendix A: Limitations

Although AeroJEPA is evaluated on two complementary and challenging aerodynamic datasets, the empirical evidence still covers a limited set of geometries, operating conditions, and mostly steady\-flow regimes, so broader claims across other aircraft configurations or unsteady settings should be made with care\. In addition, the latent probing, arithmetic, and optimization results are intended as proof\-of\-concept demonstrations of semantic structure rather than a complete end\-to\-end design workflow, and the practical utility of the approach will depend on further validation under tighter engineering constraints, larger design spaces, and more diverse fidelity levels\.

## Appendix B: Broader impacts

This work studies aerodynamic surrogate modeling, so a clear potential positive impact is to reduce the computational cost of repeated CFD evaluations, which may lower the barrier to aerodynamic analysis and enable faster design\-space exploration in research and engineering settings\. More efficient surrogate models may also reduce the energy and time required for iterative screening relative to repeatedly running high\-fidelity simulations\.

Potential negative impacts are more indirect\. Incorrect surrogate predictions could lead to poor downstream design decisions if used outside the validated regime\. We view AeroJEPA as a decision\-support tool rather than a replacement for high\-fidelity validation, and we emphasize that broader deployment should retain domain\-expert oversight and final verification with established simulation or experimental workflows\.

## Appendix C: Other related works

#### Broader surrogate\-modeling literature\.

Beyond the works emphasized in the main paper, there is a broad literature on surrogate modeling for aerodynamic design, variable-fidelity optimization, and engineering-response approximation [Du et al., [2021](https://arxiv.org/html/2605.05586#bib.bib14); Ghazi et al., [2022](https://arxiv.org/html/2605.05586#bib.bib21)]. There is also extensive work on latent-state and reduced-order surrogate modeling for physical systems more broadly, including latent simulation, data assimilation, and design-space exploration [Hu et al., [2022](https://arxiv.org/html/2605.05586#bib.bib22); Cheng et al., [2023](https://arxiv.org/html/2605.05586#bib.bib10); Akbari et al., [2023](https://arxiv.org/html/2605.05586#bib.bib1); Kontolati et al., [2024](https://arxiv.org/html/2605.05586#bib.bib24); Bird et al., [2021](https://arxiv.org/html/2605.05586#bib.bib8); Aubeelack et al., [2023](https://arxiv.org/html/2605.05586#bib.bib3); Zhang et al., [2024](https://arxiv.org/html/2605.05586#bib.bib44)]. These papers help situate AeroJEPA within the larger surrogate-modeling landscape, but they are less central to our contribution than methods that explicitly combine aerodynamic field prediction with learned latent structure.

#### Broader latent\-design and JEPA literature\.

On the JEPA side, predictive embedding methods have already been explored for audio, graphs, time series, tabular data, planning, and several scientific domains [Fei et al., [2023](https://arxiv.org/html/2605.05586#bib.bib17); Skenderi et al., [2023](https://arxiv.org/html/2605.05586#bib.bib27); Verdenius et al., [2024](https://arxiv.org/html/2605.05586#bib.bib42); Thimonier et al., [2024](https://arxiv.org/html/2605.05586#bib.bib40); Drozdov et al., [2024](https://arxiv.org/html/2605.05586#bib.bib13); Mo and Tong, [2024](https://arxiv.org/html/2605.05586#bib.bib25); Mo and Yun, [2024](https://arxiv.org/html/2605.05586#bib.bib26); Garrido et al., [2024](https://arxiv.org/html/2605.05586#bib.bib20); Destrade et al., [2025](https://arxiv.org/html/2605.05586#bib.bib12); Bardhan et al., [2025](https://arxiv.org/html/2605.05586#bib.bib7); Katel et al., [2024](https://arxiv.org/html/2605.05586#bib.bib23); Chu et al., [2026](https://arxiv.org/html/2605.05586#bib.bib11); ElSheikh et al., [2026](https://arxiv.org/html/2605.05586#bib.bib16); Ulmen et al., [2025](https://arxiv.org/html/2605.05586#bib.bib41)]. We view these works as evidence that predictive latent learning is broadly useful, while AeroJEPA specializes that paradigm to aerodynamic surrogate modeling with a geometry-to-flow latent bridge.

## Appendix D: Additional dataset details

This section summarizes the two datasets used throughout the paper, focusing on geometry parameterization, the available fluid variables, and the point-cloud representation required by the AeroJEPA pipeline. We also describe the dataset splits used to evaluate generalization.

### D.1 HiLiftAeroML

HiLiftAeroML [Ashton et al., [2026](https://arxiv.org/html/2605.05586#bib.bib45)] serves as the high-fidelity aerodynamic benchmark in our study. The dataset contains realistic high-lift aircraft configurations simulated via solution-adapted wall-modeled large-eddy simulation (WMLES). This physical complexity makes it particularly relevant for evaluating continuous latent representations under highly separated, turbulent flow. Computationally, it is the most demanding setting in terms of spatial scale: the surface-oriented boundary-layer representation resolves approximately 12–15 million points per sample, while the volumetric representation contains roughly 50 million elements per case.

Geometrically, each configuration is a specific design geometry (e.g., "LHCXX") defined by 8 continuous design parameters. Four of these drive most of the observed flow variation: the inboard and outboard flap deflections, and the inboard and outboard slat deflections. For each geometry, the dataset provides 10 discrete operating-condition snapshots sweeping the angle of attack (AoA) from $4^\circ$ to $22^\circ$. Given the extreme computational cost of evaluating millions of points across full WMLES solutions, we partition the benchmark into 205 cases for training and 50 held-out cases for testing.

### D.2 SuperWing

SuperWing [Yang et al., [2025](https://arxiv.org/html/2605.05586#bib.bib46)] is the large-scale transonic benchmark used to evaluate generalization across geometric and operational variability. The corpus contains 4,239 parameterized wing geometries and 28,856 unique Reynolds-averaged Navier–Stokes (RANS) solutions spanning a broad range of Mach numbers and freestream angles of attack. Each simulation is originally distributed on a structured surface discretization of roughly 32,000 points. To ensure architectural parity, we treat these grids purely as unstructured point sets so the AeroJEPA pipeline remains invariant across dataset discretizations.

The target fields in SuperWing are surface quantities, namely the local pressure coefficient ($C_p$) and skin-friction coefficient ($C_f$), rather than volumetric arrays. Geometrically, each surface is prescribed by 54 morphological degrees of freedom controlling local shape, wing twist, and spanwise dihedral.

To enforce zero-shot geometric generalization, the dataset is partitioned along the geometry dimension rather than by random simulation sweeps. The parameterized wings are split 80%/10%/10% into training, validation, and test sets, respectively. Because all operating conditions for a held-out wing remain completely unseen during training, the evaluation cleanly measures extrapolation to unseen boundary shapes.

## Appendix E: Additional model details

This appendix provides a comprehensive, step\-by\-step description of the AeroJEPA pipeline, ranging from raw computational fluid dynamics \(CFD\) representation to latent prediction and subsequent continuous field decoding\.

### E.1 Point-cloud preprocessing and subsampling

Standard unstructured CFD meshes exhibit highly variable node densities. To standardize the input representation, mesh connectivity is discarded, so both the physical boundary and the surrounding flow are treated as continuous point sets. Let the complete dense surface geometry be denoted $\mathcal{P}$, and the ground-truth fluid states over the spatial domain $\Omega$ be denoted $\mathcal{F}$. For surface-only problems, the geometry branch processes purely spatial coordinates $\mathbf{x} \in \mathbb{R}^3$; for volumetric domains, this input is augmented with the local signed-distance function (SDF).

Because raw CFD point counts routinely exceed memory limits, explicit subsampling is performed per case to construct bounded observation sets. Using a farthest point sampling (FPS) heuristic to ensure global spatial coverage, we extract a reduced context geometry set $\mathcal{P}_c \subset \mathcal{P}$ of fixed cardinality $|\mathcal{P}_c| = N_c$. The target branch ingests an independently subsampled flow set $\mathcal{F}_t \subset \mathcal{F}$ of size $|\mathcal{F}_t| = N_t$. Finally, to support the decoding phase, an arbitrary set of spatial query locations $\mathcal{Q} \subset \Omega$ of size $|\mathcal{Q}| = N_q$ is sampled.
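A minimal FPS routine is sketched below for illustration (the paper does not specify its implementation at this granularity); it greedily selects the point farthest from the set chosen so far, which yields globally spread-out coverage:

```python
import numpy as np

def farthest_point_sampling(points, n_samples, seed=0):
    """Greedy FPS: repeatedly pick the point farthest from the selected set.

    points: (N, 3) array; returns indices of n_samples points with
    globally spread-out coverage, as used to build sets like P_c and F_t.
    """
    rng = np.random.default_rng(seed)
    N = len(points)
    chosen = np.empty(n_samples, dtype=int)
    chosen[0] = rng.integers(N)
    # Distance from every point to the nearest selected point so far.
    dist = np.linalg.norm(points - points[chosen[0]], axis=1)
    for i in range(1, n_samples):
        chosen[i] = int(np.argmax(dist))
        dist = np.minimum(dist, np.linalg.norm(points - points[chosen[i]], axis=1))
    return chosen
```

Running the routine independently for the context, target, and query sets naturally produces the decoupled sampling described next.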

Crucially, the geometry context $\mathcal{P}_c$, the target flow field $\mathcal{F}_t$, and the reconstruction query set $\mathcal{Q}$ are sampled entirely independently of one another. This triple decoupling prevents trivial one-to-one centroid alignments, so that $\mathcal{P}_c \cap \mathcal{F}_t = \emptyset$ and $\mathcal{Q} \cap (\mathcal{P}_c \cup \mathcal{F}_t) = \emptyset$ hold in practice. By querying the decoder at spatial coordinates unobserved during encoding, this strategy acts as a continuous masked-supervision mechanism. It removes trivial spatial shortcuts and forces the latent space to parameterize the underlying global continuous field.

### E.2 Tokenization by learned centroids

Following subsampling, the irregular target and context point sets are projected into fixed-length sequences of latent spatial tokens. Specifically, the architecture applies FPS once more over the subsampled sets $\mathcal{P}_c$ and $\mathcal{F}_t$ to designate centroid coordinates. A localized message-passing neighborhood-aggregation layer compiles point-wise variables around each centroid into preliminary latent tokens. This projection transitions the variable-scale physical point cloud into a structured, bounded topology compatible with global transformer operations. The token sequence lengths ($M$) and feature dimensions ($d$) are scaled to the respective domain complexities: the high-dimensional spatial topologies of HiLiftAeroML require sequences of $M = 3072$ tokens with $d = 64$, whereas the lower-resolution surface fields of SuperWing are sufficiently parameterized by $M = 512$ tokens of dimensionality $d = 128$.
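As a rough illustration of the centroid-based tokenization, the sketch below assigns every point to its nearest centroid and mean-pools per-point features into one preliminary token per centroid. The learned message-passing aggregation layer is replaced here by simple mean pooling, and all shapes are illustrative.

```python
import numpy as np

def tokenize_by_centroids(points, feats, centroids):
    """Assign each point to its nearest centroid and mean-pool point features
    into one preliminary token per centroid (a stand-in for the learned
    neighborhood-aggregation layer)."""
    # Pairwise squared distances: (n_points, n_centroids).
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    assign = d2.argmin(axis=1)
    M, d = centroids.shape[0], feats.shape[1]
    tokens = np.zeros((M, d))
    for m in range(M):
        mask = assign == m
        if mask.any():
            tokens[m] = feats[mask].mean(axis=0)
    return tokens

rng = np.random.default_rng(0)
pts = rng.uniform(size=(2000, 3))                 # subsampled point set
feats = rng.normal(size=(2000, 5))                # per-point input variables
cents = pts[rng.choice(2000, 64, replace=False)]  # stand-in for FPS centroids
tokens = tokenize_by_centroids(pts, feats, cents)
```

The result is a bounded $M \times d$ token array regardless of how many points the original mesh contained, which is what makes the subsequent transformer stages tractable.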

### E.3 Encoder backbones

Both the context encoder $\mathcal{E}_c$ and the target encoder $\mathcal{E}_t$ are built upon localized Point Transformer attention mechanisms [Wu et al., 2024b](https://arxiv.org/html/2605.05586#bib.bib47). Crucially, the large dimensionality reduction from the raw $N_c$ and $N_t$ inputs to the computational bounds is handled entirely by the preceding centroid-clustering and neighborhood-aggregation layers. Consequently, the encoder backbones do not alter the topological scale; they operate as flat processing stacks over the compressed, $M$-sized token sequences. By employing localized self-attention rather than global quadratic attention, the framework strictly bounds memory consumption. Through progressively stacked blocks, the encoder incrementally routes physical information across neighboring tokens: shallow layers capture highly localized boundary-layer interactions and surface-curvature semantics, whereas deeper transformer blocks contextualize macro-aerodynamic coupling and wake propagation over multiple token hops. The resulting latent interface preserves the geometric topology, providing an optimizable representation for generative design.

### E.4 Predictor architecture and conditioning

Sharing its structural backbone with the encoders, the core latent map is a non-hierarchical predictor network operating over the fixed output token resolution. The objective of the predictor is to synthesize target flow tokens purely from the geometry context tokens and the driving physical conditions, without requiring spatial pooling operations.

Architecturally, the predictor instantiates learnable latent queries corresponding to the spatial coordinates of a fixed set of centroids\. These spatial query coordinates are first embedded via high\-frequency Fourier positional encodings\. The network processes these queries through a stacked, flat sequence that tightly intertwines localized point\-transformer self\-attention and token\-based cross\-attention mechanisms\. The self\-attention pathway facilitates continuous spatial refinement and consistency among the target queries themselves, whereas the interleaving cross\-attention layers allow these queries to explicitly fetch aligned geometric features from the upstream context tokens acting as keys and values\.

Furthermore, recognizing that minor perturbations in freestream operating variables (e.g., angle of attack $\alpha$, Reynolds number $Re$, Mach number $Ma$) can fundamentally alter the boundary-layer dynamics independent of any structural modification, we inject these physical conditions directly into the network. A lightweight multi-layer perceptron processes the scalar variables and projects them into the hidden transformer layers through adaptive feature modulation [Peebles and Xie, 2023](https://arxiv.org/html/2605.05586#bib.bib49). Consequently, the predictor maps an invariant geometric parameterization into a variable continuous flow field, establishing a smooth, differentiable property leveraged subsequently by gradient-based latent-optimization probes.
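The conditioning pathway can be illustrated with a FiLM/AdaLN-style sketch: a small MLP maps the scalar operating conditions to per-channel scale and shift parameters applied to normalized tokens. This is a schematic stand-in for adaptive feature modulation in the style of Peebles and Xie (2023); the weights, widths, tanh nonlinearity, and raw (unnormalized) condition values are all illustrative.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def adaptive_modulation(tokens, cond, W1, W2):
    """FiLM/AdaLN-style conditioning: a small MLP maps the scalar operating
    conditions to per-channel scale (gamma) and shift (beta) applied to
    layer-normalized tokens."""
    d = tokens.shape[-1]
    gamma_beta = np.tanh(cond @ W1) @ W2      # conditioning MLP
    gamma, beta = gamma_beta[:d], gamma_beta[d:]
    return layer_norm(tokens) * (1.0 + gamma) + beta

rng = np.random.default_rng(0)
d = 8
tokens = rng.normal(size=(16, d))             # M tokens of width d
cond = np.array([4.0, 1.2e6, 0.73])           # (alpha, Re, Ma), illustrative values
W1 = 0.1 * rng.normal(size=(3, 32))
W2 = 0.1 * rng.normal(size=(32, 2 * d))
out = adaptive_modulation(tokens, cond, W1, W2)
```

In practice the conditions would be normalized before entering the MLP; the point of the sketch is only the scale-and-shift structure through which they reach the hidden layers.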

### E.5 Implicit neural representation (INR) decoder

To remap the predicted token distribution to an arbitrarily resolved continuous space, the target tokens $\hat{Z}_t$ condition the INR decoder. For any query element $\mathbf{q}_i \in \mathcal{Q}$, the coordinate is first transformed via Fourier feature encodings to enhance high-frequency spatial sensitivity, then processed by the decoding multi-layer perceptron (MLP). In volumetric settings, query elements are augmented with their SDF evaluations, explicitly enforcing sharp domain demarcations between the interior solid boundary, the near-wall viscous boundary layer, and the external flow.
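A toy version of this decoding path, with random Fourier frequencies, an SDF channel, and a pooled conditioning latent feeding a small MLP, might look as follows. All dimensions and the single-hidden-layer architecture are assumptions for illustration; the paper's decoder conditions on the full token set $\hat{Z}_t$ rather than a single pooled vector.

```python
import numpy as np

def fourier_features(q, B):
    """Map 3-D query coordinates to high-frequency features [sin(2πBq), cos(2πBq)]."""
    proj = 2.0 * np.pi * q @ B.T
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)

def inr_decode(q, sdf, z_pooled, params):
    """Toy INR decoder: Fourier-encoded query (+ SDF channel) and a pooled
    conditioning latent pass through a one-hidden-layer MLP to field values."""
    cond = np.broadcast_to(z_pooled, (q.shape[0], z_pooled.shape[0]))
    feats = np.concatenate([fourier_features(q, params["B"]), sdf[:, None], cond], axis=-1)
    h = np.maximum(feats @ params["W1"], 0.0)   # ReLU hidden layer
    return h @ params["W2"]                     # one column per field, e.g. (u, v, w, p)

rng = np.random.default_rng(0)
params = {"B": 4.0 * rng.normal(size=(16, 3)),          # random Fourier frequencies
          "W1": 0.1 * rng.normal(size=(32 + 1 + 8, 64)),
          "W2": 0.1 * rng.normal(size=(64, 4))}
q = rng.uniform(-1.0, 1.0, size=(100, 3))               # query set Q
sdf = np.linalg.norm(q, axis=1) - 0.5                   # toy signed distance to a sphere
z = rng.normal(size=8)                                  # pooled conditioning latent
fields = inr_decode(q, sdf, z, params)
```

Because the decoder is a function of continuous coordinates, the same trained weights can be queried at any resolution, which is what decouples latent prediction from field resolution.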

### E.6 Training configurations and hyperparameters

The framework applies a fully coupled objective trained end-to-end. We use equal weights for the latent token alignment term and the field-reconstruction penalty ($\lambda_\ell = 1.0$, $\lambda_r = 1.0$), while scaling down the SIGReg distributional token regularization ($\lambda_s = 0.01$). This balance proves favorable: it prioritizes direct latent mapping and physical decoding accuracy, while retaining just enough regularization to prevent representation collapse without overwhelming the core physics gradients. Network weights are updated with the AdamW optimizer, stabilized by gradient clipping. All models, including the baselines, were trained on a single NVIDIA H200 GPU. On HiLiftAeroML, AeroJEPA required roughly 48 hours for 300 epochs, while the baseline models required roughly 96 hours for the same number of epochs. On SuperWing, both AeroJEPA and the baselines required roughly 24 hours for 200 epochs. Table [3](https://arxiv.org/html/2605.05586#A5.T3) documents the subsampling cardinalities, architectural bounds, optimization profiles, and training durations for both environments.
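With the weights quoted above, the coupled objective can be sketched as a weighted sum of three terms. The exact form of the alignment and reconstruction losses is not specified in this appendix, so mean-squared terms are assumed, and the SIGReg regularizer is reduced to a precomputed scalar stand-in.

```python
import numpy as np

def aerojepa_loss(z_pred, z_tgt, f_pred, f_true, sigreg,
                  lam_l=1.0, lam_r=1.0, lam_s=0.01):
    """Coupled objective: latent token alignment + decoded-field reconstruction
    + down-weighted distributional regularization (weights as quoted above).
    MSE terms and the scalar `sigreg` stand-in are assumptions for illustration."""
    latent_term = np.mean((z_pred - z_tgt) ** 2)   # latent token alignment
    recon_term = np.mean((f_pred - f_true) ** 2)   # decoded-field penalty
    return lam_l * latent_term + lam_r * recon_term + lam_s * sigreg

rng = np.random.default_rng(0)
z_pred, z_tgt = rng.normal(size=(512, 64)), rng.normal(size=(512, 64))
f_pred, f_true = rng.normal(size=(1000, 4)), rng.normal(size=(1000, 4))
loss = aerojepa_loss(z_pred, z_tgt, f_pred, f_true, sigreg=0.3)
```

The small $\lambda_s$ keeps the regularizer's gradient contribution roughly two orders of magnitude below the physics terms, matching the balancing heuristic described above.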

Table 3: Primary architectural parameterization and optimization bounds for the AeroJEPA framework, delineated by aerodynamic environment.

## Appendix F: Additional results and experiments

### F.1 Additional HiLiftAeroML results

#### Qualitative reconstruction\.

Figure [5](https://arxiv.org/html/2605.05586#A6.F5) gathers additional qualitative views of the decoded fields for test geometry LHC013 at $18^\circ$ angle of attack. Together with the velocity view retained in the main text, these complementary pressure and velocity plots confirm the same pattern: AeroJEPA preserves the large-scale structure of the boundary-layer solution and remains consistent with the reference CFD field in challenging regions.

#### Latent organization and interpolation\.

Figure [6](https://arxiv.org/html/2605.05586#A6.F6) reports the ridge-probe recovery of aerodynamic coefficients from the predicted latent, complementing the predicted-manifold visualization shown in the main text. Figure [7](https://arxiv.org/html/2605.05586#A6.F7) then examines interpolation in the latent space. We introduce a scalar interpolation parameter $\alpha$ and move between latent states corresponding to different angles of attack and geometry configurations. The decoded fields remain physically meaningful along the trajectory, and the resulting aerodynamic coefficients stay close to the ground-truth values, indicating that the learned manifold supports smooth transitions across both operating conditions and design changes.

#### Decoded\-field consistency\.

We also assess whether the decoded fields yield accurate aerodynamic quantities when forces are estimated directly from the reconstructed surface solution, rather than inferred through a linear probe in latent space. Figure [8](https://arxiv.org/html/2605.05586#A6.F8) shows the pressure-coefficient distribution on a wing section for case LHC019 at $12^\circ$ angle of attack, comparing the coefficient extracted from the decoded field against the reference CFD profile. Figure [9](https://arxiv.org/html/2605.05586#A6.F9) further reports parity plots for $C_L$ and $C_D$ computed from the decoded predicted fields. These results complement the latent-probing analysis in the main paper: they show that AeroJEPA not only stores aerodynamically meaningful information in latent space, but also recovers integrated force quantities from the decoded flow field with good agreement.

#### Linear recovery and concept arithmetic\.

Figure [10](https://arxiv.org/html/2605.05586#A6.F10) reports the ridge-regression recovery of design variables from the context latents. These quantities are never used during training, so this experiment directly tests whether the context encoder captures geometry information in a linearly accessible form. The four dominant control-surface parameters are all recovered with high fidelity ($R^2 = 0.965$–$0.988$), which supports the interpretation of the context manifold in the main text. Figure [11](https://arxiv.org/html/2605.05586#A6.F11) complements this result with latent arithmetic. We define concept vectors from the rows of the ridge-regression weight matrix, traverse the context latent space along those directions, and decode the resulting states back into standardized physical design parameters. The targeted variable dominates the response along its corresponding trajectory, while the remaining parameters stay comparatively stable. The residual couplings appear mainly between inboard and outboard flaps and between inboard and outboard slats, consistent with the strong geometric dependencies of the high-lift configuration.

![Refer to caption](https://arxiv.org/html/2605.05586v1/figures/case13_18aoa_pressure_view2.png)

![Refer to caption](https://arxiv.org/html/2605.05586v1/figures/case13_18aoa_pressure_view1.png)

Figure 5: Additional qualitative pressure views for test geometry LHC013 at $18^\circ$ angle of attack. These complementary views support the velocity-based qualitative comparison shown in the main text and confirm that AeroJEPA preserves the main surface-pressure structure on the unseen case. The relative error reported in the figure is 0.24% in MAE / max-GT-mean for the pressure field.

![Refer to caption](https://arxiv.org/html/2605.05586v1/figures/fig5_probing_scatter.png)

Figure 6: Ridge linear probing from predicted latents to aerodynamic coefficients on HiLiftAeroML. Even though the model is trained only on the primitive fields $(u, v, w, p)$, the predicted latent supports accurate recovery of $C_L$ and $C_D$, consistent with the smooth organization of the predicted manifold shown in the main text.

![Refer to caption](https://arxiv.org/html/2605.05586v1/figures/fig7_interpolation_pair1_aoa14to4.png)

Figure 7: Latent interpolation on HiLiftAeroML between different operating conditions and geometry states. AeroJEPA yields decoded fields that remain physically meaningful across the interpolation path, with integrated coefficients that stay close to the corresponding ground-truth values.

![Refer to caption](https://arxiv.org/html/2605.05586v1/figures/wing_cp_section.png)

Figure 8: Pressure-coefficient profile extracted from the decoded field on a wing section for case LHC019 at $12^\circ$ angle of attack. The comparison shows that AeroJEPA preserves the local pressure distribution needed to recover section-level aerodynamic behavior from the decoded solution.

![Refer to caption](https://arxiv.org/html/2605.05586v1/figures/highlift_coeff_parity_predictor.png)

Figure 9: Parity plots for $C_L$ and $C_D$ computed from the decoded predicted fields. Unlike the latent-probing experiment, these coefficients are obtained by estimating aerodynamic forces directly from the reconstructed surface solution, showing that AeroJEPA yields accurate integrated force predictions from the decoded field itself.

![Refer to caption](https://arxiv.org/html/2605.05586v1/figures/fig9_context_design_regression.png)

Figure 10: Recovery of HiLift design variables from the context latents using ridge regression. Although the model is trained only from input mesh points and target flow fields, the context representation remains strongly aligned with the underlying geometric parameters.

![Refer to caption](https://arxiv.org/html/2605.05586v1/figures/latent_arithmetic_main_deflections_4x4.png)

Figure 11: Latent-arithmetic analysis on HiLiftAeroML. Each trajectory traverses the context latent space along a concept vector defined by the ridge-regression weights of a target design variable. The targeted variable drives most of the variation, while the remaining variables stay comparatively stable; the main residual couplings occur between inboard and outboard flaps and between inboard and outboard slats.

### F.2 Additional SuperWing surrogate results

#### Decoded fields\.

Figure [12](https://arxiv.org/html/2605.05586#A6.F12) shows representative decoded-field comparisons for $C_{f,z}$, $C_{f,\tau}$, and $C_p$ on a SuperWing test case. The predicted surface fields remain consistent with the reference solution across the three quantities most relevant to transonic wing analysis.

#### Integrated aerodynamic response from decoded fields\.

Figure [13](https://arxiv.org/html/2605.05586#A6.F13) reports parity plots for aerodynamic forces estimated directly from the decoded SuperWing fields. The comparison is performed on $C_L$ and $C_D$ recovered from the reconstructed surface solution itself. The decoded fields therefore preserve not only local surface quantities, but also the integrated aerodynamic response needed for downstream evaluation.

![Refer to caption](https://arxiv.org/html/2605.05586v1/figures/Cf_z_predictor_triptych.png)

![Refer to caption](https://arxiv.org/html/2605.05586v1/figures/Cf_tau_predictor_triptych.png)

![Refer to caption](https://arxiv.org/html/2605.05586v1/figures/Cp_predictor_triptych.png)

Figure 12: Representative decoded-field comparisons on SuperWing for $C_{f,z}$, $C_{f,\tau}$, and $C_p$. The predicted surface fields remain consistent with the reference solution across the three quantities most relevant to transonic wing analysis.

![Refer to caption](https://arxiv.org/html/2605.05586v1/figures/superwing_coeff_parity_predictor.png)

Figure 13: Parity plots for aerodynamic forces estimated from the decoded SuperWing fields. The comparison is performed on $C_L$ and $C_D$ recovered from the reconstructed surface solution, showing that the decoded fields preserve the integrated aerodynamic response with good fidelity.

### F.3 Latent-space aerodynamic optimization

A practical test of whether the JEPA latent space is genuinely semantic is whether one can *compute* on it: search for a design point that optimizes a physical objective, with no CAD edits and no solver in the inner loop. We demonstrate this on the SuperWing dataset by maximizing aerodynamic efficiency $C_L / C_D$ at a fixed cruise condition $(\alpha_{\mathrm{cruise}}, M_{\mathrm{cruise}})$, treating the 128-dimensional (mean) context latent $z_{\mathrm{ctx}}$ as the optimization variable.

#### Differentiable surrogate\.

The frozen AeroJEPA predictor maps a context latent and a flow condition to a fluid latent, $z_{\mathrm{pred}} = \Phi_{\mathrm{pred}}(z_{\mathrm{ctx}}, \alpha, M)$, and two linear ridge probes (Sec. [F.5](https://arxiv.org/html/2605.05586#A6.SS5)) read out $C_L$ and $C_D$ from $z_{\mathrm{pred}}$. Composing these gives a single end-to-end differentiable map $z_{\mathrm{ctx}} \mapsto (C_L, C_D)$, so the gradient of the objective $J(z_{\mathrm{ctx}}) = -C_L / C_D$ is available by back-propagation through the predictor.
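The composition can be made concrete with linear stand-ins for the frozen predictor and the two ridge probes; the gradient of $J = -C_L/C_D$ then follows from the quotient rule through the chain, which the sketch verifies against a finite difference. The linear predictor is of course a simplification of $\Phi_{\mathrm{pred}}$, used only to show the chain-rule structure.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16
# Linear stand-ins for the frozen predictor and the two ridge probes.
A = 0.3 * rng.normal(size=(D, D))        # z_pred = A @ z_ctx  (toy predictor)
w_L, w_D = rng.normal(size=D), rng.normal(size=D)
b_L, b_D = 0.5, 2.0                      # offsets keep C_D positive in this toy

def objective_and_grad(z_ctx):
    """J(z) = -C_L/C_D with C_L, C_D read linearly from the predicted latent;
    the gradient follows from the quotient rule through the composition."""
    z_pred = A @ z_ctx
    CL, CD = w_L @ z_pred + b_L, w_D @ z_pred + b_D
    J = -CL / CD
    grad = -(A.T @ w_L) / CD + CL * (A.T @ w_D) / CD ** 2
    return J, grad

z = 0.1 * rng.normal(size=D)
J, g = objective_and_grad(z)

# Central finite-difference check of the first gradient component.
eps = 1e-6
e0 = np.zeros(D)
e0[0] = 1.0
fd = (objective_and_grad(z + eps * e0)[0] - objective_and_grad(z - eps * e0)[0]) / (2 * eps)
```

In the real pipeline the same gradient is produced by autograd through the nonlinear predictor rather than by a hand-derived formula, but the chain structure is identical.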

#### Constrained optimization\.

We solve

$$\min_{z_{\mathrm{ctx}} \in \mathbb{R}^{128}} \; -\frac{C_L(z_{\mathrm{ctx}})}{C_D(z_{\mathrm{ctx}})} \qquad (9)$$

with SLSQP, subject to four families of physically motivated guardrails, all derived from training-set statistics:

1. A Mahalanobis trust region $(z_{\mathrm{ctx}} - \mu_{\mathrm{train}})^\top \Sigma_{\mathrm{train}}^{-1} (z_{\mathrm{ctx}} - \mu_{\mathrm{train}}) \leq \tau$ that keeps the search on the latent manifold.
2. Bounds on the design parameters that are reliably linearly decodable from $z_{\mathrm{ctx}}$ (5-fold CV $R^2 \geq 0.85$, retaining 9 of the 54 SuperWing parameters): $x_k^{\mathrm{min}} \leq W_k^\top \tilde{z}_{\mathrm{ctx}} + b_k \leq x_k^{\mathrm{max}}$.
3. A drag floor $C_D \geq 0.9 \cdot \min_{\mathrm{train}} C_D$ and lift/drag ceilings at $1.05$ times the corresponding training-set extrema.
4. An $L/D$ ceiling pinned to the dataset's empirical maximum, preventing extrapolation past the surrogate's calibrated envelope.

We use eight random restarts, drawing initial $z_{\mathrm{ctx}}$ values from the training distribution and projecting them inside the Mahalanobis ball. Constraint Jacobians for the affine design and Mahalanobis terms are supplied in closed form; gradients of the aerodynamic terms are obtained by autograd. Once SLSQP converges to $z_{\mathrm{ctx}}^\star$, we read off the corresponding 54-D design vector $x^\star = W_{\mathrm{design}} \tilde{z}_{\mathrm{ctx}}^\star + b_{\mathrm{design}}$ and retrieve the dataset geometry whose standardized design vector is closest to $x^\star$ in Euclidean distance.
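A stripped-down version of this constrained search, keeping only the Mahalanobis trust region and the eight restarts, can be written with `scipy.optimize.minimize`. The linear readout stand-ins, the identity covariance, and the radius $\tau = 4$ are toy assumptions; the real setup also enforces the design-bound, drag-floor, and $L/D$-ceiling constraints and back-propagates through the frozen predictor.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
D = 8
mu = np.zeros(D)                  # toy training-latent mean
Sigma_inv = np.eye(D)             # toy inverse covariance of the training latents
tau = 4.0                         # trust-region radius (Mahalanobis distance <= 2)
w_L, w_D = rng.normal(size=D), rng.normal(size=D)
b_L, b_D = 1.0, 10.0              # offsets keep C_D positive inside the region

def neg_efficiency(z):
    """Objective -C_L/C_D with linear readout stand-ins for the ridge probes."""
    return -(w_L @ z + b_L) / (w_D @ z + b_D)

def mahalanobis_slack(z):
    """>= 0 inside the trust region (SLSQP inequality convention)."""
    d = z - mu
    return tau - d @ Sigma_inv @ d

best = None
for _ in range(8):                # eight random restarts, as in the text
    z0 = mu + 0.1 * rng.normal(size=D)   # start near the training distribution
    res = minimize(neg_efficiency, z0, method="SLSQP",
                   constraints=[{"type": "ineq", "fun": mahalanobis_slack}])
    if best is None or res.fun < best.fun:
        best = res
```

With a linear-fractional objective and a quadratic trust region, the optimum lies on the trust-region boundary, mirroring how the real search is kept on the latent manifold.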

#### Results\.

All eight restarts converge to the same neighbourhood and satisfy every constraint simultaneously. Figure [14](https://arxiv.org/html/2605.05586#A6.F14) visualizes the optimization in a PCA projection of the training context latents: SLSQP iterates climb the $C_L / C_D$ field, remain inside the trust region, and converge next to a real training case. The latent optimum $z_{\mathrm{ctx}}^\star$ is essentially overlaid with its retrieved nearest neighbour, evidencing that the optimization terminates on the data manifold rather than in unsupported latent regions.

The left panel of Fig. [4](https://arxiv.org/html/2605.05586#S3.F4) confirms physical plausibility. In the $(C_D, C_L)$ scatter of the full dataset, the surrogate optimum sits at the upper-left corner of the achievable envelope, on the high-efficiency frontier traced by the iso-$L/D$ contours. The accompanying histogram shows the optimum landing at the right tail of the dataset $L/D$ distribution, well separated from the initial geometry's value: the gradient-based search has moved monotonically up the efficiency axis without leaving the calibrated regime.

The right panel of Fig. [4](https://arxiv.org/html/2605.05586#S3.F4) closes the loop by exposing the design recipe behind the optimum. Plotted as a parallel-coordinates trace over the nine reliably encoded design parameters, $x^\star$ describes a recognisable high-efficiency wing: large reference area and aspect ratio, high sweep, aggressive taper, and root-biased washout twist. The retrieved nearest-neighbour wing follows the same polyline up to small offsets, showing that the recipe corresponds to a wing the dataset already contains.

The experiment illustrates three properties of the AeroJEPA representation that are difficult to verify by inspection alone. First, the latent space is *smooth and gradient-friendly*: a single SLSQP solver, with no per-restart hyperparameter tuning, finds the same optimum from independent initializations. Second, the linear-probe coupling between $z_{\mathrm{ctx}}$, the design parameters, and the aerodynamic targets is *strong enough to support inverse design*: the optimization, the bound enforcement, and the geometry retrieval all operate on the same linear readouts characterized in Sec. [F.5](https://arxiv.org/html/2605.05586#A6.SS5). Third, and most consequentially, the optimum is *interpretable*: it is not a high-dimensional black-box vector but a 9-parameter design recipe that aligns with classical low-drag wing design principles and with a real wing from the dataset. Together, these observations make the case that self-supervised geometric representations can serve not only as predictive features but as a *search space* for physically meaningful design optimization, at the cost of a few seconds of gradient descent in place of an outer loop over CAD and CFD.

![Refer to caption](https://arxiv.org/html/2605.05586v1/figures/fig1_latent_trajectory.png)

Figure 14: Latent-space optimization trajectory. Two-dimensional PCA projection of the training context latents $z_{\mathrm{ctx}}$, coloured by the corresponding dataset $C_L / C_D$. The dashed contour is the projection of the 95% Mahalanobis trust region used to constrain the search. Light grey curves are the SLSQP iterates of the eight random restarts; the orange curve highlights the restart selected as the global optimum. White circles mark the SLSQP starting points. The red star is the latent optimum $z_{\mathrm{ctx}}^\star$; the blue open circle is the dataset geometry retrieved as its nearest neighbour in design space, connected to $z_{\mathrm{ctx}}^\star$ by the dotted segment. The green triangle marks the initial geometry of the converging restart, i.e. the real training case closest to its starting point. All restarts ascend the $C_L / C_D$ field and converge inside the trust region, terminating on the data manifold.

### F.4 Concept-vector latent arithmetic on the HighLift dataset

A semantic latent space should not only support gradient-based optimization (App. [F.3](https://arxiv.org/html/2605.05586#A6.SS3)) but also expose interpretable *directions* along which design intent can be expressed by simple vector arithmetic. We test this on the HighLift dataset by asking whether AeroJEPA's 128-dimensional (mean) context latent $z_{\mathrm{ctx}}$ encodes high-lift control surfaces (inboard/outboard flap and slat deflections) along separable linear axes, despite never having been trained on those labels.

#### Concept directions from linear probes\.

We fit a single ridge probe per design parameter $x_k$ on the train split, $x_k = w_k^\top \tilde{z}_{\mathrm{ctx}} + b_k$ with $\tilde{z}_{\mathrm{ctx}} = (z_{\mathrm{ctx}} - \mu_{\mathrm{ctx}}) / \sigma_{\mathrm{ctx}}$, where the design parameters are the four main HighLift deflections (IB Flap, OB Flap, IB Slat, OB Slat) plus the two flap-gap multipliers. The unit-norm probe direction $v_k = w_k / \lVert w_k \rVert$ is interpreted as the latent "concept vector" for parameter $x_k$. To probe the disentanglement of the representation, we walk the train-mean latent along one direction at a time,

$$z_{\mathrm{ctx}}(\gamma) = \mu_{\mathrm{ctx}} + \gamma\, v_k, \qquad (10)$$

and at each step $\gamma$ predict *all* design parameters from the shifted latent through the same probe matrix. The slope of the predicted parameter $x_j$ along the walk on direction $v_k$ has a closed form,

$$\frac{\mathrm{d}\tilde{x}_j}{\mathrm{d}\gamma} = \frac{\bigl(W_{\mathrm{ctx}}\,(v_k / \sigma_{\mathrm{ctx}})\bigr)_j}{\sigma_{x_j}}, \qquad (11)$$

expressed in train-set standard deviations of $x_j$ per unit step in latent space; we report this number as the panel-level "$\sigma/\gamma$" sensitivity.
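The walk of Eq. (10) and the closed-form slope of Eq. (11) are easy to check numerically, since the probe readout is linear in the latent. In the sketch below the ridge weight matrix, standardization statistics, and dimensions are random placeholders; the bias terms are omitted because they do not affect the slope.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 12, 4
W = rng.normal(size=(K, D))                    # ridge weight matrix, one row per parameter
mu_ctx = rng.normal(size=D)                    # latent mean (placeholder)
sigma_ctx = 1.0 + rng.uniform(size=D)          # latent per-dimension scale
sigma_x = 1.0 + rng.uniform(size=K)            # train std of each design parameter

def predict_all(z_ctx):
    """Read out all K design parameters through the shared probe matrix (bias omitted)."""
    return W @ ((z_ctx - mu_ctx) / sigma_ctx)

k = 0
v_k = W[k] / np.linalg.norm(W[k])              # unit-norm concept vector for parameter k
gamma = 1e-4                                   # small step along the walk z(γ) = μ + γ v_k
num_slope = (predict_all(mu_ctx + gamma * v_k) - predict_all(mu_ctx)) / gamma / sigma_x
closed = (W @ (v_k / sigma_ctx)) / sigma_x     # Eq. (11): (W (v_k/σ_ctx))_j / σ_{x_j}
```

Because the readout is affine, the numerical slope matches the closed form exactly; assembling `closed` for every pair $(k, j)$ reproduces the full response matrix discussed next.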

#### The disentanglement matrix\.

Figure [15](https://arxiv.org/html/2605.05586#A6.F15) shows the resulting $4 \times 4$ matrix for the four main deflections. Rows index the latent direction walked; columns index the design parameter read out. The diagonal panels (highlighted in red) measure the *intended* response of each concept direction: walking the IB-Flap concept vector should change the predicted IB-Flap deflection. The off-diagonal panels measure *cross-talk*: how much an unrelated parameter moves as a side effect.

The structure that emerges in Fig. [15](https://arxiv.org/html/2605.05586#A6.F15) is striking. Every diagonal panel exhibits a near-unit slope (around $+1\,\sigma/\gamma$, the maximum an $\ell_2$-regularized linear readout can reach without amplification), confirming that each of the four main deflections has its own clearly aligned latent axis. The off-diagonal panels are largely flat; in particular, the entire flap–slat block is close to zero, meaning that the latent space has organised *flap control* and *slat control* into orthogonal subspaces. The remaining non-trivial structure is concentrated in two physically intuitive blocks: the IB-Flap and OB-Flap directions partially co-activate each other, and similarly for IB-Slat and OB-Slat. In other words, the model has learnt that inboard and outboard segments of the same control surface tend to be deflected together, while flap and slat actuation are independent: exactly the correlation pattern dictated by real high-lift practice.

The probes used to define every concept direction are linear and trained *only* on the train split, with no design-parameter supervision ever seen by AeroJEPA itself. Yet in the resulting representation, design intent maps onto linear directions, and the residual entanglement is not arbitrary noise but the physically meaningful coupling that engineers already build into multi-element high-lift systems. Three points follow:

1. The geometry-only context encoder *discovers* the underlying parametric basis of the design space without being told what it is. This is the practical signal that the latent space is genuinely semantic rather than merely high-capacity.
2. Concept-vector arithmetic provides a cheap interface for exploring configurations: a designer can push along the OB-Slat direction and read off the predicted flap and slat values, getting an immediate, calibrated preview of which changes co-occur in real wings.
3. The *block-diagonal disentanglement structure* between flap and slat axes is a constructive ingredient for the optimization in App. [F.3](https://arxiv.org/html/2605.05586#A6.SS3): when a constrained search needs to move design parameters independently, the latent space provides near-orthogonal directions along the parameters that are geometrically separable, while preserving the natural coupling between IB and OB segments where the dataset itself shows it.

Together with the gradient-based optimization of App. [F.3](https://arxiv.org/html/2605.05586#A6.SS3), this experiment characterizes the JEPA latent both as a *search space* (smooth, gradient-friendly, linearly decodable) and as an *interpretation space* (separable concept directions reflecting physical actuation modes).

![Refer to caption](https://arxiv.org/html/2605.05586v1/figures/latent_arithmetic_main_deflections_4x4.png)

Figure 15: Concept-vector arithmetic in the HighLift context latent. $4 \times 4$ response matrix obtained by walking the train-mean latent $\mu_{\mathrm{ctx}}$ along the unit-norm linear-probe direction of one design parameter at a time. *Rows:* latent direction walked (IB Flap, OB Flap, IB Slat, OB Slat). *Columns:* design parameter read out by the corresponding linear probe. Each panel reports the sensitivity in train standard deviations per unit walk ($\sigma/\gamma$). Diagonal panels (red) show the intended concept response and reach near-unit slope. Off-diagonal panels (grey, dashed) quantify cross-talk between concept directions: the flap–slat off-diagonal block is essentially flat, indicating that flap and slat actuation occupy orthogonal latent subspaces, while inboard/outboard pairs of the same control surface partially co-activate, the same correlation engineers impose by deflecting IB and OB segments in tandem. The disentanglement is obtained *without any design-parameter supervision* during JEPA pre-training; the linear probes are fit only on the train split.

### F.5 Linear-probe analysis of the AeroJEPA latent space

The aerodynamic optimization in App. [F.3](https://arxiv.org/html/2605.05586#A6.SS3) and the concept-vector arithmetic in App. [F.4](https://arxiv.org/html/2605.05586#A6.SS4) both rely on a single, simple primitive: a *linear* readout from the self-supervised AeroJEPA latents. This section defines that readout, lists the three latent quantities it is applied to, and reports the goodness-of-fit numbers that justify treating those readouts as semantic.

#### Probe definition\.

For a latent vector $z \in \mathbb{R}^D$ and a scalar target $y$, we define a *linear probe* as the composition of train-set standardisation and ridge regression with cross-validated regularisation,

$$\hat{y}(z) = w^\top \tilde{z} + b, \qquad \tilde{z} = \frac{z - \mu_{\mathrm{train}}}{\sigma_{\mathrm{train}}}, \qquad (12)$$

where $(w, b)$ are obtained by minimising $\lVert \tilde{Z} w + b\,\mathbf{1} - y \rVert_2^2 + \lambda\, \lVert w \rVert_2^2$ on the train split, with $\lambda$ selected by an inner cross-validation over $\lambda \in \{10^{-4}, 10^{-3}, \dots, 10^4\}$.
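A self-contained sketch of this probe, with closed-form ridge regression and a simplified split-half stand-in for the inner cross-validation over the $\lambda$ grid, is given below on synthetic data. The half/half validation split is an assumption; the paper uses a proper inner CV.

```python
import numpy as np

def fit_linear_probe(Z, y, lambdas=10.0 ** np.arange(-4.0, 5.0)):
    """Standardize latents with train statistics, then closed-form ridge;
    the λ grid matches Eq. (12), with a simple split-half validation as a
    stand-in for the inner cross-validation."""
    mu, sigma = Z.mean(axis=0), Z.std(axis=0) + 1e-12
    Zt = (Z - mu) / sigma
    idx = np.arange(len(y))
    tr, va = idx % 2 == 0, idx % 2 == 1
    best = None
    for lam in lambdas:
        A = Zt[tr].T @ Zt[tr] + lam * np.eye(Z.shape[1])
        w = np.linalg.solve(A, Zt[tr].T @ (y[tr] - y[tr].mean()))
        b = y[tr].mean()
        err = np.mean((Zt[va] @ w + b - y[va]) ** 2)   # held-out error per λ
        if best is None or err < best[0]:
            best = (err, w, b)
    _, w, b = best
    return w, b, mu, sigma

rng = np.random.default_rng(0)
Z = rng.normal(size=(400, 16))                          # synthetic "latents"
y = Z @ rng.normal(size=16) + 0.05 * rng.normal(size=400)  # linearly decodable target
w, b, mu, sigma = fit_linear_probe(Z, y)
y_hat = ((Z - mu) / sigma) @ w + b
r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
```

When the target really is a linear function of the latent, as the paper's $R^2$ numbers suggest for the aerodynamic coefficients, this probe recovers it almost exactly.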

#### Three latent quantities, three probe families\.

We extract three mean\-pooled latent vectors per case from the trained AeroJEPA:

- *Context latent* $z_{\mathrm{ctx}} \in \mathbb{R}^{128}$: the token-mean of the context encoder applied to a geometry alone, with no flow conditions.
- *Predicted latent* $z_{\mathrm{pred}} \in \mathbb{R}^{128}$: the token-mean of the predictor output for a geometry conditioned on flow $(\alpha, M)$.
- *Target latent* $z_{\mathrm{tgt}} \in \mathbb{R}^{128}$: the token-mean of the target encoder applied to the flow field; used only as a reference, not in the experiments of this paper.

Three probe families result:

1. *Context $\to$ design parameters.* For each design parameter $x_k$ (e.g. each of the 54 SuperWing morphological coefficients or the 8 HighLift control-surface deflections), a separate probe with $z = z_{\mathrm{ctx}}$ and $y = x_k$. The full set of $K$ probes assembles into a single matrix readout $x \approx W_{\mathrm{ctx}}\,\tilde{z}_{\mathrm{ctx}} + b_{\mathrm{ctx}}$ because every probe shares the same standardisation $(\mu_{\mathrm{ctx}}, \sigma_{\mathrm{ctx}})$.
2. *Predicted $\to$ aerodynamic coefficients.* Two probes with $z = z_{\mathrm{pred}}$ and $y \in \{C_L, C_D\}$.
3. *Predicted $\to$ AoA (control).* A single probe with $z = z_{\mathrm{pred}}$ and $y = \alpha$, used to verify that the predictor output – but not the geometry-only context – carries flow-condition information.
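The shared-standardisation trick behind the matrix readout in the first probe family can be sketched as follows; the latents and design parameters are synthetic, and a fixed-$\lambda$ `Ridge` stands in for the cross-validated choice for brevity:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
N, D, K = 500, 128, 54           # K design parameters, as in SuperWing
Z = rng.normal(size=(N, D))      # stand-ins for context latents z_ctx
X = Z @ rng.normal(size=(D, K))  # hypothetical design-parameter targets

# One shared standardisation (mu_ctx, sigma_ctx) across all K probes.
mu, sigma = Z.mean(axis=0), Z.std(axis=0)
Z_tilde = (Z - mu) / sigma

# One ridge probe per design parameter; because they all see the same
# z_tilde, their weights stack into a single matrix readout
# x ≈ W_ctx z_tilde + b_ctx.
W = np.empty((K, D))
b = np.empty(K)
for k in range(K):
    probe = Ridge(alpha=1.0).fit(Z_tilde, X[:, k])
    W[k], b[k] = probe.coef_, probe.intercept_

X_hat = Z_tilde @ W.T + b
print(np.linalg.norm(X_hat - X) / np.linalg.norm(X))  # small relative error
```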

#### What the probes report on each dataset.

On the SuperWing latent NPZ ($N_{\mathrm{train}} = 23{,}054$, $N_{\mathrm{val}} = 2{,}931$ over $3{,}391 + 424$ disjoint geometries), the out-of-fold $R^2$ values are: $C_L$ from $z_{\mathrm{pred}}$: $0.984$; $C_D$ from $z_{\mathrm{pred}}$: $0.965$; $\alpha$ from $z_{\mathrm{ctx}}$: $0.018$; $\alpha$ from $z_{\mathrm{pred}}$: $0.987$. That is, the predictor latent carries both aerodynamic coefficients near-perfectly under a linear readout, and the context latent is correctly *blind* to the flow condition by construction. The 54 context-to-design probes split into nine high-quality probes (CV $R^2 \geq 0.85$, including reference area, dihedral, aspect ratio, taper ratio and three of the four wing-twist coefficients) and a long tail of weaker probes that correspond either to high-frequency surface modes or to flow-state information the geometry-only latent cannot see ($R^2 \approx 0.01$ for the eight angle-of-attack indicators). On the HighLift dataset, the six control-surface design probes (IB / OB flap and slat deflections plus two flap-gap multipliers) all exceed $R^2 \approx 0.9$, which is what makes the concept-vector arithmetic of App. [F.4](https://arxiv.org/html/2605.05586#A6.SS4) possible.
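An out-of-fold $R^2$ of this kind can be sketched with scikit-learn's `cross_val_predict`; grouping folds by a hypothetical `geometry_id` mirrors the disjoint-geometry splits, and all data below is synthetic:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_predict, GroupKFold
from sklearn.metrics import r2_score

# Toy stand-ins: latents Z, a linearly decodable target y, and a geometry id
# per case so that each geometry's cases stay in the same fold.
rng = np.random.default_rng(3)
Z = rng.normal(size=(300, 128))
y = Z @ rng.normal(size=128)
geometry_id = np.repeat(np.arange(60), 5)  # 60 geometries x 5 conditions

probe = make_pipeline(StandardScaler(), RidgeCV(alphas=np.logspace(-4, 4, 9)))
y_oof = cross_val_predict(probe, Z, y, cv=GroupKFold(n_splits=5), groups=geometry_id)
r2 = r2_score(y, y_oof)
print(r2)  # out-of-fold R^2
```

Each prediction in `y_oof` comes from a model that never saw that geometry, which is the property the reported $R^2$ values rely on.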

#### From probes to a downstream tool.

Equation ([12](https://arxiv.org/html/2605.05586#A6.E12)) has two properties that the rest of the paper exploits. First, it is *closed-form differentiable* in $z$ – the gradient $\partial\hat{y}/\partial z = w/\sigma_{\mathrm{train}}$ is just a rescaled probe vector. This is what allows the SLSQP optimization of App. [F.3](https://arxiv.org/html/2605.05586#A6.SS3) to back-propagate through the predictor and the probes in a single autograd graph, with the design-bound constraints expressed as affine inequalities with closed-form Jacobians. Second, the unit-norm probe direction $v_k = w_k/\lVert w_k\rVert$ is interpreted as the latent *concept vector* for parameter $x_k$; walking $z_{\mathrm{ctx}}$ along $v_k$ and reading out every other parameter gives the disentanglement matrix of App. [F.4](https://arxiv.org/html/2605.05586#A6.SS4).
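Both properties follow from the probe being affine. A small sketch with a hypothetical probe $(w, b, \mu, \sigma)$: the gradient is the rescaled weight vector, and the readout changes exactly linearly along a concept-vector walk:

```python
import numpy as np

rng = np.random.default_rng(4)
D = 128
w = rng.normal(size=D)    # hypothetical probe weights for parameter x_k
mu = np.zeros(D)          # hypothetical train-set mean
sigma = np.full(D, 2.0)   # hypothetical train-set std
b = 0.5

def probe(z):
    return w @ ((z - mu) / sigma) + b

# Closed-form gradient of Eq. (12): d y_hat / d z = w / sigma.
grad = w / sigma

# Unit-norm concept vector for x_k; walking z along it changes the probe
# readout linearly with the step size, since the probe is affine.
v = w / np.linalg.norm(w)
z = rng.normal(size=D)
step = 0.1
delta = probe(z + step * v) - probe(z)
print(delta, step * (grad @ v))  # identical up to floating point
```

The same closed form is what makes the probe cheap to include in a constrained optimizer: no autograd pass is needed for the readout itself.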

The probes are the bridge between the self-supervised representation and the design-space quantities a domain expert cares about. They turn the AeroJEPA latents into a *usable* object – one that can be searched (App. [F.3](https://arxiv.org/html/2605.05586#A6.SS3)) and queried by analogy (App. [F.4](https://arxiv.org/html/2605.05586#A6.SS4)) – without ever fine-tuning the underlying network.
