SubsurfaceGen: Procedural Generation of Field-Scale Earth Models and Seismic Data

arXiv cs.LG Papers

Summary

SubsurfaceGen is a GPU-accelerated generator for 3D velocity models and seismic data, releasing a dataset of 4,276 2D velocity slices and associated wavefields and shot gathers across six geological settings, aimed at advancing machine learning for full waveform inversion.

arXiv:2605.30541v1 Announce Type: new Abstract: Full waveform inversion (FWI) is the gold standard for subsurface imaging, with applications from carbon sequestration to energy and mineral exploration to earthquake hazard assessment. Machine learning approaches to FWI need field-scale, geologically diverse, and physically realistic training data, but existing resources such as Marmousi, SEAM, and OpenFWI fall short on spatial extent, temporal extent, geological diversity, and physical realism. We address these limitations with SubsurfaceGen, a GPU-accelerated generator for 3D velocity models and seismic data. Along with SubsurfaceGen, we release a paired dataset of 4,276 2D velocity slices, 5 s wavefields, and 8 s shot gathers drawn from 42 realistic, field-scale 3D velocity models, each spanning 10 km x 10 km laterally and 6.19 km deep at 10 m resolution. The dataset spans six geological settings -- four built with SubsurfaceGen and two drawn from prior sources -- relevant for carbon sequestration and hydrocarbon exploration. We use this dataset to evaluate neural operators on wavefield prediction and encoder-decoders on end-to-end velocity inversion, holding out one geological setting for out-of-distribution testing. These experiments surface failure modes at field-scale and demonstrate how SubsurfaceGen and the associated dataset can impact ML-based FWI.
Original Article
View Cached Full Text

Cached at: 06/01/26, 09:26 AM

# SubsurfaceGen: Procedural Generation of Field-Scale Earth Models and Seismic Data
Source: [https://arxiv.org/html/2605.30541](https://arxiv.org/html/2605.30541)
Joseph Stitt, Pratik Rathore, Madeleine Udell, Ching\-Yao Lai Stanford University jdstitt@sep\.stanford\.edu,\{pratikr,udell,cyaolai\}@stanford\.edu

###### Abstract

Full waveform inversion \(FWI\) is the gold standard for subsurface imaging, with applications from carbon sequestration to energy and mineral exploration to earthquake hazard assessment\. Machine learning approaches to FWI need field\-scale, geologically diverse, and physically realistic training data, but existing resources such as Marmousi, SEAM, and OpenFWI fall short on spatial extent, temporal extent, geological diversity, and physical realism\. We address these limitations with SubsurfaceGen, a GPU\-accelerated generator for 3D velocity models and seismic data\. Along with SubsurfaceGen, we release a paired dataset of 4,276 2D velocity slices, 5 s wavefields, and 8 s shot gathers drawn from 42 realistic, field\-scale 3D velocity models, each spanning10​km×10​km10\\,\\text\{km\}\\times 10\\,\\text\{km\}laterally and6\.19​km6\.19\\,\\text\{km\}deep at10​m10\\,\\text\{m\}resolution\. The dataset spans six geological settings—four built with SubsurfaceGen and two drawn from prior sources—relevant for carbon sequestration and hydrocarbon exploration\. We use this dataset to evaluate neural operators on wavefield prediction and encoder–decoders on end\-to\-end velocity inversion, holding out one geological setting for out\-of\-distribution testing\. These experiments surface failure modes at field\-scale and demonstrate how SubsurfaceGen and the associated dataset can impact ML\-based FWI\.

## 1Introduction

Subsurface imaging—the task of reconstructing the Earth’s interior from indirect surface measurements—carries significant economic and societal consequences, from verifying the integrity of carbon sequestration sites critical for addressing climate change, to discovering energy and mineral resources that underpin modern infrastructure, to assessing earthquake hazards that shape emergency planning in densely populated regions\. The gold standard for subsurface imaging is full waveform inversion \(FWI\)\(Lailly,[1983](https://arxiv.org/html/2605.30541#bib.bib85); Tarantola,[1984](https://arxiv.org/html/2605.30541#bib.bib86); Virieux and Operto,[2009](https://arxiv.org/html/2605.30541#bib.bib90)\), which reconstructs avelocity model, a spatial map of acoustic wave speed in the subsurface, from data collected in aseismic survey, in which an acoustic source is fired at many positions across a region of interest and an array of receivers records the reflections from the subsurface\. Each source firing produces ashot gather: the time series recorded at the receivers in response to that firing\. FWI matches these recordings towavefieldssimulated by solving the wave equation through a candidate velocity model\. However, FWI is notoriously difficult: it requires the solution of a high\-dimensional, non\-convex, PDE\-constrained optimization problem, which suffers from cycle skipping \(i\.e\., bad local minima\)\(Virieux and Operto,[2009](https://arxiv.org/html/2605.30541#bib.bib90); Yaoet al\.,[2019](https://arxiv.org/html/2605.30541#bib.bib95)\), sensitivity to the initial model\(Virieux and Operto,[2009](https://arxiv.org/html/2605.30541#bib.bib90)\), and computational costs that can scale to millions of core hours for realistic 3D surveys\(Schiemenz and Igel,[2013](https://arxiv.org/html/2605.30541#bib.bib92)\)\. Decades of research have produced sophisticated regularization schemes\(Symes,[2008](https://arxiv.org/html/2605.30541#bib.bib89); Biondi and Almomin,[2014](https://arxiv.org/html/2605.30541#bib.bib93); Barnieret al\.,[2023](https://arxiv.org/html/2605.30541#bib.bib96)\)and multiscale strategies\(Bunkset al\.,[1995](https://arxiv.org/html/2605.30541#bib.bib87); Fichtner,[2011](https://arxiv.org/html/2605.30541#bib.bib91)\), but the difficulties of FWI remain far from solved\.

Machine learning is promising on several of these fronts\. Neural operators can accelerate PDE solves in FWI, reducing overall computational costs, while maintaining accuracy and reliability\(Yanget al\.,[2021](https://arxiv.org/html/2605.30541#bib.bib64),[2023](https://arxiv.org/html/2605.30541#bib.bib66); Zhanget al\.,[2023](https://arxiv.org/html/2605.30541#bib.bib65); Huang and Alkhalifah,[2025](https://arxiv.org/html/2605.30541#bib.bib67)\)\. CNN\-based encoder–decoders can directly map seismic measurements to velocity models, sidestepping the cycle skipping, initialization sensitivity, and computational costs of traditional FWI methods\(Araya\-Poloet al\.,[2018](https://arxiv.org/html/2605.30541#bib.bib69); Yang and Ma,[2019](https://arxiv.org/html/2605.30541#bib.bib70); Wu and Lin,[2020](https://arxiv.org/html/2605.30541#bib.bib58); Zhang and Lin,[2020](https://arxiv.org/html/2605.30541#bib.bib71); Farriset al\.,[2023](https://arxiv.org/html/2605.30541#bib.bib37); Wanget al\.,[2023b](https://arxiv.org/html/2605.30541#bib.bib72)\)\. Generative models can learn data\-driven regularizers from existing velocity models, capturing realistic geological features that classical regularizers like Tikhonov and total variation are unable to represent\(Mosseret al\.,[2020](https://arxiv.org/html/2605.30541#bib.bib74); Stittet al\.,[2023](https://arxiv.org/html/2605.30541#bib.bib76); Wanget al\.,[2023a](https://arxiv.org/html/2605.30541#bib.bib75); Stittet al\.,[2025](https://arxiv.org/html/2605.30541#bib.bib77)\)\.

However, progress on these fronts is bottlenecked by data\. The velocity models used to train and evaluate ML\-based FWI should resemble the surveys they will be deployed on—field\-scale, geologically diverse, and physically realistic—and should be extendable\. By field\-scale, we mean velocity models that match the geometries of real surveys: tens of kilometers laterally, several kilometers deep, with recordings over several seconds\. By extendable, we mean the ability to generate new velocity models on demand\. Field\-scale data is necessary because cycle skipping, illumination gaps, and low\-frequency recovery worsen with spatial extent, and because imaging depth scales with recording time: carbon storage and hydrocarbon reservoirs sit several kilometers below the surface, requiring recordings of 5 s or more to image\. Geological diversity is needed to study generalization to unseen geologies\. Physical realism guards against ML methods producing nonsensical geological features\. Extendability supports studying scaling behavior, distribution shift, and generalization across new geological settings\.

Popular datasets for FWI, such as Marmousi\(Versteeg,[1994](https://arxiv.org/html/2605.30541#bib.bib7); Martinet al\.,[2006](https://arxiv.org/html/2605.30541#bib.bib4)\), SEAM\(Fehler and Keliher,[2011](https://arxiv.org/html/2605.30541#bib.bib2)\), and OpenFWI\(Denget al\.,[2022](https://arxiv.org/html/2605.30541#bib.bib1)\), fall short on at least one of these criteria, whereas SubsurfaceGen, the data generator we introduce in this paper, satisfies all four \([Table˜1](https://arxiv.org/html/2605.30541#S1.T1)\)\.

Table 1:SubsurfaceGen vs\. existing datasets for FWI; see also[Section˜A\.1](https://arxiv.org/html/2605.30541#A1.SS1)\.SubsurfaceGen is a state\-of\-the\-art data generator that enables procedural generation of field\-scale, geologically diverse, physically realistic 3D velocity models and seismic data\. Our contributions \(illustrated in[Fig\.˜1](https://arxiv.org/html/2605.30541#S1.F1)\) are as follows:

- •Realistic velocity model generation \([Section˜3](https://arxiv.org/html/2605.30541#S3)\)\.SubsurfaceGen allows users to deposit geological layers with interbeds \(alternating layers of contrasting velocity\), and add folding, faults, salt bodies, clinoforms, and carbonate platforms, together with structure\-oriented smoothing\(Hale,[2009](https://arxiv.org/html/2605.30541#bib.bib54)\)to suppress numerical artifacts\. To our knowledge, no other open\-source software supports procedural generation of this range of features\.
- •End\-to\-end seismic data generation \([Section˜4](https://arxiv.org/html/2605.30541#S4)\)\.SubsurfaceGen generates shot gathers and wavefields from 2D slices of 3D velocity models using Devito\(Louboutinet al\.,[2019](https://arxiv.org/html/2605.30541#bib.bib56)\), providing paired \(velocity model, seismic data\) samples for training and evaluation\. Both velocity model and seismic data generation are GPU\-accelerated, with model generation achieving up to 26\.8×\\timesspeedup over CPU\.
- •Example 4,276 sample field\-scale dataset \([Sections˜3\.2](https://arxiv.org/html/2605.30541#S3.SS2)and[4\.2](https://arxiv.org/html/2605.30541#S4.SS2)\)\.We release a dataset on Hugging Face containing 42 field\-scale 3D velocity models \(each 10 km×\\times10 km×\\times6\.19 km\)\. From these we extract 4,276 2D slices, each paired with seismic data \(5 s wavefields, 8 s shot gathers\)\. The dataset spans six geological settings: four generated with SubsurfaceGen \(based on basins in the North Sea, Gulf of Mexico, and Nova Scotia, plus a model with a large number of faults\) and two drawn from prior sources \(a model from a legacy 3D velocity model builder and the canonical SEAM model\) to broaden geological coverage\.
- •Field\-scale experiments for ML\-based FWI \([Sections˜5](https://arxiv.org/html/2605.30541#S5)and[6](https://arxiv.org/html/2605.30541#S6)\)\.We demonstrate how SubsurfaceGen and the field\-scale dataset impact ML\-based FWI: for wavefield prediction, the field\-scale grid forces predictions to be chunked, which motivates an adaptation of optimal checkpointing\(Symes,[2007](https://arxiv.org/html/2605.30541#bib.bib88)\); for end\-to\-end inversion, geological diversity supports cross\-geology generalization studies, opening a new possibility for evaluating architectures\.

The rest of the paper proceeds as follows:[Section˜2](https://arxiv.org/html/2605.30541#S2)introduces the acoustic wave equation,[Sections˜3](https://arxiv.org/html/2605.30541#S3)and[4](https://arxiv.org/html/2605.30541#S4)describe SubsurfaceGen’s velocity model and seismic data generation components,[Sections˜5](https://arxiv.org/html/2605.30541#S5)and[6](https://arxiv.org/html/2605.30541#S6)present our experiments for ML\-based FWI, and[Appendix˜A](https://arxiv.org/html/2605.30541#A1)covers related work\.

![Refer to caption](https://arxiv.org/html/2605.30541v1/x3.png)Figure 1:SubsurfaceGen is a GPU\-accelerated velocity model builder \(using PyTorch\) and seismic data generator \(using Devito\), which can be used to produce training data for ML\-based FWI\. We use SubsurfaceGen to create a dataset of 42 realistic, field\-scale 3D velocity models along with \(velocity slice, wavefield\) and \(velocity model, shot gather\) pairs\. We use these pairs to train neural operators for wavefield prediction and encoder–decoders for end\-to\-end inversion from shot gathers\.
## 2The Acoustic Wave Equation

The acoustic wave equation maps a velocity modelvp​\(𝐱\)v\_\{p\}\(\\mathbf\{x\}\)to a wavefieldp​\(𝐱,t\)p\(\\mathbf\{x\},t\)that would be recorded in a real\-world seismic survey\. This mapping is central to FWI and the ML methods in[Section˜1](https://arxiv.org/html/2605.30541#S1): FWI inverts it, neural operators approximate it, and encoder–decoders for end\-to\-end inversion are trained against data produced by it\. SubsurfaceGen solves the 2D acoustic, constant\-density, isotropic form of the wave equation with a source term:

1vp​\(𝐱\)2​∂2p​\(𝐱,t\)∂t2−∇2p​\(𝐱,t\)=s​\(𝐱,t\)\.\\frac\{1\}\{v\_\{p\}\(\\mathbf\{x\}\)^\{2\}\}\\,\\frac\{\\partial^\{2\}p\(\\mathbf\{x\},t\)\}\{\\partial t^\{2\}\}\\;\-\\;\\nabla^\{2\}p\(\\mathbf\{x\},t\)\\;=\\;s\(\\mathbf\{x\},t\)\.\(1\)Here𝐱=\(x,z\)\\mathbf\{x\}=\(x,z\)is the spatial coordinate \(lateralxx, depthzz\),ttis time,p​\(𝐱,t\)p\(\\mathbf\{x\},t\)is the wavefield,vp​\(𝐱\)v\_\{p\}\(\\mathbf\{x\}\)is the velocity model,∇2=∂x2\+∂z2\\nabla^\{2\}=\\partial\_\{x\}^\{2\}\+\\partial\_\{z\}^\{2\}is the Laplacian, ands​\(𝐱,t\)s\(\\mathbf\{x\},t\)is the source term\. SubsurfaceGen models the sources​\(𝐱,t\)s\(\\mathbf\{x\},t\)as a Ricker wavelet\(Ricker,[1953](https://arxiv.org/html/2605.30541#bib.bib84)\), which is standard in seismic processing\(Wang,[2015](https://arxiv.org/html/2605.30541#bib.bib94)\)\. Initial and boundary conditions are deferred to[Appendix˜E](https://arxiv.org/html/2605.30541#A5)\.

Solving \([1](https://arxiv.org/html/2605.30541#S2.E1)\) produces the wavefieldp​\(𝐱,t\)p\(\\mathbf\{x\},t\)across space and time\. However, the wavefield is not observed at all spatial locations in a seismic survey\. Seismic surveys place a sparse line of sensors at the surface, calledreceivers, that record the time history of the wavefield\. SubsurfaceGen reproduces this with a sampling operatorℛ\\mathcal\{R\}that observes the wavefieldppat the receiver locations: the result is a shot gatherℛ​p\\mathcal\{R\}p, a 2D tensor with axes for time and receiver index\. The remaining challenge is generating diverse, realistic velocity models, which we address in[Section˜3](https://arxiv.org/html/2605.30541#S3)\.

## 3SubsurfaceGen: Procedural Velocity Model Generation

SubsurfaceGen provides functionality for generating realistic, field\-scale 3D velocity models using a catalog of modules that emulate geological features observed in real\-world seismic data\. These modules include interbeds, faults, salt bodies, clinoforms, and carbonate platforms, which are essential for characterizing sedimentary settings that arise in carbon sequestration and hydrocarbon exploration\. We have written SubsurfaceGen’s model building functionality in PyTorch, which enables GPU\-accelerated velocity model generation, making it easier for practitioners to generate large datasets of velocity models\. We also describe our process for generating velocity models that are contained in the field\-scale dataset on Hugging Face\. Additional details are available in[Appendices˜B](https://arxiv.org/html/2605.30541#A2),[C](https://arxiv.org/html/2605.30541#A3)and[D](https://arxiv.org/html/2605.30541#A4)\.

### 3\.1Velocity Model Builder

##### The module catalog\.

SubsurfaceGen introduces eight modules for building velocity models\. Three of them \(Deposit,Squish,Fault\) improve on the synthetic\-model package fromClapp \([2018](https://arxiv.org/html/2605.30541#bib.bib9),[2022](https://arxiv.org/html/2605.30541#bib.bib10),[2024](https://arxiv.org/html/2605.30541#bib.bib11)\)\. The remaining five modules \(SaltSDT,SaltWedge,CarbonatePlatform,DeltaClinoformDeposit,StructuralSmoother\) are original to SubsurfaceGen\. At a high level:Depositadds stratified layers,Squishwarps layers into folds,Faultcuts and shifts layers,SaltSDTemplaces salt bodies andSaltWedgedeforms layers around them,CarbonatePlatformbuilds reef structures,DeltaClinoformDepositplaces deltaic formations, andStructuralSmootherapplies geology\-aware smoothing\.[Table˜5](https://arxiv.org/html/2605.30541#A2.T5)provides a full description of each module\.

##### Building velocity models\.

Velocity models are built byapplyingthe modules in sequence, starting from a flat, homogeneous volume \(called thebasement\) and adding modules going up to the surface\. Each module takes the current velocity model as input and modifies it according to its specific geological feature\. The order of module application follows geological time, with older features laid down before younger ones\. After all modules have been applied, the user can apply structure\-oriented smoothing \(SOS\)\(Hale,[2009](https://arxiv.org/html/2605.30541#bib.bib54)\)viaStructuralSmootherto remove numerical artifacts from the build process while preserving important geological features like faults and bed contacts\. The model building process is highly customizable, allowing users to specify parameters for each module to generate a variety of velocity models that capture the diversity of geological settings observed in real subsurface scenarios\.[Fig\.˜2](https://arxiv.org/html/2605.30541#S3.F2)illustrates this process for a Penobscot velocity model\.

![Refer to caption](https://arxiv.org/html/2605.30541v1/x4.png)Figure 2:Top: Step\-by\-step construction of a Penobscot velocity model using SubsurfaceGen modules \(we omit some steps for simplicity\)\. Bottom: The velocity model juxtaposed against the Penobscot migrated cube\. The velocity model captures key geological features of the migrated cube\.
##### GPU generation makes SubsurfaceGen extensible\.

[Table˜2](https://arxiv.org/html/2605.30541#S3.T2)reports per\-setting build times on GPU vs\. CPU: GPU yields a 7\.2×\\times–26\.8×\\timesspeedup, bringing build times down from 1–2 hours on CPU to 5–10 minutes on GPU\. This makes SubsurfaceGen particularly useful for practitioners who want to adapt it to a new geological setting or study scaling behavior\.

Table 2:Total build time per geological setting\. GPU: one 80 GB NVIDIA A100\. CPU: 16 cores of an AMD EPYC 7763 with 256 GB of RAM\. A full breakdown of the timings is in[Appendix˜D](https://arxiv.org/html/2605.30541#A4)\.

### 3\.2Generation of Field\-Scale Velocity Models for Hugging Face Dataset

##### Building the dataset with SubsurfaceGen\.

We use SubsurfaceGen to generate the velocity models in our Hugging Face dataset, building each model by applying modules from basement to surface\. The dataset contains 42 field\-scale velocity models in total, each6\.19​km×10​km×10​km6\.19\\,\\text\{km\}\\times 10\\,\\text\{km\}\\times 10\\,\\text\{km\}at 10 m resolution, organized into six geological settings\. Four of the settings \(Penobscot, F3, Gulf of Mexico, Fault\) are SubsurfaceGen builds\. The remaining two are not: SEAM is an existing model\(Fehler and Keliher,[2011](https://arxiv.org/html/2605.30541#bib.bib2)\)and Salt Canopy was generated using a legacy model building code with inspiration fromFarriset al\.\([2023](https://arxiv.org/html/2605.30541#bib.bib37)\)\. A full inventory is given in[Table˜6](https://arxiv.org/html/2605.30541#A3.T6)of[Appendix˜C](https://arxiv.org/html/2605.30541#A3)\.

##### Diversity between and within geological settings\.

Our velocity models are designed to mimic publicly\-availablemigrated cubes—estimates of subsurface structure reconstructed from seismic reflection data recorded at the surface\(Lo and Inderwiesen,[1994](https://arxiv.org/html/2605.30541#bib.bib57)\), similar to the way in which medical CT scans estimate internal anatomy from X\-ray attenuation measurements taken outside the body\. The velocity models in our dataset capture diversity across two levels:between geological settingsandwithin each setting\. We provide six settings \(Penobscot, F3, Gulf of Mexico, Fault, Salt Canopy, SEAM\) so that downstream models see a wide variety of geological features\. Within each setting, we provide multiple realizations drawn from the same regional distribution \(we describe our methodology in the next paragraph\)\. Publicly\-available migrated cubes are scarce, so within\-setting diversity allows us to leverage the few cubes that do exist while still providing enough velocity models for training\.

##### Matching to migrated cubes and sampling variants\.

We match a velocity model for each setting to the corresponding migrated cube, then sample variants around the matched parameters\. The build process is the deterministic mappingℳ​\(θ\)=V\\mathcal\{M\}\(\\theta\)=V, whereℳ\\mathcal\{M\}denotes the velocity model builder,θ\\thetadenotes the parameter vector configuring SubsurfaceGen modules, andVVdenotes the resulting velocity model\. Matching is manual: given a public migrated cubeCGTC\_\{\\text\{GT\}\}, we pick a parameter vectorθ\\theta, generateℳ​\(θ\)\\mathcal\{M\}\(\\theta\), compare toCGTC\_\{\\text\{GT\}\}by visual inspection, and iterate untilℳ​\(θcal\)\\mathcal\{M\}\(\\theta\_\{\\text\{cal\}\}\)resemblesCGTC\_\{\\text\{GT\}\}\. We then define a sampling distribution𝒟\\mathcal\{D\}centered atθcal\\theta\_\{\\text\{cal\}\}, draw

θ1,…,θk∼iid𝒟,\\theta\_\{1\},\\ldots,\\theta\_\{k\}\\stackrel\{\{\\scriptstyle\\text\{iid\}\}\}\{\{\\sim\}\}\\mathcal\{D\},and produce the setting’s models asℳ​\(θcal\),ℳ​\(θ1\),…,ℳ​\(θk\)\\mathcal\{M\}\(\\theta\_\{\\text\{cal\}\}\),\\mathcal\{M\}\(\\theta\_\{1\}\),\\ldots,\\mathcal\{M\}\(\\theta\_\{k\}\)\.[Fig\.˜2](https://arxiv.org/html/2605.30541#S3.F2)shows the public migrated cubeCGTC\_\{\\text\{GT\}\}alongside our matched syntheticℳ​\(θcal\)\\mathcal\{M\}\(\\theta\_\{\\text\{cal\}\}\)for the Penobscot setting; per\-setting geological details and build scripts are in[Appendix˜C](https://arxiv.org/html/2605.30541#A3)\.

## 4SubsurfaceGen: Seismic Data Generation

We describe the seismic data generation functionality of SubsurfaceGen and explain how we use it to build the seismic data in our Hugging Face dataset\. For each 3D velocity model from[Section˜3](https://arxiv.org/html/2605.30541#S3), we extract 2D slices and solve the acoustic wave equation \([1](https://arxiv.org/html/2605.30541#S2.E1)\) on each slice to obtain the corresponding wavefields and shot gathers\. We repeat this for five different source bandwidths \([Table˜14](https://arxiv.org/html/2605.30541#A5.T14)\), motivated by multiscale FWI\(Bunkset al\.,[1995](https://arxiv.org/html/2605.30541#bib.bib87); Fichtner,[2011](https://arxiv.org/html/2605.30541#bib.bib91)\), where practitioners progressively increase the source bandwidth to mitigate cycle skipping\. For additional details on the seismic data generation process, see[Appendix˜E](https://arxiv.org/html/2605.30541#A5)\.

![Refer to caption](https://arxiv.org/html/2605.30541v1/x5.png)Figure 3:Wavefield evolution on a Penobscot crossline slice with a Ricker wavelet source near the surface\. Waves radiate outward from the source until they enter a salt body \(1\.96 s\), with chaotic multipath behavior dominating at late times \(5\.0 s\)\.### 4\.1Seismic Data Generator

SubsurfaceGen extracts 2D slices from a 3D velocity model along a horizontal axis \(inline or crossline\)\. SubsurfaceGen uses Devito\(Louboutinet al\.,[2019](https://arxiv.org/html/2605.30541#bib.bib56)\)to solve \([1](https://arxiv.org/html/2605.30541#S2.E1)\) for each slice at a user\-specified spatial discretization and time discretization, simulation time, source bandwidth, and source position\. The sources​\(𝐱,t\)s\(\\mathbf\{x\},t\)is a Ricker wavelet, as motivated in[Section˜2](https://arxiv.org/html/2605.30541#S2)\. The user can save the wavefieldp​\(𝐱,t\)p\(\\mathbf\{x\},t\)and/or the shot gatherℛ​p\\mathcal\{R\}p\.

### 4\.2Generation of Seismic Data for Hugging Face Dataset

##### Slice configuration\.

For all 42 velocity models from[Section˜3\.2](https://arxiv.org/html/2605.30541#S3.SS2), we randomly draw 2D slices along the inline and crossline axes, with half of the slices coming from each orientation\. Each slice is of size619×1000619\\times 1000at 10 m grid spacing; per\-setting slice counts are listed in[Table˜3](https://arxiv.org/html/2605.30541#S4.T3)\.

##### Wavefields and shot gathers\.

Our dataset targets marine acquisitions: for each slice, we solve the acoustic wave equation \([1](https://arxiv.org/html/2605.30541#S2.E1)\) with Devito on a 10 m grid, setting the sources​\(𝐱,t\)s\(\\mathbf\{x\},t\)to be a Ricker wavelet near the surface to emulate an air gun towed behind a vessel\. We generate wavefields by using a single source at a random horizontal position, simulating for 5 s, and saving the wavefieldp​\(𝐱,t\)p\(\\mathbf\{x\},t\)as a tensor of shape358×619×1000358\\times 619\\times 1000;[Fig\.˜3](https://arxiv.org/html/2605.30541#S4.F3)shows a wavefield generated from a Penobscot slice\. We generate shot gathers by using 64 equally\-spaced sources and 1000 equally\-spaced receivers, simulating for 8 s \(long enough for the deepest reflections to return to the surface\), and aggregating the individual shot gathersℛ​p\\mathcal\{R\}pas ashot\-gather cubeof shape64×572×100064\\times 572\\times 1000\. Representative gathers from three training settings appear in[Fig\.˜11](https://arxiv.org/html/2605.30541#A5.F11)\.

##### Dataset splits\.

We construct two test sets \([Table˜3](https://arxiv.org/html/2605.30541#S4.T3)\)\. The in\-distribution test set draws slices from the same five settings as training, but at non\-overlapping positions within each velocity model\. The out\-of\-distribution test set is Penobscot, which we hold out from training entirely to evaluate generalization to unseen geology\.

Table 3:Dataset split\. Each slice is paired with a wavefield and a shot\-gather cube at every frequency band, yielding\(4,096\+100\+80\)⋅5=21,380\(4\{,\}096\+100\+80\)\\cdot 5=21\{,\}380wavefields and 21,380 shot\-gather cubes\.

## 5Wavefield Prediction with Neural Operators

We train neural operators to predict wavefields—an alternative to the expensive PDE solves that dominate FWI’s compute cost\. At field scale, predicting the fullT=348T=348frame trajectory on the619×1000619\\times 1000grid in one forward pass exceeds GPU memory, forcing achunked autoregressive rollout: each chunk takes the lasttin=10t\_\{\\mathrm\{in\}\}=10frames, the velocity model, and a source mask, predicts the nexttout=50t\_\{\\mathrm\{out\}\}=50frames, and feeds its output into the next chunk\. This regime does not arise on smaller datasets like OpenFWI, where a single forward pass fits in memory\. We compare two architecturally distinct neural operators, TFNO\(Kossaifiet al\.,[2024](https://arxiv.org/html/2605.30541#bib.bib78)\)and DPOT\(Haoet al\.,[2024](https://arxiv.org/html/2605.30541#bib.bib83)\), trained for 10 epochs on a single seed \(full hyperparameters and architecture details in[Appendix˜F](https://arxiv.org/html/2605.30541#A6)\)\. One F3 test slice serves as a running example throughout this section\.

[Fig\.˜4](https://arxiv.org/html/2605.30541#S5.F4)shows TFNO and DPOT prediction error on a single chunk of the F3 example slice at late propagation time, where reflections from the salt body produce chaotic multipath interference\. Both architectures track the dominant wavefronts well, with errors concentrated in deep regions\. Per\-geology and per\-wavenumber breakdowns of TFNO and DPOT performance are in[Section˜F\.7](https://arxiv.org/html/2605.30541#A6.SS7)\.

[Fig\.˜6](https://arxiv.org/html/2605.30541#S5.F6)shows the cost of chunked autoregressive rollout over the entireT=348T=348frame trajectory: the L2 relative error \(L2RE\) compounds by roughly an order of magnitude from one chunk to the end—though the dominant reflections remain visible in both models’ predictions out to several seconds\. The field\-scale dataset enables this regime in the first place: at the shorter propagation times of smaller datasets like OpenFWI \(1 s vs\. our 5 s\), no chunking would be required, so the long\-horizon failure modes we observe here would not arise\.

![Refer to caption](https://arxiv.org/html/2605.30541v1/x6.png)Figure 4:Wavefield prediction error for TFNO \(top\) and DPOT \(bottom\) on the F3 example slice over a single chunk at late propagation\. Errors concentrate where the wavefield is most chaotic, illustrating the late\-time scattering both networks struggle to reproduce\.FWI manages a similar storage\-vs\-compute trade\-off via optimal checkpointing\(Symes,[2007](https://arxiv.org/html/2605.30541#bib.bib88)\): anchor wavefield states are stored at sparse times and the wavefield is recomputed between them by finite\-difference solves\. We adapt optimal checkpointing to neural operators by filling aτ=20\\tau=20frame interior gap between two stored anchor windows ofTL=TR=20T\_\{L\}=T\_\{R\}=20frames on each side, with the prediction pinned to the anchor frames at the gap boundaries \(full architecture in[Section˜F\.2](https://arxiv.org/html/2605.30541#A6.SS2)\); we call this variantTFNO\-interp\. Anchoring at the gap boundaries uniformly reduces drift:[Fig\.˜5](https://arxiv.org/html/2605.30541#S5.F5)shows it qualitatively on the F3 example, and[Fig\.˜6](https://arxiv.org/html/2605.30541#S5.F6)confirms TFNO\-interp’s L2RE in the gap intervals is roughly6×6\\timeslower than either forward operator’s at every time step\. These experiments show the value of the field\-scale dataset: the drift on chunked rollout and the analogy to optimal checkpointing both surface at field\-scale, motivating TFNO\-interp\.

![Refer to caption](https://arxiv.org/html/2605.30541v1/x7.png)Figure 5:TFNO\-interp prediction error on the same F3 example slice, sampled at four interior gap centers over the full trajectory\. TFNO\-interp’s error is visibly lower than TFNO’s and DPOT’s, and the gap\-centered sampling shows that the improvement is not just near the anchor frames\.![Refer to caption](https://arxiv.org/html/2605.30541v1/x8.png)Figure 6:L2RE versus propagation time for TFNO, DPOT, and TFNO\-interp on the in\-distribution and out\-of\-distribution test sets\. The forward operators’ \(TFNO, DPOT\) error compounds over time; TFNO\-interp’s prediction windows \(shaded light gray\) stay roughly an order of magnitude lower\.
## 6End\-to\-End Inversion with Encoder–Decoders

We train networks that map a 3D shot\-gather cube directly to the corresponding 2D velocity modelvp​\(𝐱\)v\_\{p\}\(\\mathbf\{x\}\)—an alternative to classical FWI that could avoid cycle skipping and the cost of repeated PDE solves\. To our knowledge, our dataset is the first to enable cross\-geology generalization studies on this task at field\-scale: Marmousi and SEAM each contain only one geological setting, and OpenFWI evaluates cross\-geology generalization, but only on 0\.7 km×\\times0\.7 km models\.

We compare three encoder–decoder architectures:InversionNet\(Wu and Lin,[2020](https://arxiv.org/html/2605.30541#bib.bib58)\)treats the shot\-gather cube as a 2D image with 64 channels;Transformer\(Liuet al\.,[2021](https://arxiv.org/html/2605.30541#bib.bib59); Hatamizadehet al\.,[2022](https://arxiv.org/html/2605.30541#bib.bib60)\)andCNN\(Isenseeet al\.,[2024](https://arxiv.org/html/2605.30541#bib.bib61)\)treat it as a 3D volume\. The Transformer and CNN share an identical decoder architecture, so any gap between them is attributable to the encoder\. We train each architecture for 100 epochs across three seeds and report root mean square error \(RMSE\) and structural similarity \(SSIM\)\(Wanget al\.,[2004](https://arxiv.org/html/2605.30541#bib.bib63)\)on the three splits \([Table˜3](https://arxiv.org/html/2605.30541#S4.T3)\), where Penobscot is held out as the out\-of\-distribution \(OOD\) test split\. Full details are available in[Appendix˜G](https://arxiv.org/html/2605.30541#A7)\.

[Table˜4](https://arxiv.org/html/2605.30541#S6.T4)reports RMSE and SSIM across the three splits and[Fig\.˜7](https://arxiv.org/html/2605.30541#S6.F7)shows prediction errors on the in\-distribution test split\. The CNN captures large\-scale geological features \(salt bodies, faults\) better than the other two architectures, but all three architectures struggle to resolve fine\-scale features\. Moreover, the CNN uniformly outperforms InversionNet, showing the value of a 3D representation in the encoder\. The Penobscot hold\-out lets us understand how each architecture generalizes to a new geological setting: the Transformer’s SSIM is stable while the CNN’s and InversionNet’s drop by 0\.04–0\.07, revealing a relationship between architecture and OOD performance\.

Table 4:End\-to\-end inversion accuracy across architectures and splits \(averaged over 3 seeds\)\.![Refer to caption](https://arxiv.org/html/2605.30541v1/x9.png)Figure 7:Inverted velocity models on the in\-distribution test split \(one slice per setting: F3, Fault, Gulf of Mexico, Salt Canopy, SEAM\)\. Top row: ground\-truth velocity\. Subsequent rows: prediction error for InversionNet, Transformer, and CNN\.
## 7Conclusion

We introduce SubsurfaceGen, a GPU\-accelerated procedural velocity model builder and seismic data generator for ML\-based FWI\. We use SubsurfaceGen to generate an example field\-scale dataset, available on Hugging Face\. Our experiments reveal how SubsurfaceGen can impact ML\-based FWI: for wavefield prediction, the field\-scale grid forces predictions to be chunked, which motivates checkpointing strategies; for end\-to\-end inversion, geological diversity supports cross\-geology generalization studies, opening a new possibility for evaluating architectures\. Natural extensions include wavefield and shot gather generation over 3D domains and modeling with the elastic wave equation\.

## Impact Statement

SubsurfaceGen and the field\-scale dataset democratize ML for seismic imaging by lowering the barrier to entry for researchers who lack proprietary seismic data\. SubsurfaceGen and the field\-scale dataset can be applied for verifying the integrity of carbon sequestration sites \(climate\-positive\) and for hydrocarbon exploration, which is a significant driver of greenhouse gas emissions \(climate\-negative\); the net environmental impact depends on who builds upon this work and how they use it\. The field\-scale dataset is intended to support ML method development and could serve as a foundation for future benchmarks, but we caution against deploying models trained on it in real\-world settings without further validation and testing\.

## Acknowledgments and Disclosure of Funding

We would like to thank Bob Clapp for creating the first version of the velocity model builder and Rustam Akhmadiev for suggesting the checkpointing idea\. We would also like to thank Elliana Abrahams, Guillaume Barnier, Ettore Biondi, Brian Chivers, Thomas Cullison, Stuart Farris, Zachary Frangella, Wenzhi Gao, Joshua Rines, Drew Stump, Yinjun Wang, and Anders Wikum for valuable feedback that helped shape this submission\. JS acknowledges support from the Stanford Geophysics Ph\.D\. program, including the secondary research project completed under the advising of CYL as part of the Ph\.D\. breadth requirement, and graduate research support under the advising of Biondo Biondi\. PR and MU gratefully acknowledge support from the Office of Naval Research under award N000142412306, Air Force Office of Scientific Research under award FA9550\-26\-1\-0012, the Alfred P\. Sloan Foundation, the Stanford Institute for Human\-Centered Artificial Intelligence, and from IBM Research as a founding member of Stanford Institute for Human\-centered Artificial Intelligence\. CYL gratefully acknowledges support from the Alfred P\. Sloan Foundation via grant FG\-2024\-21649\. We acknowledge Stanford University and the Center for Computation at the Stanford Doerr School of Sustainability for computational resources and support, including access to the Sherlock computing cluster and the serc partition\. We also acknowledge computational resources provided through allocation EES260056 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support \(ACCESS\) program, supported by the U\.S\. National Science Foundation\.

## References

- Deep\-learning tomography\.The Leading Edge37\(1\),pp\. 58–66\.Cited by:[§A\.2](https://arxiv.org/html/2605.30541#A1.SS2.p1.2),[§1](https://arxiv.org/html/2605.30541#S1.p2.1)\.
- M\. E\. Badley, J\. D\. Price, C\. R\. Dahl, and T\. Agdestein \(1988\)The structural evolution of the northern Viking Graben and its bearing upon extensional modes of basin formation\.Journal of the Geological Society145\(3\),pp\. 455–472\.Cited by:[§C\.3](https://arxiv.org/html/2605.30541#A3.SS3.p1.1)\.
- G\. Barnier, E\. Biondi, R\. G\. Clapp, and B\. Biondi \(2023\)Full\-waveform inversion by model extension: Practical applications\.Geophysics88\(5\),pp\. R609–R643\.Cited by:[§1](https://arxiv.org/html/2605.30541#S1.p1.1)\.
- L\. Baroni, R\. M\. Silva, R\. S\. Ferreira, D\. Civitarese, D\. Szwarcman, and E\. V\. Brazil \(2019\)Penobscot dataset: fostering machine learning development for seismic interpretation\.arXiv preprint arXiv:1903\.12060\.Cited by:[§C\.1](https://arxiv.org/html/2605.30541#A3.SS1.p1.1)\.
- B\. Biondi and A\. Almomin \(2014\)Simultaneous inversion of full data bandwidth by tomographic full\-waveform inversion\.Geophysics79\(3\),pp\. WA129–WA140\.Cited by:[§1](https://arxiv.org/html/2605.30541#S1.p1.1)\.
- D\. E\. Brown \(2008\)Regional geology of the Scotian Basin\.Canada\-Nova Scotia Offshore Petroleum Board\.Cited by:[§C\.1](https://arxiv.org/html/2605.30541#A3.SS1.p1.1)\.
- C\. Bunks, F\. M\. Saleck, S\. Zaleski, and G\. Chavent \(1995\)Multiscale seismic waveform inversion\.Geophysics60\(5\),pp\. 1457–1473\.Cited by:[§1](https://arxiv.org/html/2605.30541#S1.p1.1),[§4](https://arxiv.org/html/2605.30541#S4.p1.1)\.
- C\. Cerjan, D\. Kosloff, R\. Kosloff, and M\. Reshef \(1985\)A nonreflecting boundary condition for discrete acoustic and elastic wave equations\.Geophysics50\(4\),pp\. 705–708\.Cited by:[§E\.2](https://arxiv.org/html/2605.30541#A5.SS2.SSS0.Px3.p1.1)\.
- R\. G\. Clapp \(2018\)Synthetic model building for training neural networks in a Jupyter notebook\.Technical reportTechnical ReportSEP\-172,Stanford Exploration Project\.Cited by:[§3\.1](https://arxiv.org/html/2605.30541#S3.SS1.SSS0.Px1.p1.1)\.
- R\. G\. Clapp \(2022\)Generating synthetic models for machine learning\.Technical reportTechnical ReportSEP\-187,Stanford Exploration Project\.Cited by:[§3\.1](https://arxiv.org/html/2605.30541#S3.SS1.SSS0.Px1.p1.1)\.
- R\. G\. Clapp \(2024\)synthetic\-model: A library for creating synthetic geologic velocity modelsStanford Exploration Project\.External Links:[Link](https://github.com/SEP-software/synthetic-model)Cited by:[§3\.1](https://arxiv.org/html/2605.30541#S3.SS1.SSS0.Px1.p1.1)\.
- O\. E\. Coker, H\. Wang, A\. Khan, and P\. K\. Jimack \(2025\)Scheduled temporal loss weighting for neural operators\.InNeurIPS Workshop on Machine Learning and the Physical Sciences,Cited by:[§F\.4](https://arxiv.org/html/2605.30541#A6.SS4.p2.8)\.
- R\. Courant, K\. Friedrichs, and H\. Lewy \(1928\)Über die partiellen differenzengleichungen der mathematischen physik\.Mathematische Annalen100\(1\),pp\. 32–74\.Cited by:[§E\.3\.2](https://arxiv.org/html/2605.30541#A5.SS3.SSS2.p1.2)\.
- P\. A\. Cowie and C\. H\. Scholz \(1992\)Displacement\-length scaling relationship for faults: data synthesis and discussion\.Journal of Structural Geology14\(10\),pp\. 1149–1156\.Cited by:[§C\.3](https://arxiv.org/html/2605.30541#A3.SS3.p1.1)\.
- I\. Davison, I\. Alsop, P\. Birch, C\. Elders, N\. Evans, H\. Nicholson, P\. Rorison, D\. Wade, J\. Woodward, and M\. Young \(2000\)Geometry and late\-stage structural evolution of central Graben salt diapirs, North Sea\.Marine and Petroleum Geology17\(4\),pp\. 499–522\.Cited by:[§C\.2](https://arxiv.org/html/2605.30541#A3.SS2.p1.1)\.
- M\. de la Varga, A\. Schaaf, and F\. Wellmann \(2019\)GemPy 1\.0: open\-source stochastic geological modeling and inversion\.Geoscientific Model Development12\(1\),pp\. 1–32\.Cited by:[§A\.3](https://arxiv.org/html/2605.30541#A1.SS3.p1.1)\.
- C\. Deng, S\. Feng, H\. Wang, X\. Zhang, P\. Jin, Y\. Feng, Q\. Zeng, Y\. Chen, and Y\. Lin \(2022\)OpenFWI: large\-scale multi\-structural benchmark datasets for full waveform inversion\.InAdvances in Neural Information Processing Systems, Datasets and Benchmarks Track,Cited by:[§A\.1](https://arxiv.org/html/2605.30541#A1.SS1.p1.1),[§G\.2](https://arxiv.org/html/2605.30541#A7.SS2.p1.1),[§1](https://arxiv.org/html/2605.30541#S1.p4.1)\.
- F\. A\. Diegel, J\. F\. Karlo, D\. C\. Schuster, R\. C\. Shoup, and P\. R\. Tauvers \(1995\)Cenozoic structural evolution and tectono\-stratigraphic framework of the northern Gulf Coast continental margin\.InSalt Tectonics: A Global Perspective,M\. P\. A\. Jackson, D\. G\. Roberts, and S\. Snelson \(Eds\.\),AAPG Memoir, Vol\.65,pp\. 109–151\.Cited by:[§C\.4](https://arxiv.org/html/2605.30541#A3.SS4.p1.1)\.
- L\. S\. Eliuk \(1978\)The Abenaki Formation, Nova Scotia Shelf, Canada — a depositional and diagenetic model for a mesozoic carbonate platform\.Bulletin of Canadian Petroleum Geology26\(4\),pp\. 424–514\.Cited by:[§C\.1](https://arxiv.org/html/2605.30541#A3.SS1.p1.1)\.
- S\. Farris, R\. Clapp, and M\. Araya\-Polo \(2023\)Learning\-based seismic velocity inversion with synthetic and field data\.Sensors23\(19\)\.Cited by:[§A\.2](https://arxiv.org/html/2605.30541#A1.SS2.p1.2),[§C\.5](https://arxiv.org/html/2605.30541#A3.SS5.p2.6),[§1](https://arxiv.org/html/2605.30541#S1.p2.1),[§3\.2](https://arxiv.org/html/2605.30541#S3.SS2.SSS0.Px1.p1.1)\.
- S\. Farris \(2023\)Seismic velocity model building with deep convolutional neural networks\.Ph\.D\. thesis,Stanford University\.Note:Stanford Exploration Project report SEP\-190Cited by:[§A\.2](https://arxiv.org/html/2605.30541#A1.SS2.p1.2),[§C\.5](https://arxiv.org/html/2605.30541#A3.SS5.p2.6)\.
- M\. Fehler and P\. J\. Keliher \(2011\)SEAM Phase I: Challenges of Subsalt Imaging in Tertiary Basins, with Emphasis on Deepwater Gulf of Mexico\.Society of Exploration Geophysicists\.Cited by:[§A\.1](https://arxiv.org/html/2605.30541#A1.SS1.p1.1),[§C\.6](https://arxiv.org/html/2605.30541#A3.SS6.p1.1),[§1](https://arxiv.org/html/2605.30541#S1.p4.1),[§3\.2](https://arxiv.org/html/2605.30541#S3.SS2.SSS0.Px1.p1.1)\.
- S\. Feng, H\. Wang, C\. Deng, Y\. Feng, Y\. Liu, M\. Zhu, P\. Jin, Y\. Chen, and Y\. Lin \(2023\)𝔼FWI\\mathbf\{\\mathbb\{E\}^\{\\text\{FWI\}\}\}: multiparameter benchmark datasets for elastic full waveform inversion of geophysical properties\.InAdvances in Neural Information Processing Systems, Datasets and Benchmarks Track,Cited by:[§A\.1](https://arxiv.org/html/2605.30541#A1.SS1.p1.1)\.
- A\. Fichtner \(2011\)Full seismic waveform modelling and inversion\.Advances in Geophysical and Environmental Mechanics and Mathematics,Springer\.Cited by:[§1](https://arxiv.org/html/2605.30541#S1.p1.1),[§4](https://arxiv.org/html/2605.30541#S4.p1.1)\.
- B\. Fornberg \(1988\)Generation of finite difference formulas on arbitrarily spaced grids\.Mathematics of Computation51\(184\),pp\. 699–706\.Cited by:[§E\.3\.2](https://arxiv.org/html/2605.30541#A5.SS3.SSS2.p1.7)\.
- R\. L\. Gawthorpe and M\. R\. Leeder \(2000\)Tectono\-sedimentary evolution of active extensional basins\.Basin Research12\(3–4\),pp\. 195–218\.Cited by:[§C\.3](https://arxiv.org/html/2605.30541#A3.SS3.p1.1)\.
- K\. A\. Giles and M\. G\. Rowan \(2012\)Concepts in halokinetic\-sequence deformation and stratigraphy\.Geological Society, London, Special Publications363\(1\),pp\. 7–31\.Cited by:[§B\.5](https://arxiv.org/html/2605.30541#A2.SS5.p1.1),[§C\.4](https://arxiv.org/html/2605.30541#A3.SS4.p1.1),[§C\.4](https://arxiv.org/html/2605.30541#A3.SS4.p2.2)\.
- L\. Grose, L\. Ailleres, G\. Laurent, and M\. Jessell \(2021\)LoopStructural 1\.0: time\-aware geological modelling\.Geoscientific Model Development14\(6\),pp\. 3915–3937\.Cited by:[§A\.3](https://arxiv.org/html/2605.30541#A1.SS3.p1.1)\.
- D\. Hale \(2009\)Structure\-oriented smoothing and semblance\.CWP Report635\.Cited by:[§B\.8](https://arxiv.org/html/2605.30541#A2.SS8.p1.1),[Table 5](https://arxiv.org/html/2605.30541#A2.T5.1.9.7.2.1.1),[1st item](https://arxiv.org/html/2605.30541#S1.I1.i1.p1.1),[§3\.1](https://arxiv.org/html/2605.30541#S3.SS1.SSS0.Px2.p1.1)\.
- Z\. Hao, C\. Su, S\. Liu, J\. Berner, C\. Ying, H\. Su, A\. Anandkumar, J\. Song, and J\. Zhu \(2024\)DPOT: auto\-regressive denoising operator transformer for large\-scale PDE pre\-training\.InForty\-first International Conference on Machine Learning,Cited by:[§F\.2](https://arxiv.org/html/2605.30541#A6.SS2.SSS0.Px3.p1.5),[§F\.5](https://arxiv.org/html/2605.30541#A6.SS5.p1.2),[§5](https://arxiv.org/html/2605.30541#S5.p1.4)\.
- A\. Hatamizadeh, V\. Nath, Y\. Tang, D\. Yang, H\. R\. Roth, and D\. Xu \(2022\)Swin UNETR: swin transformers for semantic segmentation of brain tumors in MRI images\.InBrainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries,pp\. 272–284\.Cited by:[§G\.2](https://arxiv.org/html/2605.30541#A7.SS2.SSS0.Px3.p1.1),[§6](https://arxiv.org/html/2605.30541#S6.p2.1)\.
- W\. Helland\-Hansen and G\. J\. Hampson \(2009\)Trajectory analysis: concepts and applications\.Basin Research21\(5\),pp\. 454–483\.Cited by:[§C\.2](https://arxiv.org/html/2605.30541#A3.SS2.p3.1)\.
- X\. Huang and T\. Alkhalifah \(2025\)Learned frequency\-domain scattered wavefield solutions using neural operators\.Geophysical Journal International241\(3\),pp\. 1467–1485\.Cited by:[§A\.2](https://arxiv.org/html/2605.30541#A1.SS2.p1.2),[§1](https://arxiv.org/html/2605.30541#S1.p2.1)\.
- M\. R\. Hudec and M\. P\. A\. Jackson \(2006\)Advance of allochthonous salt sheets in passive margins and orogens\.AAPG Bulletin90\(10\),pp\. 1535–1564\.Cited by:[§C\.5](https://arxiv.org/html/2605.30541#A3.SS5.p1.1)\.
- M\. R\. Hudec and M\. P\. A\. Jackson \(2007\)Terra infirma: understanding salt tectonics\.Earth\-Science Reviews82\(1–2\),pp\. 1–28\.Cited by:[§C\.4](https://arxiv.org/html/2605.30541#A3.SS4.p1.1)\.
- M\. R\. Hudec, I\. O\. Norton, M\. P\. A\. Jackson, and F\. J\. Peel \(2013\)Jurassic evolution of the Gulf of Mexico salt basin\.AAPG Bulletin97\(10\),pp\. 1683–1710\.Cited by:[§C\.4](https://arxiv.org/html/2605.30541#A3.SS4.p1.1)\.
- F\. Isensee, T\. Wald, C\. Ulrich, M\. Baumgartner, S\. Roy, K\. Maier\-Hein, and P\. F\. Jaeger \(2024\)nnU\-Net revisited: a call for rigorous validation in 3D medical image segmentation\.Note:arXiv:2404\.09556Cited by:[§G\.2](https://arxiv.org/html/2605.30541#A7.SS2.SSS0.Px2.p1.1),[§6](https://arxiv.org/html/2605.30541#S6.p2.1)\.
- M\. P\. A\. Jackson and M\. R\. Hudec \(2017\)Salt tectonics: principles and practice\.Cambridge University Press\.Cited by:[§C\.4](https://arxiv.org/html/2605.30541#A3.SS4.p1.1)\.
- P\. Jin, Y\. Feng, S\. Feng, H\. Wang, Y\. Chen, B\. Consolvo, Z\. Liu, and Y\. Lin \(2024\)An empirical study of large\-scale data\-driven full waveform inversion\.Scientific Reports14\.Cited by:[§A\.2](https://arxiv.org/html/2605.30541#A1.SS2.p1.2)\.
- I\. F\. Jones and I\. Davison \(2014\)Seismic imaging in and around salt bodies\.Interpretation2\(4\),pp\. SL1–SL20\.Cited by:[§C\.4](https://arxiv.org/html/2605.30541#A3.SS4.p1.1),[§C\.5](https://arxiv.org/html/2605.30541#A3.SS5.p1.1)\.
- A\. G\. Kidston, D\. E\. Brown, B\. M\. Smith, and B\. Altheim \(2002\)Hydrocarbon potential of the deep\-water Scotian slope\.Canada\-Nova Scotia Offshore Petroleum Board\.Cited by:[§C\.1](https://arxiv.org/html/2605.30541#A3.SS1.p1.1)\.
- J\. Kossaifi, N\. Kovachki, K\. Azizzadenesheli, and A\. Anandkumar \(2024\)Multi\-grid tensorized Fourier neural operator for high\-resolution PDEs\.Transactions on Machine Learning Research\.Cited by:[§F\.2](https://arxiv.org/html/2605.30541#A6.SS2.SSS0.Px1.p1.6),[§5](https://arxiv.org/html/2605.30541#S5.p1.4)\.
- G\. Kuhlmann, C\. G\. Langereis, D\. Munsterman, R\.\-J\. van Leeuwen, R\. Verreussel, J\. E\. Meulenkamp, and Th\. E\. Wong \(2006\)Integrated chronostratigraphy of the Pliocene\-Pleistocene interval and its relation to the regional stratigraphical stages in the southern North Sea region\.Netherlands Journal of Geosciences85\(1\),pp\. 19–35\.Cited by:[§C\.2](https://arxiv.org/html/2605.30541#A3.SS2.p1.1)\.
- P\. Lailly \(1983\)The seismic inverse problem as a sequence of before stack migrations\.InConference on Inverse Scattering: Theory and Application,Cited by:[§1](https://arxiv.org/html/2605.30541#S1.p1.1)\.
- S\. Li, Z\. Li, Z\. Mu, S\. Xin, Z\. Dai, K\. Leng, R\. Zhang, X\. Song, and Y\. Zhu \(2025\)GlobalTomo: a global dataset for physics\-ML seismic wavefield modeling and FWI\.InAdvances in Neural Information Processing Systems, Datasets and Benchmarks Track,Cited by:[§A\.1](https://arxiv.org/html/2605.30541#A1.SS1.p1.1)\.
- Z\. Li, N\. Kovachki, K\. Azizzadenesheli, B\. Liu, K\. Bhattacharya, A\. Stuart, and A\. Anandkumar \(2021\)Fourier neural operator for parametric partial differential equations\.InInternational Conference on Learning Representations,Cited by:[§F\.2](https://arxiv.org/html/2605.30541#A6.SS2.SSS0.Px1.p1.6)\.
- Y\. Liu and M\. K\. Sen \(2009\)A new time–space domain high\-order finite\-difference method for the acoustic wave equation\.Journal of Computational Physics228\(23\),pp\. 8779–8806\.Cited by:[§E\.3\.1](https://arxiv.org/html/2605.30541#A5.SS3.SSS1.p1.2)\.
- Z\. Liu, Y\. Lin, Y\. Cao, H\. Hu, Y\. Wei, Z\. Zhang, S\. Lin, and B\. Guo \(2021\)Swin transformer: hierarchical vision transformer using shifted windows\.InProceedings of the IEEE/CVF International Conference on Computer Vision,Cited by:[§G\.2](https://arxiv.org/html/2605.30541#A7.SS2.SSS0.Px3.p1.1),[§6](https://arxiv.org/html/2605.30541#S6.p2.1)\.
- T\. Lo and P\. L\. Inderwiesen \(1994\)Fundamentals of seismic tomography\.Geophysical Monograph Series,Society of Exploration Geophysicists\.Cited by:[§3\.2](https://arxiv.org/html/2605.30541#S3.SS2.SSS0.Px2.p1.1)\.
- M\. Louboutin, M\. Lange, F\. Luporini, N\. Kukreja, P\. A\. Witte, F\. J\. Herrmann, P\. Velesko, and G\. J\. Gorman \(2019\)Devito \(v3\.1\.0\): an embedded domain\-specific language for finite differences and geophysical exploration\.Geoscientific Model Development12\(3\),pp\. 1165–1187\.Cited by:[§E\.2](https://arxiv.org/html/2605.30541#A5.SS2.SSS0.Px2.p1.2),[§E\.5](https://arxiv.org/html/2605.30541#A5.SS5.p1.1),[2nd item](https://arxiv.org/html/2605.30541#S1.I1.i2.p1.1),[§4\.1](https://arxiv.org/html/2605.30541#S4.SS1.p1.3)\.
- G\. S\. Martin, R\. Wiley, and K\. J\. Marfurt \(2006\)Marmousi2: an elastic upgrade for marmousi\.The Leading Edge25\(2\),pp\. 156–166\.Cited by:[§A\.1](https://arxiv.org/html/2605.30541#A1.SS1.p1.1),[§1](https://arxiv.org/html/2605.30541#S1.p4.1)\.
- T\. P\. Merrifield, D\. P\. Griffith, S\. A\. Zamanian, S\. Gesbert, S\. Sen, J\. De La Torre Guzman, R\. D\. Potter, and H\. Kuehl \(2022\)Synthetic seismic data for training deep learning networks\.Interpretation10\(3\),pp\. SE31–SE39\.Cited by:[§A\.3](https://arxiv.org/html/2605.30541#A1.SS3.p1.1)\.
- R\. M\. Mitchum, P\. R\. Vail, and J\. B\. Sangree \(1977\)Seismic stratigraphy and global changes of sea level, part 6: stratigraphic interpretation of seismic reflection patterns in depositional sequences\.InSeismic Stratigraphy — Applications to Hydrocarbon Exploration,C\. E\. Payton \(Ed\.\),AAPG Memoir, Vol\.26,pp\. 117–133\.Cited by:[§C\.2](https://arxiv.org/html/2605.30541#A3.SS2.p3.1)\.
- C\. K\. Morley \(1999\)Patterns of displacement along large normal faults: implications for basin evolution and fault propagation, based on examples from East Africa\.AAPG Bulletin83\(4\),pp\. 613–634\.Cited by:[§C\.3](https://arxiv.org/html/2605.30541#A3.SS3.p1.1)\.
- L\. Mosser, O\. Dubrule, and M\. J\. Blunt \(2020\)Stochastic seismic waveform inversion using generative adversarial networks as a geological prior\.Mathematical Geosciences52\(1\),pp\. 53–79\.Cited by:[§A\.2](https://arxiv.org/html/2605.30541#A1.SS2.p1.2),[§1](https://arxiv.org/html/2605.30541#S1.p2.1)\.
- A\. Nicol, J\. Watterson, J\. J\. Walsh, and C\. Childs \(1996\)The shapes, major axis orientations and displacement patterns of fault surfaces\.Journal of Structural Geology18\(2–3\),pp\. 235–248\.Cited by:[§C\.3](https://arxiv.org/html/2605.30541#A3.SS3.p1.1)\.
- I\. Overeem, G\. J\. Weltje, C\. Bishop\-Kay, and S\. B\. Kroonenberg \(2001\)The late cenozoic Eridanos delta system in the southern North Sea basin: a climate signal in sediment supply?\.Basin Research13\(3\),pp\. 293–312\.Cited by:[§C\.2](https://arxiv.org/html/2605.30541#A3.SS2.p1.1)\.
- D\. C\. P\. Peacock and D\. J\. Sanderson \(1994\)Geometry and development of relay ramps in normal fault systems\.AAPG Bulletin78\(2\),pp\. 147–165\.Cited by:[§C\.3](https://arxiv.org/html/2605.30541#A3.SS3.p1.1)\.
- F\. J\. Peel, C\. J\. Travis, and J\. R\. Hossack \(1995\)Genetic structural provinces and salt tectonics of the Cenozoic offshore U\.S\. Gulf of Mexico: a preliminary analysis\.InSalt Tectonics: A Global Perspective,M\. P\. A\. Jackson, D\. G\. Roberts, and S\. Snelson \(Eds\.\),AAPG Memoir, Vol\.65,pp\. 153–175\.Cited by:[§C\.4](https://arxiv.org/html/2605.30541#A3.SS4.p1.1)\.
- R\. S\. Pilcher, B\. Kilsdonk, and J\. Trude \(2011\)Primary basins and their boundaries in the deep\-water northern Gulf of Mexico: origin, trap types, and petroleum system implications\.AAPG Bulletin95\(2\),pp\. 219–240\.Cited by:[§C\.5](https://arxiv.org/html/2605.30541#A3.SS5.p1.1)\.
- N\. Ricker \(1953\)The form and laws of propagation of seismic wavelets\.Geophysics18\(1\),pp\. 10–40\.Cited by:[§2](https://arxiv.org/html/2605.30541#S2.p1.11)\.
- P\. Sava and B\. Biondi \(2004a\)Wave\-equation migration velocity analysis\. I\. Theory\.Geophysical Prospecting52\(6\),pp\. 593–606\.Cited by:[§C\.4](https://arxiv.org/html/2605.30541#A3.SS4.p1.1)\.
- P\. Sava and B\. Biondi \(2004b\)Wave\-equation migration velocity analysis\. II\. Subsalt imaging examples\.Geophysical Prospecting52\(6\),pp\. 607–623\.Cited by:[§C\.4](https://arxiv.org/html/2605.30541#A3.SS4.p1.1),[§C\.5](https://arxiv.org/html/2605.30541#A3.SS5.p1.1)\.
- A\. Schiemenz and H\. Igel \(2013\)Accelerated 3\-D full\-waveform inversion using simultaneously encoded sources in the time domain: application to Valhall ocean\-bottom cable data\.Geophysical Journal International195\(3\),pp\. 1970–1988\.Cited by:[§1](https://arxiv.org/html/2605.30541#S1.p1.1)\.
- \[65\]R\. M\. Silva, L\. Baroni, R\. S\. Ferreira, D\. Civitarese, D\. Szwarcman, and E\. V\. BrazilNetherlands dataset: a new public dataset for machine learning in seismic interpretation\.arXiv preprint arXiv:1904\.00770\.Cited by:[§C\.2](https://arxiv.org/html/2605.30541#A3.SS2.p1.1)\.
- Society of Exploration Geophysicists \(2024\)SEAM Open Data\.Note:[https://seg\.org/SEAM/open\-data/](https://seg.org/SEAM/open-data/)Cited by:[§C\.6](https://arxiv.org/html/2605.30541#A3.SS6.p1.1)\.
- J\. Stitt, R\. Clapp, and B\. Biondi \(2023\)Deep Dix: enhancing interval velocity model estimation through adversarial regularization\.InThird International Meeting for Applied Geoscience & Energy \(SEG IMAGE 2023\), Technical Program Expanded Abstracts,pp\. 108–112\.Cited by:[§A\.2](https://arxiv.org/html/2605.30541#A1.SS2.p1.2),[§1](https://arxiv.org/html/2605.30541#S1.p2.1)\.
- J\. Stitt, R\. Clapp, and B\. Biondi \(2025\)Latent diffusion regularization on FWI\.Technical reportTechnical ReportSEP\-196,Stanford Exploration Project\.Cited by:[§A\.2](https://arxiv.org/html/2605.30541#A1.SS2.p1.2),[§1](https://arxiv.org/html/2605.30541#S1.p2.1)\.
- W\. W\. Symes \(2007\)Reverse time migration with optimal checkpointing\.Geophysics72\(5\),pp\. SM213–SM221\.Cited by:[4th item](https://arxiv.org/html/2605.30541#S1.I1.i4.p1.1),[§5](https://arxiv.org/html/2605.30541#S5.p4.3)\.
- W\. W\. Symes \(2008\)Migration velocity analysis and waveform inversion\.Geophysical Prospecting56\(6\),pp\. 765–790\.Cited by:[§1](https://arxiv.org/html/2605.30541#S1.p1.1)\.
- A\. Tarantola \(1984\)Inversion of seismic reflection data in the acoustic approximation\.Geophysics49\(8\),pp\. 1259–1266\.Cited by:[§1](https://arxiv.org/html/2605.30541#S1.p1.1)\.
- J\. R\. Underhill and M\. A\. Partington \(1993\)Jurassic thermal doming and deflation in the North Sea: implications of the sequence stratigraphic evidence\.InPetroleum Geology of Northwest Europe: Proceedings of the 4th Conference,J\. R\. Parker \(Ed\.\),Geological Society, London, Petroleum Geology Conference Series, Vol\.4,pp\. 337–345\.Cited by:[§C\.3](https://arxiv.org/html/2605.30541#A3.SS3.p1.1)\.
- R\. Versteeg \(1994\)The Marmousi experience: Velocity model determination on a synthetic complex data set\.The Leading Edge13\(9\),pp\. 927–936\.Cited by:[§A\.1](https://arxiv.org/html/2605.30541#A1.SS1.p1.1),[§1](https://arxiv.org/html/2605.30541#S1.p4.1)\.
- J\. Virieux and S\. Operto \(2009\)An overview of full\-waveform inversion in exploration geophysics\.Geophysics74\(6\),pp\. WCC1–WCC26\.Cited by:[§1](https://arxiv.org/html/2605.30541#S1.p1.1)\.
- J\. A\. Wade and B\. C\. MacLean \(1990\)The geology of the southeastern margin of canada\.InGeology of the Continental Margin of Eastern Canada,M\. J\. Keen and G\. L\. Williams \(Eds\.\),The Geology of North America \(DNAG\), Vol\.I\-1,pp\. 167–238\.Cited by:[§C\.1](https://arxiv.org/html/2605.30541#A3.SS1.p1.1)\.
- F\. Wang, X\. Huang, and T\. Alkhalifah \(2023a\)A prior regularized full waveform inversion using generative diffusion models\.IEEE Transactions on Geoscience and Remote Sensing61,pp\. 1–11\.Cited by:[§A\.2](https://arxiv.org/html/2605.30541#A1.SS2.p1.2),[§1](https://arxiv.org/html/2605.30541#S1.p2.1)\.
- H\. Wang, J\. Lin, X\. Dong, S\. Lu, Y\. Li, and B\. Yang \(2023b\)Seismic velocity inversion transformer\.Geophysics88\(4\),pp\. R513–R533\.Cited by:[§A\.2](https://arxiv.org/html/2605.30541#A1.SS2.p1.2),[§1](https://arxiv.org/html/2605.30541#S1.p2.1)\.
- Y\. Wang \(2015\)The Ricker wavelet and the Lambert W function\.Geophysical Journal International200\(1\),pp\. 111–115\.Cited by:[§2](https://arxiv.org/html/2605.30541#S2.p1.11)\.
- Z\. Wang, A\. C\. Bovik, H\. R\. Sheikh, and E\. P\. Simoncelli \(2004\)Image quality assessment: from error visibility to structural similarity\.IEEE Transactions on Image Processing13\(4\),pp\. 600–612\.Cited by:[§6](https://arxiv.org/html/2605.30541#S6.p2.1)\.
- J\. A\. W\. Weissenberger, R\. A\. Wierzbicki, and N\. J\. Harland \(2006\)Carbonate sequence stratigraphy and petroleum geology of the Jurassic deep Panuke field, offshore Nova Scotia, Canada\.InGiant Hydrocarbon Reservoirs of the World: From Rocks to Reservoir Characterization and Modeling,P\. M\. Harris and L\. J\. Weber \(Eds\.\),AAPG Memoir, Vol\.88,pp\. 395–431\.Cited by:[§C\.1](https://arxiv.org/html/2605.30541#A3.SS1.p1.1)\.
- J\. F\. Wellmann, S\. T\. Thiele, M\. D\. Lindsay, and M\. W\. Jessell \(2016\)Pynoddy 1\.0: an experimental platform for automated 3\-D kinematic and potential field modelling\.Geoscientific Model Development9,pp\. 1019–1035\.Cited by:[§A\.3](https://arxiv.org/html/2605.30541#A1.SS3.p1.1)\.
- Y\. Wu and Y\. Lin \(2020\)InversionNet: an efficient and accurate data\-driven full waveform inversion\.IEEE Transactions on Computational Imaging6,pp\. 419–433\.Cited by:[§A\.2](https://arxiv.org/html/2605.30541#A1.SS2.p1.2),[§G\.2](https://arxiv.org/html/2605.30541#A7.SS2.SSS0.Px1.p1.1),[§1](https://arxiv.org/html/2605.30541#S1.p2.1),[§6](https://arxiv.org/html/2605.30541#S6.p2.1)\.
- F\. Yang and J\. Ma \(2019\)Deep\-learning inversion: a next\-generation seismic velocity model building method\.Geophysics84\(4\),pp\. R583–R599\.Cited by:[§A\.2](https://arxiv.org/html/2605.30541#A1.SS2.p1.2),[§1](https://arxiv.org/html/2605.30541#S1.p2.1)\.
- Y\. Yang, A\. F\. Gao, K\. Azizzadenesheli, R\. W\. Clayton, and Z\. E\. Ross \(2023\)Rapid seismic waveform modeling and inversion with neural operators\.IEEE Transactions on Geoscience and Remote Sensing61,pp\. 1–12\.Cited by:[§A\.2](https://arxiv.org/html/2605.30541#A1.SS2.p1.2),[§1](https://arxiv.org/html/2605.30541#S1.p2.1)\.
- Y\. Yang, A\. F\. Gao, J\. C\. Castellanos, Z\. E\. Ross, K\. Azizzadenesheli, and R\. W\. Clayton \(2021\)Seismic wave propagation and inversion with neural operators\.The Seismic Record1\(3\),pp\. 126–134\.Cited by:[§A\.2](https://arxiv.org/html/2605.30541#A1.SS2.p1.2),[§1](https://arxiv.org/html/2605.30541#S1.p2.1)\.
- G\. Yao, N\. V\. da Silva, M\. Warner, D\. Wu, and C\. Yang \(2019\)Tackling cycle skipping in full\-waveform inversion with intermediate data\.Geophysics84\(3\),pp\. R411–R427\.Cited by:[§1](https://arxiv.org/html/2605.30541#S1.p1.1)\.
- T\. Zhang, D\. Trad, and K\. Innanen \(2023\)Learning to solve the elastic wave equation with Fourier neural operators\.Geophysics88\(3\),pp\. T101–T119\.Cited by:[§A\.2](https://arxiv.org/html/2605.30541#A1.SS2.p1.2),[§F\.2](https://arxiv.org/html/2605.30541#A6.SS2.p1.6),[§1](https://arxiv.org/html/2605.30541#S1.p2.1)\.
- Z\. Zhang and Y\. Lin \(2020\)Data\-driven seismic waveform inversion: a study on the robustness and generalization\.IEEE Transactions on Geoscience and Remote Sensing58\(10\),pp\. 6900–6913\.Cited by:[§A\.2](https://arxiv.org/html/2605.30541#A1.SS2.p1.2),[§1](https://arxiv.org/html/2605.30541#S1.p2.1)\.

## Appendix ARelated Work

We discuss related work along three threads: datasets used to train and evaluate ML\-based FWI \([Section˜A\.1](https://arxiv.org/html/2605.30541#A1.SS1)\), machine learning methods for seismic modeling and inversion \([Section˜A\.2](https://arxiv.org/html/2605.30541#A1.SS2)\), and software tools for building velocity models \([Section˜A\.3](https://arxiv.org/html/2605.30541#A1.SS3)\)\.

### A\.1Datasets for ML\-based FWI

Marmousi\[Versteeg,[1994](https://arxiv.org/html/2605.30541#bib.bib7), Martinet al\.,[2006](https://arxiv.org/html/2605.30541#bib.bib4)\], SEAM\[Fehler and Keliher,[2011](https://arxiv.org/html/2605.30541#bib.bib2)\], OpenFWI\[Denget al\.,[2022](https://arxiv.org/html/2605.30541#bib.bib1)\],𝔼FWI\\mathbf\{\\mathbb\{E\}^\{\\text\{FWI\}\}\}\[Fenget al\.,[2023](https://arxiv.org/html/2605.30541#bib.bib3)\], and GlobalTomo\[Liet al\.,[2025](https://arxiv.org/html/2605.30541#bib.bib82)\]are well\-known datasets for ML\-based FWI\. Marmousi, SEAM, and OpenFWI are the most closely related to SubsurfaceGen\. Each of these datasets falls short on at least one of the four properties from[Table˜1](https://arxiv.org/html/2605.30541#S1.T1)—field\-scale, geologically diverse, physically realistic, extendable—in the following respects:

- •Static datasets\.Existing datasets are fixed collections, which cannot support the experiments required for ML\-based FWI to progress\. Studying generalization across geological settings, scaling behavior with dataset size, and robustness to distribution shift all require the ability to procedurally generate new data with known properties\.
- •Limited spatial extent\.The majority of OpenFWI’s 2D velocity models span only 0\.7 km×\\times0\.7 km\. This is problematic because cycle skipping, illumination gaps, and low\-frequency recovery all worsen as spatial extent grows: methods evaluated only at small scales provide limited evidence of their performance on real\-world surveys\.
- •Limited temporal extent\.The majority of OpenFWI’s 2D shot gathers span only 1 s, which limits the maximum imaging depth to the upper 1–2 km of the subsurface, falling well short of the depths where carbon storage sites and hydrocarbons are often found\.
- •Limited geological diversity\.Marmousi captures complex folding and faulting, and SEAM captures salt bodies, but neither contains clinoforms, carbonate platforms, or velocity interbedding\. OpenFWI’s 2D families lack salt bodies, clinoforms, and velocity interbedding; its 3D family achieves realistic scale \(4 km×\\times4 km×\\times3\.5 km, 5 s of recording\) but represents a single geological scenario and therefore inherits the same diversity limitations\. These omissions correspond to the hardest and most application\-relevant regimes of FWI: salt bodies produce strong contrasts that are challenging to invert, carbonate platforms and clinoforms define the sedimentary settings targeted for carbon sequestration and hydrocarbon exploration, and velocity interbedding—thin alternating layers that are difficult to resolve—determines caprock integrity and reservoir quality\.
- •Lack of physical realism\.Many of OpenFWI’s 2D velocity models contain visible artifacts that make them physically unrealistic; models trained on them risk learning to reproduce these artifacts rather than geologically meaningful structure\.

𝔼FWI\\mathbf\{\\mathbb\{E\}^\{\\text\{FWI\}\}\}extends OpenFWI to the elastic wave equation, but inherits OpenFWI’s limited spatial/temporal extent and limited geological diversity\. GlobalTomo is intended for whole\-Earth seismology \(mantle structure, earthquake source mechanics, planetary tomography\) rather than the sedimentary settings targeted by SubsurfaceGen\.

SubsurfaceGen addresses all of the limitations above, since it enables procedural generation of field\-scale, geologically diverse, physically realistic 3D velocity models and seismic data\.

### A\.2Machine Learning for Seismic Modeling and Inversion

Wavefield prediction with neural operators\[Yanget al\.,[2021](https://arxiv.org/html/2605.30541#bib.bib64),[2023](https://arxiv.org/html/2605.30541#bib.bib66), Zhanget al\.,[2023](https://arxiv.org/html/2605.30541#bib.bib65), Huang and Alkhalifah,[2025](https://arxiv.org/html/2605.30541#bib.bib67)\]has been restricted to small domains \(less than one square kilometer\) with unrealistic geology \(random textures or simple shapes instead of layered, faulted, or salt\-containing models\) and acquisition geometries unlike real surface surveys \(e\.g\., putting sources on the entire perimeter of the model\)\. SubsurfaceGen provides the data needed to train and evaluate these operators in production\-relevant settings, as shown in[Section˜5](https://arxiv.org/html/2605.30541#S5)\. A second line of work learns the inverse map from shot gathers directly to velocity models using neural networks\[Araya\-Poloet al\.,[2018](https://arxiv.org/html/2605.30541#bib.bib69), Yang and Ma,[2019](https://arxiv.org/html/2605.30541#bib.bib70), Wu and Lin,[2020](https://arxiv.org/html/2605.30541#bib.bib58), Zhang and Lin,[2020](https://arxiv.org/html/2605.30541#bib.bib71), Wanget al\.,[2023b](https://arxiv.org/html/2605.30541#bib.bib72)\]\.Jinet al\.\[[2024](https://arxiv.org/html/2605.30541#bib.bib73)\]scales this approach to 408,000 samples, but is restricted to the small spatial extent of OpenFWI \(0\.7 km\)\. OnlyFarriset al\.\[[2023](https://arxiv.org/html/2605.30541#bib.bib37)\], Farris \[[2023](https://arxiv.org/html/2605.30541#bib.bib38)\]has trained end\-to\-end inversion at field scale—on∼\\sim17 km×\\times12 km models in the Gulf of Mexico, using a data engine that was never publicly released\. SubsurfaceGen \(and the included field\-scale dataset\) fills this void, and we train inversion baselines on it in[Section˜6](https://arxiv.org/html/2605.30541#S6)\. A third line of work uses generative models trained on velocity models as data\-driven priors for FWI\[Mosseret al\.,[2020](https://arxiv.org/html/2605.30541#bib.bib74), Stittet al\.,[2023](https://arxiv.org/html/2605.30541#bib.bib76), Wanget al\.,[2023a](https://arxiv.org/html/2605.30541#bib.bib75), Stittet al\.,[2025](https://arxiv.org/html/2605.30541#bib.bib77)\]; SubsurfaceGen provides a distribution of geologically plausible velocity models required for training these priors\.

### A\.3Model Building Tools

Structural geomodeling tools like pynoddy\[Wellmannet al\.,[2016](https://arxiv.org/html/2605.30541#bib.bib98)\], GemPy\[de la Vargaet al\.,[2019](https://arxiv.org/html/2605.30541#bib.bib97)\], and LoopStructural\[Groseet al\.,[2021](https://arxiv.org/html/2605.30541#bib.bib99)\]are unable to encode the pixel\-level velocity heterogeneity \(fractal noise, fine interbedding\) that defines the sedimentary basin settings SubsurfaceGen targets\. Moreover, they lack a pipeline to generate training data pairs of velocity models and seismic data\. The closest peer to SubsurfaceGen is Synthoseis\[Merrifieldet al\.,[2022](https://arxiv.org/html/2605.30541#bib.bib100)\], a CPU\-based pipeline that produces seismic data using 1D reflectivity convolution rather than full wave physics, so the resulting traces lack important phenomena like multipathing and chaotic late\-time interference\. Synthoseis also omits modules for realistic salt geometries, carbonate platforms, and clinoforms\. SubsurfaceGen closes these gaps: it is GPU\-enabled, contains salt, carbonate platform, and clinoform modules, and produces seismic data by solving the acoustic wave equation\.

## Appendix BVelocity Model Builder: Modules

This appendix gives algorithmic details for the eight builder modules introduced in[Section˜3](https://arxiv.org/html/2605.30541#S3)\(summarized in[Table˜5](https://arxiv.org/html/2605.30541#A2.T5)\)\. Each module is described in its own subsection \([Sections˜B\.1](https://arxiv.org/html/2605.30541#A2.SS1),[B\.2](https://arxiv.org/html/2605.30541#A2.SS2),[B\.3](https://arxiv.org/html/2605.30541#A2.SS3),[B\.4](https://arxiv.org/html/2605.30541#A2.SS4),[B\.5](https://arxiv.org/html/2605.30541#A2.SS5),[B\.6](https://arxiv.org/html/2605.30541#A2.SS6),[B\.7](https://arxiv.org/html/2605.30541#A2.SS7)and[B\.8](https://arxiv.org/html/2605.30541#A2.SS8)\)\. The companion appendices show how these modules are composed into each geological setting \([Appendix˜C](https://arxiv.org/html/2605.30541#A3)\) and a CPU vs\. GPU timing benchmark of the model builder \([Appendix˜D](https://arxiv.org/html/2605.30541#A4)\)\.This section leans heavily on geological terminology\.

Table 5:Catalog of modules in the SubsurfaceGen model builder\. Each row describes a module using standard geological terminology\.##### Notation\.

VVdenotes the velocity fieldvp​\(𝐱\)v\_\{p\}\(\\mathbf\{x\}\)in m/s on a\(Nz,Ny,Nx\)\(N\_\{z\},N\_\{y\},N\_\{x\}\)grid;LLis an integer\-valued volume of the same shape recording which geologic layer each voxel belongs to\. The pseudocode below shows updates toVVexplicitly; updates toLLhappen alongsideVVin the implementation but are omitted for clarity\. SubsurfaceGen bundlesVVandLLin a single data structure\.

### B\.1Deposit

TheDepositmodule \([Algorithm˜1](https://arxiv.org/html/2605.30541#alg1)\) adds a sedimentary package with interbeds \(thin layers of contrasting velocity\) on top of a model\. Interbed boundaries are 2D surfaces sampled from a 3D simplex noise field \(a smooth, spatially correlated random field\); each bed is assigned a single Gaussian\-drawn velocity with a global depth gradient added across the slab, and an additional 3D simplex noise field provides intra\-bed spatial texture \(with a per\-bed randomzz\-offset into the shared field decorrelating textures across beds\)\. Layer assignment is vectorized viatorch\.searchsorted\. An optional sinusoidal bedform mode \(interbed\_mode = "sinusoidal"\) bypasses the boundary\-based path and generates the slab as a sum of two sinusoidal harmonics evaluated on azz\-warped axis \(depth axis non\-uniformly stretched so bed thickness varies; built as the cumsum of a smoothed instantaneous frequency\) with added rugosity \(small\-amplitude roughness\), plus an additive 3D patchy\-noise texture\. Unit convention: the user\-facing config specifiesh¯\\bar\{h\}andσh\\sigma\_\{h\}in meters; internally these are divided by the vertical discretizationΔ​z\\Delta zto yield the per\-cell quantities used in the formulas below\.

Algorithm 1Deposit\.apply\(sedimentary package with interbeds\)1:Thickness

tt, base velocity

v0v\_\{0\}, depth gradient

β\\beta, mean interbed thickness

h¯\\bar\{h\}, interbed\-thickness std

σh\\sigma\_\{h\}, per\-bed velocity std

σp\\sigma\_\{p\}, boundary\-noise amplitude

aBa\_\{B\}, texture amplitude

aηa\_\{\\eta\}, taper

τ​\(z\)=min⁡\(z/\(ftop​t\),1\)⋅min⁡\(\(t−z\)/\(fbase​t\),1\)\\tau\(z\)=\\min\(z/\(f\_\{\\text\{top\}\}t\),1\)\\cdot\\min\(\(t\-z\)/\(f\_\{\\text\{base\}\}t\),1\)\.

2:Subdivide the slab into

n≈⌊t/h¯⌋n\\approx\\lfloor t/\\bar\{h\}\\rfloorthin beds: draw thicknesses

∼𝒩​\(h¯,σh2\)\\sim\\mathcal\{N\}\(\\bar\{h\},\\sigma\_\{h\}^\{2\}\)\(clamped to

≥12\\geq\\tfrac\{1\}\{2\}sample\), cumsum into internal boundary depths

\{Hi\}i=1n−1\\\{H\_\{i\}\\\}\_\{i=1\}^\{n\-1\}\.

3:

Hi​\(y,x\)←Hi\+τ​\(Hi\)​aB​η\(i\)​\(y,x\)H\_\{i\}\(y,x\)\\leftarrow H\_\{i\}\+\\tau\(H\_\{i\}\)\\,a\_\{B\}\\,\\eta^\{\(i\)\}\(y,x\), clamped to

\[0,t−1\]\[0,t\-1\]\.

4:Draw per\-bed velocities

v1,…,vn∼𝒩​\(v0,σp2\)v\_\{1\},\\ldots,v\_\{n\}\\sim\\mathcal\{N\}\(v\_\{0\},\\sigma\_\{p\}^\{2\}\); let

v¯=1n​∑ℓvℓ\\bar\{v\}=\\tfrac\{1\}\{n\}\\sum\_\{\\ell\}v\_\{\\ell\}\.

5:foreach voxel

\(z,y,x\)\(z,y,x\)in the deposit banddo

6:

ℓ←searchsorted​\(\{Hi​\(y,x\)\},z\)\\ell\\leftarrow\\textsc\{searchsorted\}\(\\\{H\_\{i\}\(y,x\)\\\},z\);

SℓS\_\{\\ell\}is bed

ℓ\\ell’s start depth\.

7:

V​\[z,y,x\]=\(v¯\+β​z\)\+τ​\(z\)​\[\(vℓ−v¯\)\+aη​η3D​\(z−Sℓ,y,x\)\]V\[z,y,x\]=\(\\bar\{v\}\+\\beta z\)\+\\tau\(z\)\\big\[\(v\_\{\\ell\}\-\\bar\{v\}\)\+a\_\{\\eta\}\\,\\eta\_\{\\text\{3D\}\}\(z\-S\_\{\\ell\},y,x\)\\big\]\.

8:endfor

9:Per\-column sinc\-resample

VVvertically to conform to the underlying topography

h​\(y,x\)h\(y,x\)\.

### B\.2Squish

TheSquishmodule \([Algorithm˜2](https://arxiv.org/html/2605.30541#alg2)\) applies a vertical displacement to a model, emulating basin\-scale warping and long\-wavelength folding without introducing faulting behavior\. The displacement is a 2D fractal noise field with configurable maximum shift, spatial wavelengths, azimuth, and octave count, which is applied to every voxel in the column via sinc resampling\.

Algorithm 2Squish\.apply\(vertical warping\)1:Maximum shift

Δ​zmax\\Delta z\_\{\\text\{max\}\}, spatial wavelengths

\(λy,λx\)\(\\lambda\_\{y\},\\lambda\_\{x\}\), azimuth

α\\alpha, octave count

noctn\_\{\\text\{oct\}\}\.

2:Sample a 2D fractal\-noise displacement field

Δ​z​\(y,x\)\\Delta z\(y,x\)with amplitude

Δ​zmax\\Delta z\_\{\\text\{max\}\}, wavelengths

\(λy,λx\)\(\\lambda\_\{y\},\\lambda\_\{x\}\)rotated by

α\\alpha, summed over

noctn\_\{\\text\{oct\}\}octaves\.

3:Pull\-sample

VVat

\(z\+Δ​z​\(y,x\),y,x\)\(z\+\\Delta z\(y,x\),y,x\)via sinc interpolation\.

### B\.3Fault

TheFaultmodule \([Algorithm˜3](https://arxiv.org/html/2605.30541#alg3)\) applies listric normal\-fault deformation to a model\. It does this through building a smooth 3D displacement field and then warping the model\. Faults are parameterized by a center, azimuth, dip angle, curvature radius, slip magnitude, direction, and die\-off extents\. Each voxel in the post\-fault model is pull\-sampled from its pre\-slip position\. The fault geometry is represented as a cylindrical coordinate system where a large cylinder radius gives a planar fault while a small radius creates a curved listric fault \(l\-shaped fault\)\.

[Algorithm˜3](https://arxiv.org/html/2605.30541#alg3)represents fault slip or displacement as a rotation around this cylinder, with cosine tapers controlling how the displacement dies off across the fault, along strike, and near the ends of the rupture\. For each output voxel, the code computes where that voxel came from in the original unfaulted model, then samples the original model at that source location\. Continuous fields such as velocity are resampled with sinc interpolation, while integer fields such as layer labels are resampled with nearest\-neighbor interpolation\. The implementation also supports rough fault surfaces by perturbing the fault radius with fractal noise, and can generate smaller subsidiary faults around a primary fault using simplified Riedel\-shear geometry\.

Algorithm 3Fault\.apply\(listric normal\-fault deformation, single fault\)1:Fault center

cc, strike azimuth

α\\alpha, dip

θ\\theta, curvature radius

RR, slip

ss, direction

d∈\{\+1,−1\}d\\in\\\{\+1,\-1\\\}, die\-off extents

\(ddie,dperp,θdie\)\(d\_\{\\text\{die\}\},d\_\{\\text\{perp\}\},\\theta\_\{\\text\{die\}\}\)\.

2:Cylinder center

c~=c−\(δaz​sin⁡α,δaz​cos⁡α,δz\)\\tilde\{c\}=c\-\(\\delta\_\{\\text\{az\}\}\\sin\\alpha,\\ \\delta\_\{\\text\{az\}\}\\cos\\alpha,\\ \\delta\_\{z\}\)with

\(δz,δaz\)=\(R​sin⁡θ,R​cos⁡θ\)\(\\delta\_\{z\},\\delta\_\{\\text\{az\}\}\)=\(R\\sin\\theta,R\\cos\\theta\); reference angle

θ0=atan2⁡\(δz,δaz\)\\theta\_\{0\}=\\operatorname\{atan2\}\(\\delta\_\{z\},\\delta\_\{\\text\{az\}\}\)\.

3:Slip as rotation:

θshift=s/\(2​π​R\)⋅360∘\\theta\_\{\\text\{shift\}\}=s/\(2\\pi R\)\\cdot 360^\{\\circ\}\.

4:foreach voxel

\(z,y,x\)\(z,y,x\)do

5:Cylindrical coordinates of

\(z,y,x\)−c~\(z,y,x\)\-\\tilde\{c\}in the fault\-aligned frame: radius

rr, angle

θold\\theta\_\{\\text\{old\}\}, along\-strike offset

azstrike\\text\{az\}\_\{\\text\{strike\}\}\.

6:Die\-off ratios

ρr=\|r−R\|/ddie\\rho\_\{r\}=\|r\-R\|/d\_\{\\text\{die\}\},

ρθ=\|θold−θ0\|/θdie\\rho\_\{\\theta\}=\|\\theta\_\{\\text\{old\}\}\-\\theta\_\{0\}\|/\\theta\_\{\\text\{die\}\},

ρs=\|azstrike\|/dperp\\rho\_\{s\}=\|\\text\{az\}\_\{\\text\{strike\}\}\|/d\_\{\\text\{perp\}\}\.

7:Taper

w​\(x\)=cos⁡\(π2​ρr\)​cos⁡\(π2​ρθ\)​cos⁡\(π2​ρs\)​𝟏​\{ρr,ρθ,ρs<1\}w\(x\)=\\cos\(\\tfrac\{\\pi\}\{2\}\\rho\_\{r\}\)\\cos\(\\tfrac\{\\pi\}\{2\}\\rho\_\{\\theta\}\)\\cos\(\\tfrac\{\\pi\}\{2\}\\rho\_\{s\}\)\\mathbf\{1\}\\\{\\rho\_\{r\},\\rho\_\{\\theta\},\\rho\_\{s\}<1\\\}\.

8:Side

χ=\+1\\chi=\+1if

r≥Rr\\geq R\(footwall\), else

χ=−1\\chi=\-1\(hanging wall\);

θnew=θold−χ​d​w​\(x\)​θshift\\theta\_\{\\text\{new\}\}=\\theta\_\{\\text\{old\}\}\-\\chi\\,d\\,w\(x\)\\,\\theta\_\{\\text\{shift\}\}\.

9:Forward\-mapped position

pfwd​\(z,y,x\)←p\_\{\\text\{fwd\}\}\(z,y,x\)\\leftarrowinverse cylindrical transform of

\(r,θnew,azstrike\)\(r,\\theta\_\{\\text\{new\}\},\\text\{az\}\_\{\\text\{strike\}\}\)\.

10:Pull source

src​\(z,y,x\)=2​\(z,y,x\)−pfwd​\(z,y,x\)\\text\{src\}\(z,y,x\)=2\(z,y,x\)\-p\_\{\\text\{fwd\}\}\(z,y,x\)\(Cartesian\-linearized inverse, exact to first order in

θshift\\theta\_\{\\text\{shift\}\}\)\.

11:endfor

12:Pull\-sample

VVatsrcvia sinc; voxels with

srcz∉\[0,Nz\)\\text\{src\}\_\{z\}\\notin\[0,N\_\{z\}\)get fill values\.

### B\.4SaltSDT

TheSaltSDTmodule \([Algorithm˜4](https://arxiv.org/html/2605.30541#alg4)\) generates irregular 3D salt bodies, similar to high\-velocity salt structures in the Gulf of Mexico using a signed distance transform perturbed by a Gaussian random field \(GRF\)\. Each body starts from a randomly placed and rotated ellipsoid, which provides a simple controllable shape for the salt core\. The ellipsoid mask is converted into a signed distance field, so the salt boundary can be modified smoothly by adding a spatially correlated Gaussian random field\. The correlation lengths control the preferred scale and direction of boundary variations, while the perturbation amplitude controls how far the final body deviates from the original ellipsoid\. Smaller spectral exponents preserve more short\-wavelength roughness, whereas larger values produce smoother salt boundaries\. After all bodies are generated, their masks are merged, cleaned with morphological opening and closing, and small isolated components are removed\. The final mask is prevented from overwriting water and is then assigned the configured salt velocity\.

Algorithm 4SaltSDT\.apply\(signed distance transform salt bodies\)1:Body count

nbn\_\{b\}, placement bounds, base radius range, anisotropic correlation length fractions

\(fℓx,fℓy,fℓz\)\(f\_\{\\ell\_\{x\}\},f\_\{\\ell\_\{y\}\},f\_\{\\ell\_\{z\}\}\), GRF amplitude

aa, spectral exponent

pp, salt velocity

vsv\_\{s\}, water threshold

vwaterv\_\{\\text\{water\}\}\.

2:

K←𝟎K\\leftarrow\\mathbf\{0\}\.

3:for

i=1,…,nbi=1,\\ldots,n\_\{b\}do

4:Sample center, per\-axis radii

\(rx,ry,rz\)\(r\_\{x\},r\_\{y\},r\_\{z\}\), rotation angles; let

r¯=\(rx\+ry\+rz\)/3\\bar\{r\}=\(r\_\{x\}\+r\_\{y\}\+r\_\{z\}\)/3\.

5:Rasterize rotated ellipsoid; compute signed distance

ϕ\(i\)\\phi^\{\(i\)\}via Meijster Euclidean distance transform\.

6:Sample GRF

ξ\(i\)\\xi^\{\(i\)\}from power spectrum

P​\(𝐤\)∝\(1\+∑k\(kk​ℓk\)2\)−pP\(\\mathbf\{k\}\)\\propto\\big\(1\+\\sum\_\{k\}\(k\_\{k\}\\ell\_\{k\}\)^\{2\}\\big\)^\{\-p\}with

ℓk=max⁡\(rk​fℓk,5\)\\ell\_\{k\}=\\max\(r\_\{k\}f\_\{\\ell\_\{k\}\},5\)\.

7:

ϕ′⁣\(i\)​\(x\)=ϕ\(i\)​\(x\)\+a​r¯​ξ\(i\)​\(x\)\\phi^\{\\prime\(i\)\}\(x\)=\\phi^\{\(i\)\}\(x\)\+a\\,\\bar\{r\}\\,\\xi^\{\(i\)\}\(x\);

Ki=𝟏​\{ϕ′⁣\(i\)\>0\}K\_\{i\}=\\mathbf\{1\}\\\{\\phi^\{\\prime\(i\)\}\>0\\\};

K←K∨KiK\\leftarrow K\\lor K\_\{i\}\.

8:endfor

9:Morphologically open then close

KK; drop connected components smaller than 1000 voxels\.

10:

K←K∧\(V≥vwater\)K\\leftarrow K\\land\(V\\geq v\_\{\\text\{water\}\}\);

V​\[K\]←vsV\[K\]\\leftarrow v\_\{s\}\.

The geometric structure of the salt bodies is controlled by two sets of parameters:\(ℓx,ℓy,ℓz\)\(\\ell\_\{x\},\\ell\_\{y\},\\ell\_\{z\}\)\(anisotropy of the boundary correlation length\) and\(a,p\)\(a,p\)\(amplitude and spectral falloff of the perturbation\): increasingℓz\\ell\_\{z\}relative toℓx\\ell\_\{x\}elongates bodies vertically \(as in the Penobscot edge diapir\); decreasingppraises high\-frequency roughness on the body boundary; increasingaapushes bodies further from their base ellipsoid\.

### B\.5SaltWedge

TheSaltWedgemodule \([Algorithm˜5](https://arxiv.org/html/2605.30541#alg5)\) deforms the sediment column in the neighborhood of salt bodies \(produced bySaltSDT\) to generate flanking onlap and drape geometries consistent with the halokinetic sequence framework ofGiles and Rowan \[[2012](https://arxiv.org/html/2605.30541#bib.bib32)\]\. The goal is not to simulate salt mechanics directly, but to create the common seismic appearance of sediment layers bending, draping, or onlapping near salt bodies\. The standard mode computes larger upward shifts near salt and smaller shifts farther away\. This shift is strongest near steep salt flanks, decays with distance from the salt, can taper with depth, and is modulated by smooth fractal noise so the deformation is not perfectly symmetric\. An optional near\-salt squeeze component compresses layers close to the salt flank when its strength is set above zero \(off by default\)\. An optional top\-conforming mode separately detects the top of the salt body and pushes overlying sediments upward above salt crests, producing diapir\-like drape\. After the drag, optional squeeze, and optional top\-conforming components are combined, the displacement field is clamped to nonnegative values, optionally smoothed in the horizontal plane, made monotone with depth \(enabled by default\), and then applied as a vertical\-only pull warp\. Salt voxels themselves are preserved at the configured salt velocity\.

Algorithm 5SaltWedge\.apply\(sediment deformation around salt bodies\)1:Salt mask

KK, base influence

R0R\_\{0\}, flank\-influence

RflankR\_\{\\text\{flank\}\}, drag exponent

pp, vertical\-flank threshold

τ\\tau, depth taper

g​\(z\)g\(z\)on

\(ztop,zbot,gpow\)\(z\_\{\\text\{top\}\},z\_\{\\text\{bot\}\},g\_\{\\text\{pow\}\}\), base amplitude

AfracA\_\{\\text\{frac\}\}, conform flag and parameters

\(Rconform,cuplift,sconf\)\(R\_\{\\text\{conform\}\},c\_\{\\text\{uplift\}\},s\_\{\\text\{conf\}\}\)\.

2:Signed distance

ϕ​\(x\)=dout​\(x;K\)−din​\(x;K\)\\phi\(x\)=d\_\{\\text\{out\}\}\(x;K\)\-d\_\{\\text\{in\}\}\(x;K\); vertical normal

n^z=∂zϕ/‖∇ϕ‖\\hat\{n\}\_\{z\}=\\partial\_\{z\}\\phi/\\\|\\nabla\\phi\\\|\.

3:Vertical\-flank set

Vflank=\{x∈∂K:\|n^z​\(x\)\|<τ\}V\_\{\\text\{flank\}\}=\\\{x\\in\\partial K:\|\\hat\{n\}\_\{z\}\(x\)\|<\\tau\\\};

dvert​\(x\)=d\_\{\\text\{vert\}\}\(x\)=EDT distance to

VflankV\_\{\\text\{flank\}\}\.

4:Drag weights

wd​\(x\)=\(max⁡\(0,1−ϕ/Rfield​\(z\)\)\)pw\_\{d\}\(x\)=\\big\(\\max\(0,1\-\\phi/R\_\{\\text\{field\}\}\(z\)\)\\big\)^\{p\},

wv​\(x\)=exp⁡\(−\(dvert/Rflank\)2\)w\_\{v\}\(x\)=\\exp\(\-\(d\_\{\\text\{vert\}\}/R\_\{\\text\{flank\}\}\)^\{2\}\),

w=wd​wv​𝟏​\{x∉K\}w=w\_\{d\}w\_\{v\}\\mathbf\{1\}\\\{x\\notin K\\\}, with

Rfield​\(z\)=R0​\(1\+βrad​sin⁡\(π​z/Nz\)\)R\_\{\\text\{field\}\}\(z\)=R\_\{0\}\(1\+\\beta\_\{\\text\{rad\}\}\\sin\(\\pi z/N\_\{z\}\)\)\.

5:Drag displacement

Δ​zdrag​\(x\)=Afrac​Nz​w​\(x\)​g​\(z\)​A​\(x\)\\Delta z\_\{\\text\{drag\}\}\(x\)=A\_\{\\text\{frac\}\}\\,N\_\{z\}\\,w\(x\)\\,g\(z\)\\,A\(x\), with

A​\(x\)A\(x\)a fractal amplitude field\.

6:ifconform enabledthen

7:Salt\-top

ζtop​\(y,x\)=min⁡\{z:K​\[z,y,x\]=1\}\\zeta\_\{\\text\{top\}\}\(y,x\)=\\min\\\{z:K\[z,y,x\]=1\\\}; crest relief

Rcrest=max⁡\(0,Gaussσ​\(ζtop\)−ζtop\)R\_\{\\text\{crest\}\}=\\max\(0,\\mathrm\{Gauss\}\_\{\\sigma\}\(\\zeta\_\{\\text\{top\}\}\)\-\\zeta\_\{\\text\{top\}\}\), normalized to

\[0,1\]\[0,1\]\.

8:

Δ​zconf​\(x\)=cuplift​Nz​sconf​Rcrest​\(y,x\)​e−\(ζtop−z\)/Rconform​g​\(z\)​𝟏​\{z<ζtop,x∉K\}\\Delta z\_\{\\text\{conf\}\}\(x\)=c\_\{\\text\{uplift\}\}N\_\{z\}s\_\{\\text\{conf\}\}\\,R\_\{\\text\{crest\}\}\(y,x\)\\,e^\{\-\(\\zeta\_\{\\text\{top\}\}\-z\)/R\_\{\\text\{conform\}\}\}\\,g\(z\)\\mathbf\{1\}\\\{z<\\zeta\_\{\\text\{top\}\},x\\notin K\\\}\.

9:else

10:

Δ​zconf←0\\Delta z\_\{\\text\{conf\}\}\\leftarrow 0\.

11:endif

12:

Δ​z​\(x\)=max⁡\(0,Δ​zdrag\+Δ​zconf\)\\Delta z\(x\)=\\max\(0,\\Delta z\_\{\\text\{drag\}\}\+\\Delta z\_\{\\text\{conf\}\}\); enforce monotonicity

Δ​z​\[z,y,x\]←maxz′≤z⁡Δ​z​\[z′,y,x\]\\Delta z\[z,y,x\]\\leftarrow\\max\_\{z^\{\\prime\}\\leq z\}\\Delta z\[z^\{\\prime\},y,x\]\.

13:Warp

VVby pull\-sampling at

\(z\+Δ​z,y,x\)\(z\+\\Delta z,y,x\)with

zz\-linear interpolation; salt voxels held at

vsv\_\{s\}\.

### B\.6CarbonatePlatform

TheCarbonatePlatformmodule \([Algorithm˜6](https://arxiv.org/html/2605.30541#alg6)\) builds asymmetric reef bodies with a steep marine flank, a gentle leeward ramp, and a “chaotic” core\. Multiple platforms along a shared strike direction are merged via a smooth\-minimum signed distance operator with a configurable blending parameter\. Facies velocities \(core, marine forereef, lagoon, carbonate basin, sequence boundary\) are assigned separately\. A lateral fill texture with interbedded velocities and a patchy 3D noise field fills the surrounding sediment bounded by the platform contacts\.

Algorithm 6CarbonatePlatform\.apply\(asymmetric reef body with facies zoning\)1:Thickness

tuset\_\{\\text\{use\}\}, platform azimuth

α\\alpha, center count

NpN\_\{p\}, width fraction

fwf\_\{w\}, shape exponent

κ\\kappa, marine squash

μ\\mu, leeward stretch

λ\\lambda, growth phases

nphasesn\_\{\\text\{phases\}\}, smooth\-min parameter

kk, facies velocities, lateral\-fill fraction

ffillf\_\{\\text\{fill\}\}\.

2:Domain\-warp the horizontal grid:

\(yw,xw\)=\(y,x\)\+aw​\(ηy,ηx\)\(y\_\{w\},x\_\{w\}\)=\(y,x\)\+a\_\{w\}\(\\eta\_\{y\},\\eta\_\{x\}\)\.

3:Sample

NpN\_\{p\}platform\-center paths

\(cy\(j\)​\(t\),cx\(j\)​\(t\)\)\(c\_\{y\}^\{\(j\)\}\(t\),c\_\{x\}^\{\(j\)\}\(t\)\)as oscillation \+ smoothed walk \+ drift along

α\\alpha\.

4:For each platform

jj: rotate into marine\-strike frame, split marine vs leeward by

μ,λ\\mu,\\lambda, and form normalized distance

distj​\(y,x\)=\(dmarasym\)2\+dstr2/reff\(j\)\\text\{dist\}\_\{j\}\(y,x\)=\\sqrt\{\(d\_\{\\text\{mar\}\}^\{\\text\{asym\}\}\)^\{2\}\+d\_\{\\text\{str\}\}^\{2\}\}/r\_\{\\text\{eff\}\}^\{\(j\)\}\.

5:Blend platforms via smooth\-minimum:

distblend=smink⁡\(dist1,…,distNp\)\\text\{dist\}\_\{\\text\{blend\}\}=\\operatorname\{smin\}\_\{k\}\(\\text\{dist\}\_\{1\},\\ldots,\\text\{dist\}\_\{N\_\{p\}\}\)\.

6:Build 2D thickness map

T​\(y,x\)T\(y,x\)from

nphasesn\_\{\\text\{phases\}\}growth lenses

exp⁡\(−distblendκ\)\\exp\(\-\\text\{dist\}\_\{\\text\{blend\}\}^\{\\kappa\}\), normalized to

\[min\_frac,1\]​tuse\[\\text\{min\\\_frac\},1\]\\,t\_\{\\text\{use\}\}\.

7:Build unmorphed volume of shape

mmound×Ny×Nxm\_\{\\text\{mound\}\}\\times N\_\{y\}\\times N\_\{x\}: assign facies velocity from

\{vcore,vfore,vlag,vbasin\}\\\{v\_\{\\text\{core\}\},v\_\{\\text\{fore\}\},v\_\{\\text\{lag\}\},v\_\{\\text\{basin\}\}\\\}via

\(distblend,bias\)\(\\text\{dist\}\_\{\\text\{blend\}\},\\text\{bias\}\)where

biasj=12​\(dmar/\|d\|\+1\)\\text\{bias\}\_\{j\}=\\tfrac\{1\}\{2\}\(d\_\{\\text\{mar\}\}/\|d\|\+1\)marks each voxel marine\-vs\-leeward; add

nibn\_\{\\text\{ib\}\}interbeds and core texture\.

8:Per\-column linearly resample to

T​\(y,x\)T\(y,x\)samples and place into

z∈\[h−T,h\)z\\in\[h\-T,h\)\.

9:Off\-platform columns in

\[h−ffill​tuse,h\)\[h\-f\_\{\\text\{fill\}\}\\,t\_\{\\text\{use\}\},h\)get a low\-velocity interbed fill\.

### B\.7DeltaClinoformDeposit

TheDeltaClinoformDepositmodule \([Algorithm˜7](https://arxiv.org/html/2605.30541#alg7)\) generates forestepping deltaic packages as a stack of flatNcN\_\{c\}clinothem envelopes, each containingBBinternal beds\. Internal beds are then created by interpolating between the previous surface and the clinothem envelope, producing thin layers that follow the same folded\-over clinoform geometry\. The module can optionally vary the rollover position, extent, truncation, and onlap taper from one clinothem to the next using a prescribed sequence of systems\-tract labels, but these labels act only as geometric controls on the stacked surfaces rather than as a full process\-based depositional model\.

Algorithm 7DeltaClinoformDeposit\.apply\(forestepping clinothem package\)1:Basinward azimuth

α\\alpha, clinothem count

NcN\_\{c\}, beds per clinothem

BB, envelope sharpness

\(sfore,stoe\)\(s\_\{\\text\{fore\}\},s\_\{\\text\{toe\}\}\), flat\-topset fraction

ϕflat\\phi\_\{\\text\{flat\}\}, toe parameters

\(ut,0,dtoe,βtoe\)\(u\_\{t,0\},d\_\{\\text\{toe\}\},\\beta\_\{\\text\{toe\}\}\), rollover advance

Δ​ur\\Delta u\_\{r\}, optional systems\-tract sequence, lobe shape, facies velocities, truncation period

ntruncn\_\{\\text\{trunc\}\}\. Let

σ​\(x\)=1/\(1\+e−x\)\\sigma\(x\)=1/\(1\+e^\{\-x\}\); rotated

\(u,v\)\(u,v\)coordinates with

u∈\[0,1\]u\\in\[0,1\]landward

→\\tobasinward\.

2:for

k=1,…,Nck=1,\\ldots,N\_\{c\}do

3:Update rollover

ur,k=ur,0\+Rku\_\{r,k\}=u\_\{r,0\}\+R\_\{k\}with cumulative advance

Rk=Rk−1\+Δ​rkR\_\{k\}=R\_\{k\-1\}\+\\Delta r\_\{k\}\(per\-tract sign\)\.

4:Lobe taper

Λ​\(y,x\)\\Lambda\(y,x\): super\-Gaussian in across\-strike coordinate

vv\.

5:Envelope \(two\-stage composition\): main foreset

zmain​\(u\)=zktop\+\(zkfore−zktop\)​σ​\(sfore​\(ueff−ur,k′\)\)z\_\{\\text\{main\}\}\(u\)=z\_\{k\}^\{\\text\{top\}\}\+\(z\_\{k\}^\{\\text\{fore\}\}\-z\_\{k\}^\{\\text\{top\}\}\)\\,\\sigma\(s\_\{\\text\{fore\}\}\(u\_\{\\text\{eff\}\}\-u^\{\\prime\}\_\{r,k\}\)\); toe stage

zenv​\(u\)=zmain\+\(ztoe∗−zmain\)​σ​\(stoe​\(u−ut,0\)\)​βtoez\_\{\\text\{env\}\}\(u\)=z\_\{\\text\{main\}\}\+\(z\_\{\\text\{toe\}\}^\{\*\}\-z\_\{\\text\{main\}\}\)\\,\\sigma\(s\_\{\\text\{toe\}\}\(u\-u\_\{t,0\}\)\)\\,\\beta\_\{\\text\{toe\}\}\.

6:for

b=1,…,Bb=1,\\ldots,Bdo

7:Bed surface

zk,b=zk,base\+\(zenv−zk,base\)​\(b/B\)​Π​\(u\)​Λ​\(y,x\)z\_\{k,b\}=z\_\{k,\\text\{base\}\}\+\(z\_\{\\text\{env\}\}\-z\_\{k,\\text\{base\}\}\)\(b/B\)\\,\\Pi\(u\)\\,\\Lambda\(y,x\), where

zk,basez\_\{k,\\text\{base\}\}is the base of clinothem

kk\(the top of clinothem

k−1k\{\-\}1\) and

Π​\(u\)\\Pi\(u\)is a product of

σ\\sigma\-gated onlap, extent, and shift tapers\.

8:Bed velocity

vb=vbase\+vrange​\(f−0\.5\)⋅2\+alam​\(−1\)k​B\+b\+γ​zk,bv\_\{b\}=v\_\{\\text\{base\}\}\+v\_\{\\text\{range\}\}\(f\-0\.5\)\\cdot 2\+a\_\{\\text\{lam\}\}\(\-1\)^\{kB\+b\}\+\\gamma z\_\{k,b\}, where

ffis the

σ\\sigma\-blended sand fraction across topset/foreset/toe zones\.

9:endfor

10:if

kmodntrunc=0k\\bmod n\_\{\\text\{trunc\}\}=0then

11:Erode the top

min⁡\(B,k\)\\min\(B,k\)bed surfaces toward the stack surface \(toplap sequence boundary\)\.

12:endif

13:endfor

14:Rasterize via soft\-Heaviside anti\-aliasing over all

Nc​BN\_\{c\}Bbed surfaces\.

### B\.8StructuralSmoother

TheStructuralSmoothermodule \([Algorithm˜8](https://arxiv.org/html/2605.30541#alg8)\) implements structure\-oriented smoothing\[Hale,[2009](https://arxiv.org/html/2605.30541#bib.bib54)\]\. The goal is to smooth the velocity volume anisotropically: strongly along the local bed direction, weakly across it\. The operator estimates local orientation from a structure tensor built from Sobel\-filtered gradients ofVV, then computes an eigendecomposition of the smoothed structure tensor to construct a per\-voxel diffusion tensor\. The smoothed output is the solution of a symmetric positive\-definite linear system solved by conjugate gradient\.

Algorithm 8StructuralSmoother\.apply\(Structure\-oriented smoothing\)1:Tensor\-smoothing scale

σtensor\\sigma\_\{\\text\{tensor\}\}, multi\-scale window

WW, smoothing scale

cc, cross\-layer diffusivity

α∈\(0,1\]\\alpha\\in\(0,1\], conjugate gradient iteration budget

ncgn\_\{\\text\{cg\}\}\.

2:Structure tensor

T​\(x\)=∑s=1W1s​gs​\(x\)​gs​\(x\)⊤T\(x\)=\\sum\_\{s=1\}^\{W\}\\tfrac\{1\}\{s\}\\,g\_\{s\}\(x\)\\,g\_\{s\}\(x\)^\{\\top\}with

gs=∇Vg\_\{s\}=\\nabla Vat scale

ss; Gaussian\-smooth each component at scale

σtensor\\sigma\_\{\\text\{tensor\}\}\.

3:Eigendecompose

TTpointwise into

λu≥λv≥λw\\lambda\_\{u\}\\geq\\lambda\_\{v\}\\geq\\lambda\_\{w\}with eigenvectors

u,v,wu,v,w; build diffusion tensor

D​\(x\)=α​u​u⊤\+v​v⊤\+w​w⊤D\(x\)=\\alpha\\,uu^\{\\top\}\+vv^\{\\top\}\+ww^\{\\top\}\.

4:Solve

\(I\+c​G⊤​D​G\)​y=V\(I\+c\\,G^\{\\top\}DG\)\\,y=Vby

ncgn\_\{\\text\{cg\}\}conjugate gradient iterations, where

GGis the discrete 3D gradient operator\.

## Appendix CVelocity Model Builder: Model Generation for Field\-Scale Dataset

This appendix describes the model building process for the six geological settings in our field\-scale dataset on Hugging Face\. For each setting we give \(i\) a description of the model, \(ii\) the real\-world geological settings that inspired it \(with supporting references\), \(iii\) the module builder sequence with module configurations, and \(iv\) what geological features we deliberately leave out\. The modules are in[Appendix˜B](https://arxiv.org/html/2605.30541#A2); CPU vs\. GPU timings for the model builder are in[Appendix˜D](https://arxiv.org/html/2605.30541#A4)\.This section leans heavily on geological terminology\.

Every velocity model has shape619×1000×1000619\\times 1000\\times 1000at 10 m isotropic spacing\.[Table˜6](https://arxiv.org/html/2605.30541#A3.T6)gives the full inventory and[Fig\.˜8](https://arxiv.org/html/2605.30541#A3.F8)provides build scripts and visualizations of models and migrated cubes for the Penobscot and F3 settings\.

Table 6:Velocity model inventory\. The velocity models all have identical shape and are at 10 m resolution\. Penobscot is held out as the out\-of\-distribution test set during training; the remaining 41 models populate training and in\-distribution test splits\.![Refer to caption](https://arxiv.org/html/2605.30541v1/x10.png)Figure 8:Python build scripts and \(velocity model, migrated cube\) pairs for Penobscot \(left\) and F3 \(right\)\. Each build script is a sequence ofModule\(Config\(…\)\)\.apply\(model\)calls that create geological features in the velocity model\. Each call corresponds to a feature visible in the migrated cube, e\.g\., the carbonate platform in Penobscot and the clinothem stack and salt diapir in F3\.### C\.1Penobscot

Our Penobscot model is a thick carbonate platform at depth, overlain by a siliciclastic column with a main listric fault and a small salt diapir at one edge\. The setting is the LaHave Platform on the Scotian Shelf \(offshore Nova Scotia\): a Mesozoic passive margin that accumulated kilometers of post\-rift sediment over a Late Jurassic carbonate bank\[Wade and MacLean,[1990](https://arxiv.org/html/2605.30541#bib.bib17), Brown,[2008](https://arxiv.org/html/2605.30541#bib.bib14)\]\. The Abenaki Formation is the geologic anchor\[Eliuk,[1978](https://arxiv.org/html/2605.30541#bib.bib15), Weissenbergeret al\.,[2006](https://arxiv.org/html/2605.30541#bib.bib18)\]\. The Penobscot 3D survey itself is contributed by CNSOPB and dGB Earth Sciences\[Baroniet al\.,[2019](https://arxiv.org/html/2605.30541#bib.bib13)\]\. The field has no thick allochthonous salt the way the Gulf of Mexico does; its salt contribution is subordinate and mostly confined to the adjacent Sable Subbasin\[Kidstonet al\.,[2002](https://arxiv.org/html/2605.30541#bib.bib16), Brown,[2008](https://arxiv.org/html/2605.30541#bib.bib14)\]\. We create a minor diapir in the synthetic to add some halokinetic texture in an otherwise salt\-poor geology\.

The builder runs a deterministic 21 step sequence on a1000×10001000\\times 1000grid at 10 m isotropic spacing\. It lays a 5500 m/s basement and a 30 cell pre\-carbonate at 4000 m/s with sinusoidal interbeds\. The carbonate is added via twoCarbonatePlatformpasses: a 120 cell main platform with three lateral platform centers \(core 5500, forereef 5000, lagoon 4200, carbonate basin 3500 m/s\) and a 40 cell patch\-reef pass with four smaller reef bodies\. A 36 cell basin\-fill deposit at 3300 m/s drapes the carbonate flanks\. SevenDepositevents then reproduce the clastic overburden in stratigraphic order from Lower Mississauga \(3800 m/s\) up through Banquereau Upper \(1900 m/s\), with a thin Wyandot chalk unit \(3400 m/s, suppressed interbed taper\) embedded in the middle\. A final 20 cell shallow Pleistocene deposit at 1700 m/s drapes the faulted surface before the water column\. ThreeSquishevents apply basin\-scale vertical warping at decreasing amplitude \(90, 65, 40 m\)\. One listric main fault \(azimuth50∘50^\{\\circ\}, dip30∘30^\{\\circ\}, slip fraction 0\.025\) and four V\-shaped secondaries \(paired at25∘25^\{\\circ\}and155∘155^\{\\circ\}around common centers\) give a total of five fault events\. A two\-body salt diapir is placed on the right edge \(placement boundsx∈\[0\.84,0\.98\]x\\in\[0\.84,0\.98\]\), vertically elongated \(correlation anisotropy\[1\.5,0\.6,0\.6\]\[1\.5,0\.6,0\.6\]\), and flanked by theSaltWedgemodule\. A 13 cell water column at 1480 m/s caps the model\.

What is real: the carbonate\-clastic contrast, the five faults, the small\-edge diapir, and the broad basin\-warping sequence\. What we skip: the specific Abenaki reef\-margin mode \(rimmed\-shelf versus barrier\-bank\) and the Penobscot Canyon erosional feature\.

### C\.2F3

Our F3 model is a deltaic\-shelf model with deep salt and large\-scale faulting\. The reference is the F3 Block in the Dutch sector of the Central Graben, an open 3D seismic volume widely used in seismic interpretation\[[Silvaet al\.,](https://arxiv.org/html/2605.30541#bib.bib22)\]\. Two features drive the modeling choices\. First, Permian\-age salt \(the Zechstein\) sits at depth and acts as a weak layer: sediment loading above it generates large\-scale faults that sole into the salt rather than into the underlying brittle basement\[Davisonet al\.,[2000](https://arxiv.org/html/2605.30541#bib.bib19)\]\. Second, the Plio\-Pleistocene section records the Eridanos river system, a continental\-scale delta that drained northwestern Europe into the southern North Sea and left behind repeated basinward\- stepping clinothem packages \(a clinothem is one dipping shelf\-slope\-basin sediment body\)\[Overeemet al\.,[2001](https://arxiv.org/html/2605.30541#bib.bib21), Kuhlmannet al\.,[2006](https://arxiv.org/html/2605.30541#bib.bib20)\]\.

Build\-time grid is1000×1000×12001000\\times 1000\\times 1200at 20 m horizontal and 2\.5 m vertical spacing; the shipped volumes resample to a619×1000×1000619\\times 1000\\times 1000grid at 10 m resolution\. The stratigraphy is built deep to shallow: a 3450 m/s basement, a pre\-chalk Scruff analogue \(2600 m/s\), a Chalk unit with the lowest interbed contrast in the model \(2900 m/s, 40 m/s std\), a Lower Tertiary package \(2200 m/s\), and a chaotic mass\-transport layer \(2300 m/s, 250 m/s std: the*highest*interbed contrast in the model\)\. A rifting\-phase squish \(75 m max shift at90∘90^\{\\circ\}azimuth\) warps the pre\-clinoform section\. The delta itself is built by theDeltaClinoformDepositmodule: ten forestepping clinothems \(range 8–12 per realization\), twelve beds each on average, stacked along a133∘133^\{\\circ\}SE trajectory with a rollover at 18 % along the profile, a flat topset fraction of 0\.5, and a toe\-flattening strength of 0\.75\. A second broad squish \(35 m max shift, near\-zero azimuth\) warps the pre\-salt section before upper deposits are applied\. Four shallower deposits \(Purple 1925, Upper Deep 1850, Upper Mid 1800, Upper Shallow 1750 m/s\) overlie the delta\. One main long\-wavelength fault \(azimuth145∘145^\{\\circ\}, dip30∘30^\{\\circ\}\), a left\-side fault cluster of∼\\sim14 faults, and 2–4 geologically\-placed support faults \(offset from the main fault by\+15∘\+15^\{\\circ\}and\+165∘\+165^\{\\circ\}, 60/40 split\) reproduce the F3 fault population\. A Yellow\-Pleistocene cap at 1650 m/s drapes the faulted surface, and a 30 cell water column at 1500 m/s tops the model\. Five salt\-body groups \(three on a diagonal chain, two outskirt\) are placed belowz=0\.40⋅nzz=0\.40\\cdot n\_\{z\}at 3500 m/s, then draped by a surface\-conformingSaltWedgepass\.

We refer to the shelf\-slope\-basin geometry as*oblique\-tangential*using the seismic\-stratigraphy vocabulary ofMitchumet al\.\[[1977](https://arxiv.org/html/2605.30541#bib.bib48)\], and describe the stacking pattern as*forestepping*or via the shelf\-edge\-trajectory language ofHelland\-Hansen and Hampson \[[2009](https://arxiv.org/html/2605.30541#bib.bib47)\]\. The word “progradation” describes a temporal depositional process; our model is a static velocity volume, so the geometric terms fit better\.

What is real: the clinothem geometry, the Zechstein\-decoupled fault style, and the deep\-to\-shallow velocity trend\. What we skip: intra\-clinothem mass\-transport deposits, salt\-pillar welding into the Jurassic section, and glaciomarine fabrics in the uppermost Pleistocene\.

### C\.3Fault

The Fault setting is a stress test, not a field\. It gives the training set dense normal\-fault populations with high velocity contrast across sub\-parallel fault arrays\. The closest natural analogue is the post\-rift section of the North Sea \(Viking and Central Graben\), where two superposed Mesozoic rifting phases left a multi\-orientation normal\-fault population with widespread block rotation\[Badleyet al\.,[1988](https://arxiv.org/html/2605.30541#bib.bib39), Underhill and Partington,[1993](https://arxiv.org/html/2605.30541#bib.bib45)\]\. We reference the fault\-population literature for the mechanism by which primary faults seed sub\-seismic secondaries: displacement–length scaling\[Cowie and Scholz,[1992](https://arxiv.org/html/2605.30541#bib.bib40)\], 3D fault\-surface geometry\[Nicolet al\.,[1996](https://arxiv.org/html/2605.30541#bib.bib43)\], relay\-ramp breaching between overstepping segments\[Peacock and Sanderson,[1994](https://arxiv.org/html/2605.30541#bib.bib44)\], and along\-strike displacement patterns from the East African Rift\[Morley,[1999](https://arxiv.org/html/2605.30541#bib.bib42)\]\. The integrating tectono\-sedimentary framework is described inGawthorpe and Leeder \[[2000](https://arxiv.org/html/2605.30541#bib.bib41)\]\.

The builder stacks four contrast\-producing deposits on a 5000 m/s basement \([Fig\.˜9](https://arxiv.org/html/2605.30541#A3.F9)\): deep 4000 m/s, mid 3200 m/s, upper 2500 m/s, and a post\-fault drape at 1950 m/s\. Two squish events interrupt the stack \(200 m and 130 m max shift\)\. Each realization then draws*3, 4, 5, or 6*primary faults from an inclusive integer range; primary count is*not*fixed\. All primaries share one system azimuth drawn from\[−45∘,\+45∘\]\[\-45^\{\\circ\},\+45^\{\\circ\}\], are confined to non\-crossing lanes perpendicular to that azimuth, and carry the same slip direction so the array reads as a single sub\-parallel population\. Per primary, 20–30 secondary faults are generated using Riedel\-shear geometry: synthetic R\-shears offset by\+15∘\+15^\{\\circ\}from the primary, antithetic R′\-shears by\+165∘\+165^\{\\circ\}, with 85 % synthetic bias and a 0\.75 hanging\-wall placement bias\. A minimum\-distance filter rejects overlapping subsidiaries, so the post\-filter secondary count varies with realization\. After faulting, a drape deposit heals the discontinuity\-bearing surface, a 15 cell water column at 1480 m/s goes on top, and the final clamp to\[1500,5000\]\[1500,5000\]m/s removes any outliers introduced by fault\-zone interpolation\.

What is real: the primary\-array style, the Riedel\-geometry secondary pattern, and the high\-contrast deposit stack\. What we skip: overpressure\-driven polygonal faulting, fault\-zone damage halos, and listric detachment into a ductile basal layer\. These realizations push the wavefield operator against dense discontinuities, not against the mechanical specifics of any named rift system\.

![Refer to caption](https://arxiv.org/html/2605.30541v1/x11.png)Figure 9:Per\-stage build of one realization of the Fault setting\. Reading left\-to\-right, top\-to\-bottom: the basement \(5000 m/s\) is laid down, followed by the four contrast\-producing deposits \(4000, 3200, 2500 m/s and the post\-fault drape at 1950 m/s\) interleaved with two squish events that warp the lower stack\. The drape deposit then heals the discontinuity\-bearing surface and the water column caps the model\.
### C\.4Gulf of Mexico

Our GoM analogue is a deepwater salt\-tectonics model: multiple salt bodies embedded in a layered sediment column, with sediments deformed upward against salt flanks\. The setting is the northern Gulf of Mexico passive margin, where salt from a deep Jurassic source layer has risen through the overlying sediment as walls, stocks, sheets, and coalesced canopies\[Diegelet al\.,[1995](https://arxiv.org/html/2605.30541#bib.bib24), Peelet al\.,[1995](https://arxiv.org/html/2605.30541#bib.bib28), Hudecet al\.,[2013](https://arxiv.org/html/2605.30541#bib.bib26)\]\.*Allochthonous*means that the salt has been transported out of its original stratigraphic position, leaving behind sediment\-filled*minibasins*between the new salt bodies\. These minibasins form because sediment loading differentially depresses the surrounding salt\[Hudec and Jackson,[2007](https://arxiv.org/html/2605.30541#bib.bib33)\]\. The rise of a salt diapir also deforms the flanking sediments, producing thinned and folded packages whose geometry is formalized as*halokinetic sequences*\(narrow hook sequences when the diapir rises fast relative to sedimentation, broad wedge sequences when it rises slowly\)\[Giles and Rowan,[2012](https://arxiv.org/html/2605.30541#bib.bib32)\]\. The geometric consequence for imaging is a large sediment\-to\-salt velocity jump that produces shadow zones and multipathing in ray\-based tomography\[Sava and Biondi,[2004a](https://arxiv.org/html/2605.30541#bib.bib30),[b](https://arxiv.org/html/2605.30541#bib.bib31), Jones and Davison,[2014](https://arxiv.org/html/2605.30541#bib.bib27), Jackson and Hudec,[2017](https://arxiv.org/html/2605.30541#bib.bib34)\]\.

The builder runs on a1000×1000×8001000\\times 1000\\times 800grid at 10 m resolution\. A 5 cell 4500 m/s basement is overlain by nineDepositevents \([Fig\.˜10](https://arxiv.org/html/2605.30541#A3.F10)\) with base velocities decreasing upward from 2900 m/s \(deep Lower Tertiary\) to 1800 m/s \(Pleistocene\)\. Five squish events interleave with the deposits; squish amplitude decreases upsection from 500 m to 100 m, giving the kind of depth\-dependent folding one expects when older sediment has seen more deformation\. TheSaltSDTmodule inserts 2–7 salt bodies per realization \(base 4\) at 4500 m/s, placed at depth fractions\[0\.15,0\.60\]\[0\.15,0\.60\], with a correlation length of 0\.7 in each axis on the GRF perturbing the body boundary\. TheSaltWedgemodule then deforms the surrounding sediment with a combination of a Gaussian weight in distance to the nearest vertical flank, a radial drag weight raised to the 1\.8 power, a depth\-dependent taper, and a fractal spatial\-amplitude field\. The net effect curves sediment upward against salt flanks and drapes it onto salt roofs, which is the geometric shape that the tapered \(wedge\) class of halokinetic sequence produces\[Giles and Rowan,[2012](https://arxiv.org/html/2605.30541#bib.bib32)\]\. A 96 cell water column at 1500 m/s caps the column\.

What is real: the∼\\sim2700 m/s salt\-to\-sediment velocity jump, the variety of individual salt\-body shapes that a parametric perturbation can span, and the upward\-flanking sediment geometry\. What we skip: welded or rafted salt architectures, dissolution textures, and multi\-stage salt\-tectonic histories\.

![Refer to caption](https://arxiv.org/html/2605.30541v1/x12.png)Figure 10:Per\-stage build of one realization of the Gulf of Mexico setting\. Reading left\-to\-right, top\-to\-bottom: the 5 cell basement \(4500 m/s\) is overlain by nine sediment deposits whose base velocities decrease upward \(Lower Tertiary 2900 m/s through Pleistocene 1800 m/s\), with five squish events of decreasing amplitude \(500, 400, 300, 200, 100 m\) interleaved between specific deposits to produce depth\-dependent folding\. The salt stage inserts SDT\-generated bodies at depth fractions\[0\.15,0\.60\]\[0\.15,0\.60\], and the wedging stage deforms the surrounding sediment to produce the upward onlap and drape characteristic of halokinetic sequences\.
### C\.5Salt Canopy

The Salt Canopy model represents a second GoM setting\. Multiple allochthonous salt sheets have coalesced laterally into a shallow, approximately tabular body above interbedded sediments\. Unlike the diapir\-dominated GoM setting above, the canopy is continuous in map view and produces the canonical subsalt imaging problem\[Hudec and Jackson,[2006](https://arxiv.org/html/2605.30541#bib.bib25), Pilcheret al\.,[2011](https://arxiv.org/html/2605.30541#bib.bib29), Jones and Davison,[2014](https://arxiv.org/html/2605.30541#bib.bib27), Sava and Biondi,[2004b](https://arxiv.org/html/2605.30541#bib.bib31)\]\. The BP\-operated Tiber discovery in Keathley Canyon block 102 is a widely cited field example of canopy salt\.

The four canopy models in the dataset predate theSaltSDTmodule and were built with a legacy code path \(pysaltmodel\) on a1584×2096×5361584\\times 2096\\times 536generation grid at 10 m resolution, then resampled and depth\-truncated to the common619×1000×1000619\\times 1000\\times 1000grid\. Each salt body is the volume between two 2D fractal\-noise elevation surfaces \(a “top” and a “bot”\) filled at salt velocity, with a frequency\-octave ladder of\[2,4,8,16,32\]\[2,4,8,16,32\]and an amplitude ladder of\[1,0\.3,0\.1,0\.05,0\.025\]\[1,0\.3,0\.1,0\.05,0\.025\]\. Sediment is filled both below and above the salt regime\. A correction pipeline then inpaints low\-velocity voxels, caps velocities at 5000 m/s, cuts below a detected basement, adds a130±10130\\pm 10\-cell water column at 1500 m/s, and clamps the final volume to\[1500,5000\]\[1500,5000\]m/s\. This model building process is inspired byFarriset al\.\[[2023](https://arxiv.org/html/2605.30541#bib.bib37)\], Farris \[[2023](https://arxiv.org/html/2605.30541#bib.bib38)\]\.

What is real: the tabular canopy morphology and the subsalt sediment–salt–sediment contrast\. What we skip: inter\-canopy saline\-entrainment textures and sub\-canopy overpressure signatures mapped in the real\-world Keathley Canyon section\.

### C\.6SEAM Phase I

The twelve SEAM models are*reused, not generated*\. They come from the SEG Advanced Modeling \(SEAM\) Phase I 3D earth model\[Fehler and Keliher,[2011](https://arxiv.org/html/2605.30541#bib.bib2)\], which was built by an industry consortium between 2007 and 2013 to model a salt canopy region of the deepwater Gulf of Mexico\. We incorporate SEAM because it is a community\-validated reference model in the same regime as our Salt Canopy setting\. The Phase I model is distributed under CC BY 4\.0 through the SEG SEAM Open Data program\[Society of Exploration Geophysicists,[2024](https://arxiv.org/html/2605.30541#bib.bib6)\]\.

The raw Phase I model is1501×4001×35011501\\times 4001\\times 3501at 10 m resolution \(15​km×40​km×35​km15\\text\{~km\}\\times 40\\text\{~km\}\\times 35\\text\{~km\}\)\. Our pipeline extracts twelve1501×1000×10001501\\times 1000\\times 1000subregions on a4×34\\times 3grid\. We apply structure\-oriented smoothing to each subregion \(σtensor=8\\sigma\_\{\\text\{tensor\}\}=8cells, smoothing\-scale=8=8cells,α=0\.3\\alpha=0\.3, 10 conjugate gradient iterations\) and clamp to\[1500,5000\]\[1500,5000\]m/s\. Each subregion is depth\-truncated to 619 cells, which is followed by structure\-oriented smoothing withσtensor=8\\sigma\_\{\\text\{tensor\}\}=8, smoothing\-scale=8=8,α=0\.3\\alpha=0\.3, and 10 conjugate gradient iterations\.

What is real: SEAM is based on real\-world geology\. What we skip: N/A\.

## Appendix DVelocity Model Builder: GPU vs\. CPU Timing

This appendix backs[Table˜2](https://arxiv.org/html/2605.30541#S3.T2)with hardware, methodology, per\-stage timings, and per\-module timing totals\. The modules are described in[Appendix˜B](https://arxiv.org/html/2605.30541#A2); the per\-setting build sequences are in[Appendix˜C](https://arxiv.org/html/2605.30541#A3)\.

##### Hardware\.

A single compute node with one 80 GB A100 NVIDIA GPU and an AMD EPYC 7763 host \(16\-core / 256 GB allocation\)\. The GPU was held in CUDA Exclusive\_Process mode so no other process contended for the device\. CPU runs setCUDA\_VISIBLE\_DEVICES=""to avoid creating a CUDA context, with BLAS / OpenMP thread counts pinned to 16\.

##### Methodology\.

Each \(setting, device\) pair runs as a new Python subprocess\. Per\-stage timings usetime\.perf\_counter\(\)bracketed bytorch\.cuda\.synchronize\(\)on the GPU run\. GPU utilization and memory usage are sampled 10 times per second, i\.e\., at 10 Hz\.

##### Per\-setting totals\.

[Table˜7](https://arxiv.org/html/2605.30541#A4.T7)stacks the wall\-time breakdown alongside peak GPU memory and peak CPU memory\. Wall is the full subprocess time including Taichi initialization, CUDA\-context setup, and inter\-stage cleanup; the gap between Wall and Build is the warm\-up plus Python overhead\.

Table 7:Per\-setting totals\. Build is the sum of per\-stage build times\. Wall is the full subprocess wall time\. Peak GPU \(CPU\)memory is the maximum GPU \(CPU\) memory used during the run\.
##### Per\-stage breakdowns\.

[Tables˜8](https://arxiv.org/html/2605.30541#A4.T8),[9](https://arxiv.org/html/2605.30541#A4.T9),[10](https://arxiv.org/html/2605.30541#A4.T10)and[11](https://arxiv.org/html/2605.30541#A4.T11)list every module call\.canonical\_sosis the finalStructuralSmoothercall; the fourwaterstages set the water column velocity and run in milliseconds on both CPU and GPU\.

Table 8:Per\-stage build for the Penobscot setting\. Speedup is the ratio of CPU to GPU build time\. Avg / pk util are weighted\-average and peak GPU utilization sampled at 10 Hz across the stage\. Pk CPU memory is the maximum sampled CPU memory usage\.Table 9:Per\-stage build for the F3 setting\. Speedup is the ratio of CPU to GPU build time\. Avg / pk util are weighted\-average and peak GPU utilization sampled at 10 Hz across the stage\. Pk CPU memory is the maximum sampled CPU memory usage\.Table 10:Per\-stage build for the Fault setting\. Speedup is the ratio of CPU to GPU build time\. Avg / pk util are weighted\-average and peak GPU utilization sampled at 10 Hz across the stage\. Pk CPU memory is the maximum sampled CPU memory usage\.Table 11:Per\-stage build for the Gulf of Mexico setting\. Speedup is the ratio of CPU to GPU build time\. Avg / pk util are weighted\-average and peak GPU utilization sampled at 10 Hz across the stage\. Pk CPU memory is the maximum sampled CPU memory usage\.
##### Per\-module totals and observations\.

[Table˜12](https://arxiv.org/html/2605.30541#A4.T12)groups the per\-stage rows by builder module\. The largest speedups are onFault\(38\.9×\\times\) andStructuralSmoother\(34\.5×\\times\)\.Deposithas a smaller per\-call speedup \(13\.6×\\times\) but its high call count makes it the biggest contributor to total time saved\.SaltSDTandSaltWedgeshow no GPU advantage; improving these implementations is a target for future work\.

Table 12:Per\-module totals over all four settings\.nis the number of times each stage is invoked;GPU sandCPU sare the summed times for each stage\.Avg utilis the average GPU utilization \(sampled at 10 Hz\)\.Peak GPU memoryis the maximum GPU memory used by the module\.

## Appendix ESeismic Forward Modeling Details

This appendix gives slice extraction, numerical scheme, source wavelet, and acquisition geometry details for the seismic data generation in[Section˜4](https://arxiv.org/html/2605.30541#S4)\.

### E\.1Slice extraction and dataset organization

The 2D dataset is produced from the 42 3D volumes \([Table˜6](https://arxiv.org/html/2605.30541#A3.T6)\) by a deterministic extractor with seed 42\. All volumes are first depth\-truncated to 619 cells \(the minimum depth across the 10 GoM volumes\), then re\-smoothed via theStructuralSmoothermodule at the truncated resolution so that the smoothing scale is consistent across settings\. Slice planes are placed with a 10 % boundary margin on each horizontal axis \(to avoid boundary\-layer artifacts\) and an orientation split of 50 % inline vs 50 % crossline per setting, sampled without replacement at integer slice indices\. Inline and crossline indices are drawn from independent random streams \(per\-model seeds 42 \+ model\_index and 42 \+ model\_index \+ 10000 respectively\), and the SEAM volumes are extracted as 12 sub\-regions of1000×10001000\\times 1000cells with 79 slices per sub\-region\. Before slicing, each of the 42 3D velocity volumes is normalized in place to fitvp∈\[1500,5000\]v\_\{p\}\\in\[1500,5000\]m/s: a linear rescale from\[vmin,vmax\]\[v\_\{\\min\},v\_\{\\max\}\]to\[vmin,5000\]\[v\_\{\\min\},5000\]whenvmax\>5000v\_\{\\max\}\>5000\(preserving every relative velocity contrast within the volume\), followed by a hard clip floor at15001500\. All extracted 2D slices inherit this normalization, so the 4,276 shipped slices satisfyvp∈\[1500,5000\]v\_\{p\}\\in\[1500,5000\]m/s without further per\-slice modification, consistent with the dispersion \([Section˜E\.3\.1](https://arxiv.org/html/2605.30541#A5.SS3.SSS1)\) and CFL \([Section˜E\.3\.2](https://arxiv.org/html/2605.30541#A5.SS3.SSS2)\) constraints derived below\.

##### Splits\.

Per\-setting train slice counts: F310×97=97010\\times 97=970, GoM10×73=73010\\times 73=730, faulted\-complex5×145=7255\\times 145=725, salt\-canopy4×181=7244\\times 181=724, SEAM12×79=94712\\times 79=947\. The training split is the union of these slices,970\+730\+725\+724\+947=4,096970\+730\+725\+724\+947=4\{,\}096\. Twenty additional slices per training setting form a 100\-slice in\-distribution test split drawn from non\-overlapping indices within the same volumes\. The 80 Penobscot slices form the out\-of\-distribution test split, giving a dataset\-wide total of4,096\+100\+80=4,2764\{,\}096\+100\+80=4\{,\}2762D slices\.

##### Parquet index\.

The benchmark root indexes 47,078 rows total: 42 3D models, 4,276 slices, 21,380 wavefields \(4,276×54\{,\}276\\times 5frequency bands\), and 21,380 shot\-gather cubes\. Each row carries 25 columns, includingslice\_id,model\_id,data\_type\(model,slice,wavefield, orgather\),model\_type,split,file\_path,orientation,frequency\_band, source coordinates, and precomputed per\-slice velocity statistics\. Slices and their paired wavefields and shot\-gather cubes share the sameslice\_id, so joining acrossdata\_typeonslice\_idmaterializes the paired triplets at any given frequency band\.

### E\.2Continuous PDE and boundary conditions

The forward simulator solves the 2D acoustic, constant\-density, isotropic wave equation in the\(x,z\)\(x,z\)plane\. Withp​\(x,z,t\)p\(x,z,t\)the pressure wavefield,vp​\(x,z\)v\_\{p\}\(x,z\)the velocity model, ands​\(x,z,t\)s\(x,z,t\)the source term \(a band\-limited Ricker pulse,[Section˜E\.4](https://arxiv.org/html/2605.30541#A5.SS4)\),

1vp​\(x,z\)2​∂2p∂t2−\(∂2p∂x2\+∂2p∂z2\)=s​\(x,z,t\)\.\\frac\{1\}\{v\_\{p\}\(x,z\)^\{2\}\}\\frac\{\\partial^\{2\}p\}\{\\partial t^\{2\}\}\-\\left\(\\frac\{\\partial^\{2\}p\}\{\\partial x^\{2\}\}\+\\frac\{\\partial^\{2\}p\}\{\\partial z^\{2\}\}\\right\)=s\(x,z,t\)\.\(2\)The factor1/vp21/v\_\{p\}^\{2\}scales the temporal acceleration term: lower\-velocity regions have a larger coefficient on∂t2p\\partial\_\{t\}^\{2\}p, so they admit slower propagation\. The Laplacian∂x2\+∂z2\\partial\_\{x\}^\{2\}\+\\partial\_\{z\}^\{2\}measures spatial curvature of the wavefield, which is the restoring force that drives propagation\.

##### Initial conditions\.

The pressure and its time derivative are zero everywhere att=0t=0,p​\(x,z,0\)=0p\(x,z,0\)=0and∂tp​\(x,z,0\)=0\\partial\_\{t\}p\(x,z,0\)=0\. The wavefield is excited entirely by the source termss, which fires a localized Ricker pulse near the surface \([Section˜E\.5](https://arxiv.org/html/2605.30541#A5.SS5)\)\.

##### Free surface at the top\.

The top edge of the simulation domain \(z=0z=0\) is a free surface, modeling the air–water \(or air–rock\) interface where pressure must vanish:

p​\(x,0,t\)=0,t∈\[0,T\]\.p\(x,0,t\)=0,\\qquad t\\in\[0,T\]\.\(3\)This is enforced in Devito by thefs=Truesetting on theModelobject\[Louboutinet al\.,[2019](https://arxiv.org/html/2605.30541#bib.bib56)\]and produces the sign\-flipped reflection of upgoing energy back into the model that is characteristic of marine acquisitions\.

##### Cerjan\-style sponge damping on the other three sides\.

The left, right, and bottom edges are absorbing rather than physical: the simulation grid is just a truncation of an unbounded earth model, and we want waves arriving at those edges to dissipate without spurious reflection\. Devito’sbcs="damp"configuration adds a 60\-cell sponge layer along each absorbing side, following the formulation ofCerjanet al\.\[[1985](https://arxiv.org/html/2605.30541#bib.bib52)\]: inside the sponge, the wave equation acquires a first\-order temporal\-damping term proportional to a position\-dependent damping coefficientη​\(𝐱\)\\eta\(\\mathbf\{x\}\),

1vp​\(𝐱\)2​∂2p∂t2−∇2p\+η​\(𝐱\)​∂p∂t=s​\(𝐱,t\)\.\\frac\{1\}\{v\_\{p\}\(\\mathbf\{x\}\)^\{2\}\}\\frac\{\\partial^\{2\}p\}\{\\partial t^\{2\}\}\-\\nabla^\{2\}p\+\\eta\(\\mathbf\{x\}\)\\,\\frac\{\\partial p\}\{\\partial t\}=s\(\\mathbf\{x\},t\)\.\(4\)The coefficient is zero throughout the interior, recovering \([2](https://arxiv.org/html/2605.30541#A5.E2)\) there, and rises smoothly to its peak at the outer edge\. Withnbln\_\{\\text\{bl\}\}damping cells,hhthe grid spacing, andρ∈\[0,1\]\\rho\\in\[0,1\]the normalized distance into the sponge,

η​\(𝐱\)=cdamph​\[ρ−sin⁡\(2​π​ρ\)2​π\],cdamp=1\.5​ln⁡\(1/R\)nbl,\\eta\(\\mathbf\{x\}\)=\\frac\{c\_\{\\text\{damp\}\}\}\{h\}\\left\[\\rho\-\\frac\{\\sin\(2\\pi\\rho\)\}\{2\\pi\}\\right\],\\qquad c\_\{\\text\{damp\}\}=\\frac\{1\.5\\,\\ln\(1/R\)\}\{n\_\{\\text\{bl\}\}\},\(5\)whereR=10−3R=10^\{\-3\}is the target reflection coefficient at the outer edge \(−60\-60dB\)\. Thesin⁡\(2​π​ρ\)/\(2​π\)\\sin\(2\\pi\\rho\)/\(2\\pi\)term tapers the rate of growth soη\\etarises gradually rather than as a hard step, which is what keeps the damping itself from generating new spurious reflections\. Withnbl=60n\_\{\\text\{bl\}\}=60andh=10h=10m, the sponge thickness is0\.60\.6km — wide enough at our highest band \(fmax=25f\_\{\\max\}=25Hz\) for waves to traverse the layer over multiple wavelengths and so be attenuated to the design tolerance before reaching the outer boundary\.

### E\.3Solving the Wave Equation Numerically

#### E\.3\.1Spatial Sampling \(Dispersion Control\)

Finite\-difference schemes propagate short\-wavelength components too slowly relative to the true phase velocity, an artifact known as numerical dispersion\[Liu and Sen,[2009](https://arxiv.org/html/2605.30541#bib.bib55)\]\. To keep this error below a prescribed tolerance, the grid must resolve the shortest wavelengthλmin=vmin/fmax\\lambda\_\{\\min\}=v\_\{\\min\}/f\_\{\\max\}with at leastGGgrid points:

h≤vminG​fmax\.h\\leq\\frac\{v\_\{\\min\}\}\{Gf\_\{\\max\}\}\.
The requiredGGdepends on the stencil order: higher\-order schemes need fewer points per wavelength for the same accuracy\. For our 8th\-order spatial stencil,G=4G=4keeps the phase\-velocity dispersion error well below 1 % over the bandwidths we simulate, so the grid spacing is fixed by the highest\-frequency content propagating through the slowest part of the model\.

#### E\.3\.2CFL Stability Condition

Oncehhis chosen, the time step cannot be chosen independently\. The Courant–Friedrichs–Lewy \(CFL\) condition provides the maximumΔ​t\\Delta tfor which the explicit time\-marching scheme remains stable\[Courantet al\.,[1928](https://arxiv.org/html/2605.30541#bib.bib51)\]:

Δ​t≤σmax​hvmax,\\Delta t\\leq\\sigma\_\{\\max\}\\frac\{h\}\{v\_\{\\max\}\},whereσmax=\(d​∑m=1M\|am\|\)−1\\sigma\_\{\\max\}=\\left\(\\sqrt\{d\\sum\_\{m=1\}^\{M\}\|a\_\{m\}\|\}\\right\)^\{\-1\}is the maximum stable Courant number,ddis the number of spatial dimensions, and\{am\}m=1M\\\{a\_\{m\}\\\}\_\{m=1\}^\{M\}are the finite\-difference coefficients\[Fornberg,[1988](https://arxiv.org/html/2605.30541#bib.bib53)\]\. Note the asymmetry: the dispersion criterion involvesvminv\_\{\\min\}\(shortest wavelength\), while stability involvesvmaxv\_\{\\max\}\(fastest propagation\)\. Both constrain us simultaneously\.

#### E\.3\.3Temporal Subsampling \(Nyquist\)

The CFL bound forcesΔ​t=1\\Delta t=1ms for stability, but the band\-limited wavefield does not need to be saved at that rate\. Storing every computational step would inflate the dataset by an order of magnitude without adding any information that the band\-limit cannot already recover, and would slow training throughput in proportion\. Shannon–Nyquist sets the floor: a signal band\-limited atfmaxf\_\{\\max\}can be reconstructed exactly from samples taken every1/\(2​fmax\)1/\(2f\_\{\\max\}\)seconds\. We therefore save everykk\-th computed step, where

k=⌊N2​fmax​T⌋\.k=\\left\\lfloor\\frac\{N\}\{2f\_\{\\max\}T\}\\right\\rfloor\.This yieldsNsaved=⌈N/k⌉N\_\{\\mathrm\{saved\}\}=\\lceil N/k\\rceilsnapshots at an effective sampling interval ofΔ​tsaved=k⋅Δ​t\\Delta t\_\{\\mathrm\{saved\}\}=k\\cdot\\Delta t\. We usek=14k=14uniformly across all five frequency bands so the saved tensor has the same temporal dimension regardless of source bandwidth; this is conservative for the lower bands \([Table˜14](https://arxiv.org/html/2605.30541#A5.T14)\) but yields a uniform\(Nsaved,nz,nx\)\(N\_\{\\mathrm\{saved\}\},n\_\{z\},n\_\{x\}\)shape that batches naturally during training\.

#### E\.3\.4Configuration Summary

The central design constraint is that every simulation, regardless of which frequency band the source wavelet targets, must produce a wavefield tensor of identical shape\. Neural operators expect fixed\-size inputs during training, and batching requires uniform dimensions\. We therefore adopt a single spatial grid, a single time step, and a single temporal subsampling factor across all 21,380 wavefield simulations and 21,380 shot\-gather cube simulations \(5 frequency bands×\\times4,276 slices\);[Table˜13](https://arxiv.org/html/2605.30541#A5.T13)summarizes the resulting simulation parameters\.

Table 13:Simulation parameters shared across all 5 frequency bands\. Wavefields and shot\-gather cubes share the same grid, time step, stencil order, boundary, and subsampling factor; they differ only in recording time, source geometry, and whether the full wavefield or a receiver\-extracted gather is saved\.ParameterSymbolValueGrid spacinghh10 m \(isotropic\)Spatial FD order2​M2M8 \(M=4M=4\)Max P\-wave velocityvmaxv\_\{\\max\}5000 m/sMin P\-wave velocityvminv\_\{\\min\}1500 m/sComputational time stepΔ​t\\Delta t1\.0 msRecording time \(wavefield\)TT5\.0 sRecording time \(shot\-gather\)TT8\.0 sTotal computational stepsNN5001 \(wavefield\) / 8001 \(gather\)Subsampling factorkk14Saved snapshots \(wavefield\)NsavedN\_\{\\mathrm\{saved\}\}358Saved time samples \(gather\)NsavedN\_\{\\mathrm\{saved\}\}572Saved time intervalΔ​tsaved\\Delta t\_\{\\mathrm\{saved\}\}14\.0 msSlice depthnzn\_\{z\}619Slice widthnxn\_\{x\}1000 \(typical\)Source wavelet—Bandpass Ricker \([Section˜E\.4](https://arxiv.org/html/2605.30541#A5.SS4)\)Absorbing boundary—60\-cell sponge damping; free topWavefield tensor shape—\(𝟑𝟓𝟖,619,1000\)\\mathbf\{\(358,\\,619,\\,1000\)\}Shot\-gather cube shape—\(𝟔𝟒,572,1000\)\\mathbf\{\(64,\\,572,\\,1000\)\}HereMMis the spatial half\-stencil width: the number of grid points on each side of the central node used to approximate the second spatial derivative\. An 8th\-order scheme \(M=4M=4\) reaches four neighbors in each direction, a common choice that balances accuracy against computational cost\.

##### Absorbing boundary and slice margin\.

Each simulation grid is padded on its left, right, and bottom with a 60\-cell sponge damping zone \(\([4](https://arxiv.org/html/2605.30541#A5.E4)\), \([5](https://arxiv.org/html/2605.30541#A5.E5)\)\); the top edge is a free surface \([3](https://arxiv.org/html/2605.30541#A5.E3)\)\. The slice extraction step \([Section˜E\.1](https://arxiv.org/html/2605.30541#A5.SS1)\) reserves a 10 % lateral margin from each cube edge, and source positions reserve an additional 0\.5 km margin inside the absorbing\-boundary region during forward modeling, so sources radiate into the physical interior rather than into the damping layer\.

We generated the wavefields and shot\-gather cubes for each of the five frequency bands\. The bands differ only in the source wavelet, so the grid, time step, and saved tensor shape are identical across all bands\. The Nyquist margin in[Table˜14](https://arxiv.org/html/2605.30541#A5.T14)is the ratio of the highest frequency the saved sampling can faithfully represent \(1/\(2​Δ​tsaved\)≈35\.71/\(2\\Delta t\_\{\\mathrm\{saved\}\}\)\\approx 35\.7Hz atΔ​tsaved=14\\Delta t\_\{\\mathrm\{saved\}\}=14ms\) to the band’s upper cutofffmaxf\_\{\\max\}\. A margin of1×1\\timeswould putfmaxf\_\{\\max\}exactly at the Nyquist limit; values larger than1×1\\timesmean the band is oversampled and aliasing\-free at the saved rate\. We choseΔ​tsaved=14\\Delta t\_\{\\mathrm\{saved\}\}=14ms so that all five bands clear Nyquist with a margin of at least∼1\.4×\\sim 1\.4\\times\.

Table 14:Frequency bands for wavefield and shot\-gather generation\. Nyquist margin =1/\(2​fmax​Δ​tsaved\)1/\(2f\_\{\\max\}\\Delta t\_\{\\mathrm\{saved\}\}\)atΔ​tsaved=14\\Delta t\_\{\\mathrm\{saved\}\}=14ms, i\.e\., the factor by which each band’s upper cutoff falls below the saved\-sampling Nyquist limit of35\.735\.7Hz\. We use only the 3–6 Hz band for the wavefield prediction experiments in[Section˜5](https://arxiv.org/html/2605.30541#S5); all five bands are released on the Hugging Face dataset\.

### E\.4Source Wavelet

The source time function is a Ricker pulse with peak frequencyf0f\_\{0\}, then bandpass\-filtered with a 4th\-order zero\-phase Butterworth \(forward\-backward filtering viascipy\.signal\.sosfiltfilt\) at the band corners listed in[Table˜14](https://arxiv.org/html/2605.30541#A5.T14)\. The peak frequencies aref0∈\{4\.50,5\.75,7\.50,10\.25,14\.00\}f\_\{0\}\\in\\\{4\.50,5\.75,7\.50,10\.25,14\.00\\\}Hz for the five bands respectively, and each wavelet is centered att0=1/f0t\_\{0\}=1/f\_\{0\}\.

After filtering, the amplitude of the wavelet depends sensitively on the passband, so we apply a two\-step normalization for numerical stability and physical scaling\. We first divide by theℓ2\\ell\_\{2\}norm of the filtered signal, which puts every band’s wavelet on the same numerical footing\. We then multiply byB/24​Hz\\sqrt\{B/24\\,\\mathrm\{Hz\}\}, whereBBis the band’s bandwidth and2424Hz is a fixed reference bandwidth chosen to match a typical broadband seismic source \(e\.g\. a 2–26 Hz band\)\. ThisB\\sqrt\{B\}scaling restores the Parseval relationship between bandwidth and energy: a band\-limited signal’s radiated energy is proportional to its passband width, so wider\-band sources radiate more energy than narrower\-band sources, matching what a real source array does\. Without this rescaling, unit\-energy normalization alone would make all five bands radiate the same total energy, which is not physical\.

### E\.5Wavefield and shot\-gather acquisition geometry

All simulations are run in Devito\[Louboutinet al\.,[2019](https://arxiv.org/html/2605.30541#bib.bib56)\], a SymPy\-based domain\-specific language that JIT\-compiles vectorized OpenMP\-parallel C from finite\-difference stencils\. Two acquisition configurations are used, sharing the same grid, time step, and absorbing boundary but differing in source count, receiver geometry, and recording time\.

##### Wavefield generation \(5 s, single source, no receivers\)\.

For each \(slice, band\) pair we run one 5 s acoustic simulation with a single source placed at depth 10 m and a uniformly random horizontal position drawn from the lateral interval\[1\.1​km,xmax−1\.1​km\]\[1\.1\\,\\mathrm\{km\},\\,x\_\{\\mathrm\{max\}\}\-1\.1\\,\\mathrm\{km\}\]\. The 1\.1 km margin excludes the 60\-cell sponge \(0\.6 km\) plus a 0\.5 km surveying buffer on each side, so the source pulse propagates entirely inside the physical domain rather than into the damping zone\. Per\-slice randomization is reproducible: the seed is42⊕CRC3242\\oplus\\mathrm\{CRC32\}\(slice filename\)\. We save the full wavefieldp​\(𝐱,t\)p\(\\mathbf\{x\},t\)subsampled in time byk=14k=14, producing one tensor per \(slice, band\) of shape\(358,nx,619\)\(358,n\_\{x\},619\)\. With 4,276 slices and 5 bands, this gives 21,380 wavefield tensors in total\.

##### Shot\-gather generation \(8 s, 64 sources, 1000 receivers\)\.

For each \(slice, band\) pair we run one 8 s simulation that emulates a marine streamer survey\. 64 source positions are uniformly spaced across the same valid lateral interval at depth 10 m, and 1000 receivers at depth 10 m are uniformly spaced across the full lateral extent\. The receiver array is fixed in place \(common\-receiver streamer geometry\): the same 1000 receivers record all 64 source firings, rather than a streamer that translates with the source\. The output cube has shape\(64,572,1000\)=\(sources,time,receivers\)\(64,572,1000\)=\(\\mathrm\{sources\},\\mathrm\{time\},\\mathrm\{receivers\}\), where 572 is the number of saved time samples afterk=14k=14subsampling of the 8001 computational steps\. We use 8 s rather than 5 s here so the deepest reflections from the 6\.19 km depth column have time to return to the surface \(2⋅6\.19​km/vshallow≈82\\cdot 6\.19\\,\\mathrm\{km\}/v\_\{\\mathrm\{shallow\}\}\\approx 8s withvshallow≈1500v\_\{\\mathrm\{shallow\}\}\\approx 1500m/s in the water layer\)\.[Fig\.˜11](https://arxiv.org/html/2605.30541#A5.F11)shows representative cubes from three training settings\.

![Refer to caption](https://arxiv.org/html/2605.30541v1/x13.png)Figure 11:Representative 8 s shot gathers from three training settings: Fault \(top row\), F3 \(middle row\), and GoM \(bottom row\)\. For each setting, the gathers correspond to three near\-surface source positions on the left, center, and right of the lateral extent, color\-coded to indicate source location\.

## Appendix FWavefield Prediction

This appendix gives architecture, rollout, loss, training, and evaluation details for the wavefield\-prediction experiments of[Section˜5](https://arxiv.org/html/2605.30541#S5)\. The task is to predict the wavefield \(5 s of acoustic propagation at 3–6 Hz\) given the velocity slice and a binary source mask; we evaluate three operators: TFNO \(chunked autoregressive forward prediction\), TFNO\-interp \(anchor\-bounded interior prediction\), and DPOT \(transformer with adaptive Fourier mixing\)\.

### F\.1Task setup and preprocessing

##### Tensor shapes\.

Each chunked\-rollout example has input shape\(Cin,H,W\)=\(12,619,1000\)\(C\_\{\\text\{in\}\},H,W\)=\(12,619,1000\)whereCin=Tin\+2C\_\{\\text\{in\}\}=T\_\{\\text\{in\}\}\+2stacks the 10 wavefield seed frames, the velocity slice, and the binary source mask\. The per\-chunk target has shape\(Tout,H,W\)=\(50,619,1000\)\(T\_\{\\text\{out\}\},H,W\)=\(50,619,1000\)\. The full forward\-modeled trajectory has 358 saved snapshots; the first 10 form the seed window, leaving 348 frames to predict, so the inference\-time rollout chains⌈348/50⌉=7\\lceil 348/50\\rceil=7chunks\. The seventh chunk’s leading 48 frames are kept and its trailing 2 are discarded\.

##### Normalization\.

Wavefield amplitudes vary by orders of magnitude across samples \(different source positions, velocity contrasts, recording windows\), so each sample is normalized by its own peak input amplitude:

s=maxt,y,x⁡\|xwf​\(t,y,x\)\|,x~wf=xwf/s,y~=y/s\.s=\\max\\nolimits\_\{t,y,x\}\|x\_\{\\text\{wf\}\}\(t,y,x\)\|,\\quad\\tilde\{x\}\_\{\\text\{wf\}\}=x\_\{\\text\{wf\}\}/s,\\quad\\tilde\{y\}=y/s\.The same scalessis applied to the target so the model learns a unit\-amplitude mapping\. At inference time the prediction is rescaled byssto recover absolute amplitudes\. Velocities use a global min–max to\[0,1\]\[0,1\]over\[vmin,vmax\]=\[1500,5000\]\[v\_\{\\text\{min\}\},v\_\{\\text\{max\}\}\]=\[1500,5000\]m/s; the source mask is already binary and is passed through unchanged\.

### F\.2Architectures

The three operators we compare are TFNO, TFNO\-interp, and DPOT\. TFNO and TFNO\-interp share the same channel\-projection layout followingZhanget al\.\[[2023](https://arxiv.org/html/2605.30541#bib.bib65)\]: a1×11\\times 1convolutionDp1compresses theCinC\_\{\\text\{in\}\}input channels to a latent widthdwd\_\{w\}, the backbone operates on\(dw,H,W\)\(d\_\{w\},H,W\), and a1×11\\times 1convolutionDp2expands the latent representation to the per\-chunk output channel count\. DPOT does not use externalDp1/Dp2layers; its patch embedding compresses theCinC\_\{\\text\{in\}\}input channels directly into the embedding space, and an unpatchify head produces the output frames\.

##### TFNO\.

The Tucker\-factorized Fourier Neural Operator\[Liet al\.,[2021](https://arxiv.org/html/2605.30541#bib.bib79), Kossaifiet al\.,[2024](https://arxiv.org/html/2605.30541#bib.bib78)\]replaces the dense complex spectral weight tensor with a Tucker decomposition at rank fraction0\.50\.5, reducing the spectral parameter count by∼4×\{\\sim\}4\\timesat the same width\. The backbone configuration: 4 layers,n\_modes=\(32,64\)\\texttt\{n\\\_modes\}=\(32,64\),hidden\_channels=48\\texttt\{hidden\\\_channels\}=48,dw=40d\_\{w\}=40, GELU activations, linear FNO skips, soft\-gating channel\-MLP skip connections, grid positional embedding, and 5 % per\-side zero\-padding on the spatial domain to break the FFT’s implicit periodic boundary \(without padding the absorbing\-boundary energy at the bottom of the volume wraps to the free surface\)\. TheTout=50T\_\{\\text\{out\}\}=50output channels ofDp2produce one chunk per forward pass\.

##### TFNO\-interp\.

The interp variant uses the same TFNO2d backbone hyperparameters as TFNO but wraps it in aBoundaryAnchoredInterpPredictor: instead of forward prediction past a single seed window, it fills aτ=20\\tau=20frame interior gap between two stored 20\-frame anchor windows \(TL=TR=20T\_\{L\}=T\_\{R\}=20\)\. The wrapper predicts aτ\\tau\-frame residual via a sine\-windowed head:

u^​\(t\)=\(1−αt\)​uanchor end\+αt​uanchor start\+sin⁡\(π​αt\)​rθ​\(t\),αt=t\+1τ\+1,\\hat\{u\}\(t\)=\(1\-\\alpha\_\{t\}\)\\,u\_\{\\text\{anchor end\}\}\+\\alpha\_\{t\}\\,u\_\{\\text\{anchor start\}\}\+\\sin\(\\pi\\alpha\_\{t\}\)\\,r\_\{\\theta\}\(t\),\\qquad\\alpha\_\{t\}=\\tfrac\{t\+1\}\{\\tau\+1\},where the linear\-in\-αt\\alpha\_\{t\}part interpolates between the anchor frames,rθ​\(t\)r\_\{\\theta\}\(t\)is the network output, and the sine envelopesin⁡\(π​αt\)\\sin\(\\pi\\alpha\_\{t\}\)forces the prediction to match the anchor frames exactly at the gap endpoints \(αt=0,1\\alpha\_\{t\}=0,\\,1\) while peaking in the middle of the gap\.

##### DPOT\.

DPOT\[Haoet al\.,[2024](https://arxiv.org/html/2605.30541#bib.bib83)\]is a ViT\-style operator with patch embedding, AFNO \(Adaptive Fourier\) mixing, and learned positional information\. We use theDPOTAdapterSimpleconfiguration with embedding dimension 256, 4 transformer blocks,num\_blocks=4\\texttt\{num\\\_blocks\}=4\(block\-diagonal AFNO structure\),modes=16\\texttt\{modes\}=16\(AFNO frequency truncation\), MLP ratio 1\.0, patch size 8, and GELU activations\. As noted above, DPOT projects theCinC\_\{\\text\{in\}\}input channels into the embedding space through its patch embedding rather than through external1×11\\times 1convolutions, and an unpatchify head produces theToutT\_\{\\text\{out\}\}output frames\.

### F\.3Hybrid \(chunked autoregressive\) rollout

The full forward\-modeled trajectory has 358 saved snapshots \([Appendix˜E](https://arxiv.org/html/2605.30541#A5)\); the model only predictsTout=50T\_\{\\text\{out\}\}=50at once\. Autoregressive chaining is the operator\-learning analogue of how a standard finite\-difference solver advances from thet=0t=0initial condition to all subsequent times: each chunk uses the model’s previous output as its next initial condition, so the same single trained operator generates the full trajectory rather than a separate model per horizon\. Two extreme rollout strategies are unattractive: single\-step \(Tout=1T\_\{\\text\{out\}\}=1\) autoregression accumulates error over 348 steps and is known to compound to large drift in long horizons; full\-trajectory \(Tout=348T\_\{\\text\{out\}\}=348\) prediction in one forward pass would inflate the output channel count by∼\\sim7×\\timesrelative to our chunk size, exceeding the activation budget at the field\-scale grid of\(619,1000\)\(619,1000\)on a single A100\. We instead use a chunked autoregressive rollout \([Algorithm˜9](https://arxiv.org/html/2605.30541#alg9)\): the model producesToutT\_\{\\text\{out\}\}frames per pass, the lastTinT\_\{\\text\{in\}\}predicted frames seed the next pass, and per\-chunk amplitude renormalization mirrors the per\-sample normalization used during training\. WithTout=50T\_\{\\text\{out\}\}=50and 348 frames to predict,⌈348/50⌉=7\\lceil 348/50\\rceil=7chunks suffice; only the leading 48 frames of the seventh chunk are retained \(the trailing 2 fall past the end of the saved trajectory and are discarded\)\.

Algorithm 9autoregressive\_rollout— chunked wavefield prediction\.1:Trained model

MM; seed

u0:Tinu\_\{0:T\_\{\\text\{in\}\}\}; velocity

vv; source mask

mm; total length

TT; chunk size

ToutT\_\{\\text\{out\}\}; bounds

vmin,vmaxv\_\{\\text\{min\}\},v\_\{\\text\{max\}\}\.

2:Predicted wavefield

u^Tin:T\\hat\{u\}\_\{T\_\{\\text\{in\}\}:T\}\.

3:Normalize velocity once:

v~←\(v−vmin\)/\(vmax−vmin\)\\tilde\{v\}\\leftarrow\(v\-v\_\{\\text\{min\}\}\)/\(v\_\{\\text\{max\}\}\-v\_\{\\text\{min\}\}\)\.

4:Buffer

b←u0:Tinb\\leftarrow u\_\{0:T\_\{\\text\{in\}\}\}\(sliding window of last

TinT\_\{\\text\{in\}\}frames\)\.

5:for

k=0,1,…,⌈\(T−Tin\)/Tout⌉−1k=0,1,\\ldots,\\lceil\(T\-T\_\{\\text\{in\}\}\)/T\_\{\\text\{out\}\}\\rceil\-1do

6:

s←max⁡\|b\|s\\leftarrow\\max\|b\|;

b~←b/s\\tilde\{b\}\\leftarrow b/s\.⊳\\trianglerightper\-chunk amplitude scale

7:

x←\[b~,v~,m\]x\\leftarrow\[\\tilde\{b\},\\ \\tilde\{v\},\\ m\]stacked on the channel axis\.

8:

u^chunk←s⋅M​\(x\)\\hat\{u\}\_\{\\text\{chunk\}\}\\leftarrow s\\cdot M\(x\)\.⊳\\trianglerightdenormalize the prediction

9:Append the needed prefix of

u^chunk\\hat\{u\}\_\{\\text\{chunk\}\}to the output\.

10:Update

bb: take the last

TinT\_\{\\text\{in\}\}frames of

u^chunk\\hat\{u\}\_\{\\text\{chunk\}\}\(or slide if

Tout<TinT\_\{\\text\{out\}\}<T\_\{\\text\{in\}\}\)\.

11:endfor

### F\.4Loss

Training minimizes a temporally\-weighted relativeL2L^\{2\}loss:

ℒ=𝔼batch​\[∑tw​\(t\)​‖u^t−ut‖22∑tw​\(t\)​‖ut‖22\+ε\]\.\\mathcal\{L\}=\\mathbb\{E\}\_\{\\text\{batch\}\}\\left\[\\sqrt\{\\tfrac\{\\sum\_\{t\}w\(t\)\\,\\\|\\hat\{u\}\_\{t\}\-u\_\{t\}\\\|\_\{2\}^\{2\}\}\{\\sum\_\{t\}w\(t\)\\,\\\|u\_\{t\}\\\|\_\{2\}^\{2\}\+\\varepsilon\}\}\\right\]\.
The temporal weight schedulew​\(t\)w\(t\)is motivated by error compounding in autoregressive rollouts: errors in the first few predicted frames seed the input buffer for the next chunk and propagate forward, so accuracy at smalltthas higher leverage on long\-horizon stability than accuracy at largett\.Cokeret al\.\[[2025](https://arxiv.org/html/2605.30541#bib.bib80)\]formulate the same hypothesis for autoregressive neural operators on 1D PDEs \(“giving higher priority to earlier timesteps will reduce the overall propagation of errors and enhance training stability”\)\. Our schedule applies the practice statically:w​\(t\)w\(t\)decays linearly fromwstart=1\.0w\_\{\\text\{start\}\}=1\.0att=0t=0towend=0\.5w\_\{\\text\{end\}\}=0\.5att=Tout−1t=T\_\{\\text\{out\}\}\-1, scaling the gradient signal in proportion to that leverage\.

TFNO and DPOT use the linear temporal weighting just described\. TFNO\-interp uses uniform temporal weighting \(wstart=wend=1\.0w\_\{\\text\{start\}\}=w\_\{\\text\{end\}\}=1\.0\) because its prediction is bounded symmetrically by anchor frames at both endpoints — there is no early\-vs\-late asymmetry to exploit — plus an additional first\-difference temporal\-derivative term, weightαdt=0\.1\\alpha\_\{\\text\{dt\}\}=0\.1, that penalizes‖∂t\(u^−u\)‖22\\\|\\partial\_\{t\}\(\\hat\{u\}\-u\)\\\|\_\{2\}^\{2\}across the gap\.

### F\.5Training configuration

Table 15:Wavefield prediction training hyperparameters per architecture\. The three runs share the same optimizer, schedule shape, weight decay, gradient clipping, mixed\-precision policy, batch size, and epoch count, so the comparison is on architecture rather than training hyperparameters\. The one deviation is the base learning rate \(see text\)\.As shown in[Table˜15](https://arxiv.org/html/2605.30541#A6.T15), the one training\-hyperparameter deviation across architectures is the base learning rate: in preliminary short runs DPOT exhibited substantially larger batch\-level oscillations in training loss at2×10−32\\times 10^\{\-3\}, so we lowered DPOT’s base LR to5×10−45\\times 10^\{\-4\}with the same cosine decay\. The adjustment is consistent with the original DPOT recipe\[Haoet al\.,[2024](https://arxiv.org/html/2605.30541#bib.bib83)\], which uses AdamW with a one\-cycle schedule and a long warmup phase, implying a much smaller effective learning rate at the onset of training; using a smaller cosine peak LR for DPOT mirrors that effect within our cosine\-only schedule\.

All three runs are trained on2×2\\timesNVIDIA A100 80 GB via DDP, with effective batch 64 \(8×48\\times 4grad\-accum steps×2\\times 2GPUs\)\. We decompress the source HDF5 files to NPY files during preprocessing and use the NPY files for training, which allows us to avoid the decompression overhead of HDF5\. All reported runs use a single seed\. Each run takes about 24 hours\.

### F\.6Evaluation pipeline and reported metrics

We evaluate trained operators on the in\-distribution test set \(100 slices, five training settings with non\-overlapping indices\) and the out\-of\-distribution test set \(80 Penobscot slices\)\. For each \(operator, slice\) pair we run a forward rollout from the seed window, compute per\-sample per\-frame and per\-wavenumber metrics, write the per\-sample numbers into a parquet manifest, and aggregate across the test set to produce the figures reported in[Section˜5](https://arxiv.org/html/2605.30541#S5)\.

The metrics computed per sample are: relativeL2L^\{2\}error per frameLRE2​\(t\)=‖u^t−ut‖2/\(‖ut‖2\+ε\)L^\{2\}\_\{\\text\{RE\}\}\(t\)=\\\|\\hat\{u\}\_\{t\}\-u\_\{t\}\\\|\_\{2\}/\(\\\|u\_\{t\}\\\|\_\{2\}\+\\varepsilon\)\(plotted on the log\-y “L2RE” axis of[Fig\.˜6](https://arxiv.org/html/2605.30541#S5.F6)as a per\-frame mean across the test set, on a physical\-time axis, with in\-distribution and out\-of\-distribution test splits side\-by\-side\); a per\-wavenumber relative power\-spectrum error\|Ppred​\(k\)−Pgt​\(k\)\|/\(Pgt​\(k\)\+ε\)\|P\_\{\\text\{pred\}\}\(k\)\-P\_\{\\text\{gt\}\}\(k\)\|/\(P\_\{\\text\{gt\}\}\(k\)\+\\varepsilon\), whereP​\(k\)P\(k\)is the azimuthally\-averaged radial 2D power spectrum first averaged over the trajectory’s frames \(this is*not*a vectorL2L^\{2\}norm; it is plotted as “Relative spectral error” in[Fig\.˜12](https://arxiv.org/html/2605.30541#A6.F12)\); trajectory\-wide rel\-L2L^\{2\}; max\-absolute\-error; energy ratioE​\[−1\]/E​\[0\]E\[\-1\]/E\[0\]and per\-frame energy std; divergence time \(first frame where‖u^‖∞\>10​‖u‖∞\\\|\\hat\{u\}\\\|\_\{\\infty\}\>10\\,\\\|u\\\|\_\{\\infty\}or NaN/Inf\); per\-frame SSIM; and a physics\-residual ratio computed from the discretized acoustic wave operator\.

The signed\-error panel grids showu^−u\\hat\{u\}\-uat four selected frames on a shared symmetric color scale, with a velocity\-grayscale backdrop \(α=0\.3\\alpha=0\.3\) and a star marking the source position\. The chunked\-rollout panel \([Fig\.˜4](https://arxiv.org/html/2605.30541#S5.F4)\) is a one\-shot teacher\-forced chunk seeded from GT frames\[240,250\)\[240,250\)and predicting frames\[250,300\)\[250,300\)at sampled positions 256, 270, 285, 299 \(no autoregressive chaining for this figure\)\. The TFNO\-interp panel \([Fig\.˜5](https://arxiv.org/html/2605.30541#S5.F5)\) shows signed error at gap centers \(frames 30, 150, 230, 270\) of the stitched rollout; its colorbar reuses[Fig\.˜4](https://arxiv.org/html/2605.30541#S5.F4)’s clim so error magnitudes are directly comparable across the two protocols\.[Fig\.˜6](https://arxiv.org/html/2605.30541#S5.F6)additionally shades TFNO\-interp’s gap\-prediction windows in light gray and diagonally hatches the trailing window past the last anchor \(frames 340–358, equivalently 4\.76–5\.00 s, which the model does not predict\)\.

Each metric tells us a different thing: the per\-frame rel\-L2L^\{2\}curve shows how prediction error compounds \(or stays bounded\) as the rollout extends; the spectral curve isolates the failure mode of neural operators at high wavenumbers, the spectral\-bias regime; the energy ratio diagnoses whether the rollout is dissipating or blowing up; divergence time picks out catastrophic failures; and the physics\-residual ratio measures how well the prediction satisfies the governing PDE without referring to the ground truth\.

### F\.7Per\-wavenumber and per\-geology breakdowns

##### Per\-wavenumber error\.

[Fig\.˜12](https://arxiv.org/html/2605.30541#A6.F12)plots the per\-wavenumber relative power\-spectrum error\. TFNO\-interp dominates across the spectrum; DPOT outperforms TFNO at high wavenumbers\.

![Refer to caption](https://arxiv.org/html/2605.30541v1/x14.png)Figure 12:Per\-wavenumber relative spectral error on the in\-distribution \(left\) and out\-of\-distribution \(right\) test sets\. TFNO\-interp dominates across the spectrum; DPOT outperforms TFNO at high wavenumbers\.
##### Per\-geology error\.

[Fig\.˜13](https://arxiv.org/html/2605.30541#A6.F13)displays the L2RE per geological setting in the in\-distribution test set\. Across all architectures, the Gulf of Mexico and Salt Canopy settings are the most challenging, while SEAM and F3 are the easiest\. We suspect this is due to Gulf of Mexico and Salt Canopy being the most salt\-rich settings in the dataset\.

![Refer to caption](https://arxiv.org/html/2605.30541v1/x15.png)Figure 13:L2RE versus time for each geological setting in the in\-distribution test set\. Gulf of Mexico and Salt Canopy are the most challenging settings, while SEAM and F3 are the easiest\.

## Appendix GEnd\-to\-End Seismic Inversion

This appendix gives architecture, loss, and training details for the end\-to\-end encoder–decoder inversion experiments of[Section˜6](https://arxiv.org/html/2605.30541#S6)\. The task is to map a 3D shot\-gather cube \(produced by 8 s of acoustic propagation at 3–25 Hz\) to a 2Dvp​\(𝐱\)v\_\{p\}\(\\mathbf\{x\}\)velocity model\.

### G\.1Task setup and preprocessing

##### Tensor shapes\.

The input is a 3D shot\-gather cube of shape\(nshots,ntime,nreceivers\)=\(64,572,1000\)\(n\_\{\\text\{shots\}\},n\_\{\\text\{time\}\},n\_\{\\text\{receivers\}\}\)=\(64,572,1000\), presented to the network as\(B,1,64,572,1000\)\(B,1,64,572,1000\), whereBBis the batchsize\. The output is a 2D velocity slice of shape\(B,1,619,1000\)\(B,1,619,1000\)in the normalized velocity range\[0,1\]\[0,1\]\.

##### Normalization\.

Shot gathers are normalized by sign\-preserving clipping\. We compute a single global clipping threshold once during preprocessing\. We randomly sample 200 training shot\-gather cubes, take the 98th percentile of the absolute amplitudes within each cube, then take the 98th percentile of those per\-cube values to obtain the global threshold\. At training time, every shot\-gather cube has its amplitudes clipped at plus or minus this threshold and divided by the threshold, mapping values to\[−1,1\]\[\-1,1\]\. Velocities are mapped to\[0,1\]\[0,1\]by an affine transformation that sends 1500 m/s to0and 5000 m/s to11\.

##### Augmentation\.

During training, the \(shot\-gather cube, velocity slice\) pairs are augmented with a horizontal flip \(probability0\.50\.5, applied jointly to the receiver axis of the cube and the horizontal axis of the velocity slice so their relationship through the acoustic wave equation \([1](https://arxiv.org/html/2605.30541#S2.E1)\) is preserved\) and additive Gaussian noise with signal\-to\-noise ratio drawn uniformly between\[10,30\]\[10,30\]dB\.

### G\.2Architectures

We compare three encoder–decoder architectures\. The Transformer and CNN encoders share an identical architecture for the decoder\. InversionNet is used because it has been employed in OpenFWI\[Denget al\.,[2022](https://arxiv.org/html/2605.30541#bib.bib1)\]\.

##### InversionNet\.

A direct port of the 2D encoder–decoder ofWu and Lin \[[2020](https://arxiv.org/html/2605.30541#bib.bib58)\]\. The shot\-gather cube is treated as a 2D image with 64 channels \(one per shot\) and processed by a convolutional encoder followed by a deconvolution \(a\.k\.a\. transposed convolution\) decoder, with a final resampling to the\(619,1000\)\(619,1000\)output grid\.

##### 3D residual CNN encoder \+ 2D decoder\.

A 3D residual encoder following nnU\-Net ResEnc principles\[Isenseeet al\.,[2024](https://arxiv.org/html/2605.30541#bib.bib61)\]\. The first encoder stage uses an asymmetric stride that aggressively downsamples the receiver axis while preserving the shot axis; remaining stages use symmetric strides\. A 3D\-to\-2D transition then collapses the shot dimension to produce a 2D feature map, which is fed to a 2D residual decoder that lifts the feature map back to the\(619,1000\)\(619,1000\)output grid and applies a sigmoid to yield a single normalized velocity channel\.

##### 3D Swin encoder \+ 2D decoder\.

A small CNN stem performs spatial downsampling of the cube\. A 3D patch embedding then maps the result to a token grid that is processed by a 3D Swin encoder\[Liuet al\.,[2021](https://arxiv.org/html/2605.30541#bib.bib59), Hatamizadehet al\.,[2022](https://arxiv.org/html/2605.30541#bib.bib60)\]via windowed self\-attention with shifted\-window blocks and patch merging between stages\. The output of the encoder is collapsed to a 2D feature map and fed to the same decoder architecture used by the CNN architecture\.

### G\.3Loss

Training minimizes the mean squared error between the predicted and ground\-truth normalized velocity slices, averaged over the training set and the height and width of each slice:

ℒ=1N​H​W​∑n=1N∑i=1H∑j=1W\(v^n,i,j−vn,i,j\)2,\\mathcal\{L\}=\\frac\{1\}\{N\\,H\\,W\}\\sum\_\{n=1\}^\{N\}\\sum\_\{i=1\}^\{H\}\\sum\_\{j=1\}^\{W\}\\left\(\\hat\{v\}\_\{n,i,j\}\-v\_\{n,i,j\}\\right\)^\{2\},whereNNis the number of training pairs,H=619H=619andW=1000W=1000are the height and width of the velocity slice,vn,i,j∈\[0,1\]v\_\{n,i,j\}\\in\[0,1\]is the normalized ground\-truth velocity at pixel\(i,j\)\(i,j\)of thennth training sample, andv^n,i,j∈\[0,1\]\\hat\{v\}\_\{n,i,j\}\\in\[0,1\]is the corresponding model prediction\.

### G\.4Training configuration

Table 16:Inversion training hyperparameters per architecture \(3–25 Hz frequency band, 8 s propagation\)\. All three architectures share the same optimizer, schedule, and augmentation\.We list training hyperparameters per architecture in[Table˜16](https://arxiv.org/html/2605.30541#A7.T16)\. All three architectures train on2×2\\timesNVIDIA A100 80 GB via PyTorch DDP, with effective batchsize 16 \(8×18\\times 1grad\-accum steps×2\\times 2GPUs\)\. We train three random seeds per architecture and report mean RMSE and SSIM in the main text\. We decompress the source HDF5 files to NPY files during preprocessing and use the NPY files for training, which allows us to avoid the decompression overhead of HDF5\. Each training run takes about 20 hours\.

Similar Articles

Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image

Hugging Face Daily Papers

Sat3DGen introduces a geometry-first approach for generating street-level 3D scenes from a single satellite image, achieving improved geometric accuracy and photorealism through novel constraints and training strategies. The method demonstrates significant improvements over prior work on the VIGOR-OOD benchmark.

ABot-Earth 0.5: Generative 3D Earth Model

Hugging Face Daily Papers

ABot-Earth 0.5 is a generative 3D framework that synthesizes realistic 3D urban environments from satellite imagery using 3D Gaussian Splatting, enabling real-time visualization and closed-loop UAV navigation at low cost.

Surflo: Consistent 3D Surface Flow Model with Global State

Hugging Face Daily Papers

Surflo is a feed-forward 3D reconstruction model that compresses unposed RGB views into latent tokens and decodes consistent 3D surface points via flow matching, enabling variable-resolution output and outperforming existing methods in speed.

SurGe: Improved Surface Geometry in Point Maps

Hugging Face Daily Papers

SurGe introduces a Neighborhood Attention Decoder and a reformulated scale-invariant gradient matching loss to improve local surface geometry accuracy in feedforward 3D reconstruction, particularly for thin structures. It achieves state-of-the-art average rank on zero-shot monocular geometry benchmarks, with better local point map and normal metrics.