The Well: a Large-Scale Collection of Diverse Physics Simulations for Machine Learning
Summary
The Well is a large-scale collection of 16 physics simulation datasets (15TB in total) spanning diverse domains, designed to benchmark machine learning surrogate models for spatiotemporal physical systems. It provides a unified PyTorch interface for training and evaluating models, together with example baselines.
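A surrogate model for a spatiotemporal system is commonly trained to map a short history of simulation snapshots to the snapshot(s) that follow. The sketch below illustrates that windowing step in plain Python. It assumes each trajectory is an ordered sequence of frames. `make_windows`, `n_input`, and `n_output` are hypothetical names chosen for illustration and are not part of the Well's PyTorch interface.

```python
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")  # one simulation snapshot (a scalar here, a field array in practice)


def make_windows(
    trajectory: Sequence[Frame], n_input: int, n_output: int
) -> List[Tuple[List[Frame], List[Frame]]]:
    """Hypothetical helper (not the Well's API): split one trajectory into
    (input, target) pairs of n_input past snapshots and n_output future ones."""
    if n_input < 1 or n_output < 1:
        raise ValueError("n_input and n_output must be positive")
    window = n_input + n_output
    pairs = []
    # Slide a window of length n_input + n_output along time, one step at a time.
    for start in range(len(trajectory) - window + 1):
        inputs = list(trajectory[start : start + n_input])
        targets = list(trajectory[start + n_input : start + window])
        pairs.append((inputs, targets))
    return pairs


# Toy trajectory: six scalar "snapshots" standing in for spatial fields.
traj = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
pairs = make_windows(traj, n_input=4, n_output=1)
print(len(pairs))  # 2
print(pairs[0])    # ([0.0, 0.1, 0.2, 0.3], [0.4])
```

Sliding the window one step at a time reuses each snapshot in several training pairs. A real data loader would apply the same idea to multi-channel field arrays rather than scalars.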
Source: https://huggingface.co/papers/2412.00568
Published on Nov 30, 2024
Abstract
The Well, a large-scale dataset collection, provides diverse numerical simulations for benchmarking machine learning models of physical systems.
Machine learning-based surrogate models offer researchers powerful tools for accelerating simulation-based workflows. However, as standard datasets in this space often cover small classes of physical behavior, it can be difficult to evaluate the efficacy of new approaches. To address this gap, we introduce the Well: a large-scale collection of datasets containing numerical simulations of a wide variety of spatiotemporal physical systems. The Well draws from domain experts and numerical software developers to provide 15TB of data across 16 datasets covering diverse domains such as biological systems, fluid dynamics, acoustic scattering, as well as magneto-hydrodynamic simulations of extra-galactic fluids or supernova explosions. These datasets can be used individually or as part of a broader benchmark suite. To facilitate usage of the Well, we provide a unified PyTorch interface for training and evaluating models. We demonstrate the function of this library by introducing example baselines that highlight the new challenges posed by the complex dynamics of the Well. The code and data are available at https://github.com/PolymathicAI/the_well.
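Because the datasets span very different physical quantities and scales, comparing surrogates across them benefits from a scale-free error. The following is a minimal sketch, assuming a variance-scaled RMSE: the RMSE divided by the target field's standard deviation, with a small `eps` for numerical stability. The function name `vrmse` and the default `eps` are assumptions made here, and the exact metrics used in the Well's baselines should be checked in the paper and repository.

```python
import math
from typing import Sequence


def vrmse(pred: Sequence[float], target: Sequence[float], eps: float = 1e-7) -> float:
    """Variance-scaled RMSE (assumed definition for illustration, not taken
    from the_well package): sqrt(mean((pred - target)^2) / (var(target) + eps))."""
    if len(pred) != len(target) or len(target) == 0:
        raise ValueError("pred and target must be non-empty and equal in length")
    n = len(target)
    mean_t = sum(target) / n
    # Mean squared error between prediction and reference simulation.
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / n
    # Population variance of the reference field, used as the normaliser.
    var_t = sum((t - mean_t) ** 2 for t in target) / n
    return math.sqrt(mse / (var_t + eps))


target = [1.0, 2.0, 3.0, 4.0]
print(vrmse(target, target))     # 0.0 (perfect prediction)
print(vrmse([2.5] * 4, target))  # close to 1.0: predicting the mean scores about 1
```

A score near 1 means the surrogate does no better than predicting the target's mean. In practice one could apply the same normalisation separately to each physical field over multidimensional arrays.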
Datasets citing this paper: 13, including:
polymathic-ai/acoustic_scattering_inclusions • Updated Apr 10, 2025 • 23.5k
polymathic-ai/rayleigh_benard • Updated Apr 10, 2025 • 6.88k
polymathic-ai/planetswe • Updated Apr 10, 2025 • 5.87k • 1
polymathic-ai/acoustic_scattering_discontinuous • Updated Apr 10, 2025 • 5.34k • 1
Spaces citing this paper: 1
Collections including this paper: 5
Similar Articles
ThousandWorlds: A benchmark for climate emulation of potentially habitable exoplanets
ThousandWorlds is a benchmark dataset for machine-learning emulation of exoplanet climates, containing approximately 1800 simulations from five global climate models. Gaussian process methods outperform deep learning baselines in this low-data, multi-simulator regression task.
@heyrobinai: THE ENTIRE AI INDUSTRY JUST GOT HUMILIATED a tiny model trained in just a few hours on a single graphics card is planni…
Yann LeCun's team releases LeWorldModel, a tiny 15M-parameter physics model trained on a single GPU in hours that outperforms billion-dollar foundation models in planning speed and physical plausibility, challenging the dominant scaling paradigm.
Synthics: Synthetic Physics-like Datasets for Machine Learning
A method using Bayesian Probabilistic Context-Free Grammar to generate synthetic regression datasets that structurally resemble physics equations, validated against the Feynman corpus and shown to be effective for hyperparameter tuning.
Surface Evolver Bench: my benchmark asking LLMs to write complex physical simulations in a custom data format
Introduces Surface Evolver Bench, a benchmark that evaluates LLMs on writing complex physical simulations in a custom data format.
@lvwerra: We released physics-intern: a simple harness for science problems! It gets models like Gemini 3.1 Pro to go from 17.7 -…
Released physics-intern, a simple harness that significantly boosts the performance of reasoning models like Gemini 3.1 Pro on science problems, from 17.7 to 31.4, outperforming GPT 5.5 Pro.