The Well: a Large-Scale Collection of Diverse Physics Simulations for Machine Learning


Summary

The Well is a 15TB collection of 16 physics simulation datasets spanning diverse domains, designed to benchmark machine learning surrogate models for spatiotemporal physical systems. It provides a unified PyTorch interface and example baselines to accelerate simulation-based workflows.

Machine-learning-based surrogate models offer researchers powerful tools for accelerating simulation-based workflows. However, as standard datasets in this space often cover small classes of physical behavior, it can be difficult to evaluate the efficacy of new approaches. To address this gap, we introduce the Well: a large-scale collection of datasets containing numerical simulations of a wide variety of spatiotemporal physical systems. The Well draws from domain experts and numerical software developers to provide 15TB of data across 16 datasets covering diverse domains such as biological systems, fluid dynamics, and acoustic scattering, as well as magneto-hydrodynamic simulations of extra-galactic fluids or supernova explosions. These datasets can be used individually or as part of a broader benchmark suite. To facilitate usage of the Well, we provide a unified PyTorch interface for training and evaluating models. We demonstrate the function of this library by introducing example baselines that highlight the new challenges posed by the complex dynamics of the Well. The code and data are available at https://github.com/PolymathicAI/the_well.
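To make the idea of training a surrogate on spatiotemporal simulations concrete, here is a minimal sketch of how a trajectory can be cut into (input frames, target frames) pairs. It is not the Well's actual interface: the class `WindowedTrajectoryDataset`, the helper `make_toy_trajectory`, and the parameters `n_in` and `n_out` are hypothetical names chosen for illustration, and the data is a toy stand-in rather than real simulation output. The class only follows the map-style protocol (`__len__` and `__getitem__`) that PyTorch's `DataLoader` expects, using the standard library alone.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Frame = List[List[float]]  # one 2D snapshot of a scalar field


@dataclass
class WindowedTrajectoryDataset:
    """Hypothetical map-style dataset over simulated trajectories.

    Each item is (input_frames, target_frames): `n_in` consecutive
    snapshots followed by the next `n_out` snapshots of the same trajectory.
    """

    trajectories: Sequence[Sequence[Frame]]
    n_in: int = 2
    n_out: int = 1

    def __post_init__(self) -> None:
        window = self.n_in + self.n_out
        # Precompute (trajectory index, start step) for every valid window.
        # Trajectories shorter than one window contribute no items.
        self._index: List[Tuple[int, int]] = [
            (i, t)
            for i, traj in enumerate(self.trajectories)
            for t in range(len(traj) - window + 1)
        ]

    def __len__(self) -> int:
        return len(self._index)

    def __getitem__(self, idx: int) -> Tuple[List[Frame], List[Frame]]:
        i, t = self._index[idx]
        traj = self.trajectories[i]
        split = t + self.n_in
        return list(traj[t:split]), list(traj[split:split + self.n_out])


def make_toy_trajectory(n_steps: int, size: int = 2, offset: float = 0.0) -> List[Frame]:
    """Toy stand-in for a simulation: the frame at step t is filled with offset + t."""
    return [[[offset + t] * size for _ in range(size)] for t in range(n_steps)]


# Two toy trajectories of different lengths (5 and 4 steps).
dataset = WindowedTrajectoryDataset(
    trajectories=[make_toy_trajectory(5), make_toy_trajectory(4, offset=100.0)],
    n_in=2,
    n_out=1,
)

inputs, targets = dataset[0]
print(len(dataset), len(inputs), len(targets))
```

Precomputing the window index keeps `__getitem__` cheap and ensures that no window spans two different trajectories, which matters when a benchmark mixes simulations of different lengths.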

Source: https://huggingface.co/papers/2412.00568

Published on Nov 30, 2024

Abstract

A large-scale dataset collection, The Well, provides diverse numerical simulations for benchmarking machine learning models in physical systems simulation.



Models citing this paper: 0

Datasets citing this paper: 13

#### polymathic-ai/acoustic_scattering_inclusions
Updated Apr 10, 2025 • 23.5k

#### polymathic-ai/rayleigh_benard
Updated Apr 10, 2025 • 6.88k

#### polymathic-ai/planetswe
Updated Apr 10, 2025 • 5.87k • 1

#### polymathic-ai/acoustic_scattering_discontinuous
Updated Apr 10, 2025 • 5.34k • 1

Spaces citing this paper: 1

Collections including this paper: 5

Similar Articles

Synthics: Synthetic Physics-like Datasets for Machine Learning

arXiv cs.LG

A method using Bayesian Probabilistic Context-Free Grammar to generate synthetic regression datasets that structurally resemble physics equations, validated against the Feynman corpus and shown to be effective for hyperparameter tuning.