The Well: a Large-Scale Collection of Diverse Physics Simulations for Machine Learning

Papers with Code Trending 11/30/24, 07:42 PM Papers

physics-simulation dataset machine-learning benchmark spatiotemporal fluid-dynamics numerical-simulation

Summary

The Well is a large-scale collection of 15TB of diverse physics simulation datasets across 16 domains, designed to benchmark machine learning surrogate models for spatiotemporal physical systems. It provides a unified PyTorch interface and example baselines to accelerate simulation-based workflows.

Machine learning based surrogate models offer researchers powerful tools for accelerating simulation-based workflows. However, as standard datasets in this space often cover small classes of physical behavior, it can be difficult to evaluate the efficacy of new approaches. To address this gap, we introduce the Well: a large-scale collection of datasets containing numerical simulations of a wide variety of spatiotemporal physical systems. The Well draws from domain experts and numerical software developers to provide 15TB of data across 16 datasets covering diverse domains such as biological systems, fluid dynamics, acoustic scattering, as well as magneto-hydrodynamic simulations of extra-galactic fluids or supernova explosions. These datasets can be used individually or as part of a broader benchmark suite. To facilitate usage of the Well, we provide a unified PyTorch interface for training and evaluating models. We demonstrate the function of this library by introducing example baselines that highlight the new challenges posed by the complex dynamics of the Well. The code and data are available at https://github.com/PolymathicAI/the_well.
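To make the idea of a unified training interface for spatiotemporal data more concrete, here is a minimal sketch of how simulated trajectories might be sliced into (input frames, target frames) pairs for a surrogate model. It is not the Well's actual API: the class `TrajectoryWindows`, the helper `toy_trajectory`, and the parameters `n_in` and `n_out` are hypothetical names chosen for illustration, and the data is a toy stand-in for real simulation output. The class only implements `__len__` and `__getitem__`, which is the map-style dataset protocol PyTorch data loaders expect, so the example runs with the standard library alone.

```python
# Hypothetical sketch: illustrates windowing of trajectories, NOT the_well's API.
from typing import List, Sequence, Tuple

Frame = List[List[float]]  # one 2D snapshot of a scalar field


class TrajectoryWindows:
    """Map-style dataset over sliding windows of simulated trajectories.

    Each item is (inputs, targets): `n_in` consecutive frames and the
    `n_out` frames that immediately follow them in the same trajectory.
    """

    def __init__(self, trajectories: Sequence[Sequence[Frame]],
                 n_in: int = 2, n_out: int = 1):
        self.trajectories = trajectories
        self.n_in = n_in
        self.n_out = n_out
        # Precompute (trajectory index, start step) for every valid window.
        # Trajectories shorter than n_in + n_out contribute no windows.
        self.index: List[Tuple[int, int]] = []
        span = n_in + n_out
        for t_idx, traj in enumerate(trajectories):
            for start in range(len(traj) - span + 1):
                self.index.append((t_idx, start))

    def __len__(self) -> int:
        return len(self.index)

    def __getitem__(self, i: int) -> Tuple[List[Frame], List[Frame]]:
        t_idx, start = self.index[i]
        traj = self.trajectories[t_idx]
        mid = start + self.n_in
        return list(traj[start:mid]), list(traj[mid:mid + self.n_out])


def toy_trajectory(n_steps: int, size: int = 2) -> List[Frame]:
    """Toy stand-in for simulation output: frame t is filled with float(t)."""
    return [[[float(t)] * size for _ in range(size)] for t in range(n_steps)]


# Two trajectories of different lengths, windowed with the same settings.
dataset = TrajectoryWindows([toy_trajectory(5), toy_trajectory(4)],
                            n_in=2, n_out=1)
inputs, targets = dataset[0]
print(len(dataset), len(inputs), len(targets))
```

Precomputing the window index keeps item lookup cheap and lets trajectories of different lengths share one dataset, which is one plausible way to present heterogeneous simulations behind a single interface.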

Original Article

Cached at: 06/27/26, 05:18 PM

Paper page - The Well: a Large-Scale Collection of Diverse Physics Simulations for Machine Learning

Source: https://huggingface.co/papers/2412.00568 Published on Nov 30, 2024

Authors:

Abstract

A large-scale dataset collection, The Well, provides diverse numerical simulations for benchmarking machine learning models in physical systems simulation.

Machine learning based surrogate models offer researchers powerful tools for accelerating simulation-based workflows. However, as standard datasets in this space often cover small classes of physical behavior, it can be difficult to evaluate the efficacy of new approaches. To address this gap, we introduce the Well: a large-scale collection of datasets containing numerical simulations of a wide variety of spatiotemporal physical systems. The Well draws from domain experts and numerical software developers to provide 15TB of data across 16 datasets covering diverse domains such as biological systems, fluid dynamics, acoustic scattering, as well as magneto-hydrodynamic simulations of extra-galactic fluids or supernova explosions. These datasets can be used individually or as part of a broader benchmark suite. To facilitate usage of the Well, we provide a unified PyTorch interface for training and evaluating models. We demonstrate the function of this library by introducing example baselines that highlight the new challenges posed by the complex dynamics of the Well. The code and data are available at https://github.com/PolymathicAI/the_well.

Models citing this paper: 0

Datasets citing this paper: 13

#### polymathic-ai/acoustic_scattering_inclusions
Updated Apr 10, 2025 • 23.5k

#### polymathic-ai/rayleigh_benard
Updated Apr 10, 2025 • 6.88k

#### polymathic-ai/planetswe
Updated Apr 10, 2025 • 5.87k • 1

#### polymathic-ai/acoustic_scattering_discontinuous
Updated Apr 10, 2025 • 5.34k • 1

Spaces citing this paper: 1

Collections including this paper: 5

Similar Articles

ThousandWorlds: A benchmark for climate emulation of potentially habitable exoplanets

@heyrobinai: THE ENTIRE AI INDUSTRY JUST GOT HUMILIATED a tiny model trained in just a few hours on a single graphics card is planni…

Synthics: Synthetic Physics-like Datasets for Machine Learning

Surface Evolver Bench: my benchmark asking LLMs to write complex physical simulations in a custom data format

@lvwerra: We released physics-intern: a simple harness for science problems! It gets models like Gemini 3.1 Pro to go from 17.7 -…
