EO-WM: A Physically Informed World Model for Probabilistic Earth Observation Forecasting

Hugging Face Daily Papers Papers

Summary

EO-WM proposes a video diffusion transformer for probabilistic Earth observation forecasting that incorporates physically informed conditioning to capture weather-driven uncertainties, achieving improved prediction of vegetation indices under extreme weather.

Earth Observation (EO) forecasting aims to predict future Earth surface dynamics from satellite observations under changing meteorological conditions. In this paper, we view this task as a partially observed, weather-driven world modeling problem, in which weather acts as a conditioning signal, while forecasting remains uncertain due to sparse observations and unobserved land-surface states. However, existing methods do not fully capture this setting: deterministic models collapse uncertainty into a single future prediction, while diffusion-based methods typically treat weather variables as undifferentiated conditioning signals, and existing benchmarks focus mainly on reconstruction accuracy rather than whether forecasts respond correctly to changed weather forcing.We introduce EO-WM, a video diffusion transformer for multispectral EO forecasting. EO-WM incorporates a physically informed conditioning framework that represents meteorological forcing through a climatological baseline, weather anomalies, and cumulative physical stress signals. Specifically, it separates baseline and anomaly through distinct conditioning pathways, and accumulates anomalous forcing over time to capture sustained heat and drought stress. To evaluate weather-response behavior beyond standard metrics, we introduce two diagnostic benchmarks: an Extreme Summer Benchmark for severity-aware prediction of vegetation degradation under extreme weather, and a Seasonal Matched-Pair Benchmark for testing response fidelity under changed weather forcing. Experiments show that EO-WM reduces the error in predicted Normalized Difference Vegetation Index (NDVI) decline amplitude by a relative 5.63% and improves directional hit rate by a relative 7.80%, while remaining competitive on standard pixel-level metrics. The benchmarks and model will be made open-source at https://github.com/Luo-Z13/EO-WM.
Original Article
View Cached Full Text

Cached at: 06/26/26, 06:08 PM

Paper page - EO-WM: A Physically Informed World Model for Probabilistic Earth Observation Forecasting

Source: https://huggingface.co/papers/2606.27277

Abstract

EO-WM is a video diffusion transformer for multispectral Earth Observation forecasting that incorporates physically informed conditioning frameworks to better capture weather-driven uncertainties in land-surface dynamics.

Earth Observation (EO) forecasting aims to predict future Earth surface dynamics from satellite observations under changing meteorological conditions. In this paper, we view this task as a partially observed, weather-driven world modeling problem, in which weather acts as a conditioning signal, while forecasting remains uncertain due to sparse observations and unobserved land-surface states. However, existing methods do not fully capture this setting: deterministic models collapse uncertainty into a single future prediction, while diffusion-based methods typically treat weather variables as undifferentiated conditioning signals, and existing benchmarks focus mainly on reconstruction accuracy rather than whether forecasts respond correctly to changed weather forcing.We introduce EO-WM, avideo diffusion transformerformultispectral EO forecasting. EO-WM incorporates aphysically informed conditioning frameworkthat representsmeteorological forcingthrough aclimatological baseline,weather anomalies, andcumulative physical stress signals. Specifically, it separates baseline and anomaly through distinct conditioning pathways, and accumulates anomalous forcing over time to capture sustained heat and drought stress. To evaluate weather-response behavior beyond standard metrics, we introduce two diagnostic benchmarks: an Extreme Summer Benchmark for severity-aware prediction of vegetation degradation under extreme weather, and a Seasonal Matched-Pair Benchmark for testing response fidelity under changed weather forcing. Experiments show that EO-WM reduces the error in predictedNormalized Difference Vegetation Index(NDVI) decline amplitude by a relative 5.63% and improves directional hit rate by a relative 7.80%, while remaining competitive on standard pixel-level metrics. The benchmarks and model will be made open-source at https://github.com/Luo-Z13/EO-WM.

View arXiv pageView PDFGitHub3Add to collection

Get this paper in your agent:

hf papers read 2606\.27277

Don’t have the latest CLI?curl \-LsSf https://hf\.co/cli/install\.sh \| bash

Models citing this paper0

No model linking this paper

Cite arxiv.org/abs/2606.27277 in a model README.md to link it from this page.

Datasets citing this paper0

No dataset linking this paper

Cite arxiv.org/abs/2606.27277 in a dataset README.md to link it from this page.

Spaces citing this paper0

No Space linking this paper

Cite arxiv.org/abs/2606.27277 in a Space README.md to link it from this page.

Collections including this paper0

No Collection including this paper

Add this paper to acollectionto link it from this page.

Similar Articles

τ_0-WM: A Unified Video-Action World Model for Robotic Manipulation

Hugging Face Daily Papers

τ_0-WM is a unified video-action world model for robotic manipulation that integrates policy learning, video prediction, and action evaluation using a shared video diffusion backbone. It shows superior performance on challenging long-horizon and fine-grained tasks.

Holo-World: Unified Camera, Object and Weather Control for Video World Model

Hugging Face Daily Papers

Holo-World presents a unified controllable video world model that generates videos from a single image with explicit control over camera, object motion, and weather. It introduces a novel dataset and techniques to preserve scene structure while transferring to target weather states.

Qwen-RobotWorld Technical Report: Unifying Embodied World Modeling through Language-Conditioned Video Generation

Hugging Face Daily Papers

Qwen-RobotWorld is a language-conditioned video world model that predicts future visual trajectories across multiple robotic domains using a double-stream diffusion transformer and an 8.6M video-text corpus. It unifies embodied world modeling for robotic manipulation, autonomous driving, indoor navigation, and human-to-robot transfer, achieving top benchmarks on EWMBench and DreamGen Bench.