No One Knows the State of the Art in Geospatial Foundation Models
Summary
This paper audits 152 papers on geospatial foundation models and finds a severe lack of standardization that makes it impossible to determine the state of the art. The authors propose six concrete expectations to improve reproducibility and comparability.
Cached at: 05/18/26, 10:28 PM
Source: https://huggingface.co/papers/2605.12678
Abstract
Geospatial foundation models lack standardized evaluation and reporting practices, creating inconsistency in performance comparisons and limiting reproducibility across studies.
Geospatial foundation models (GFMs) have been proposed as generalizable backbones for disaster response, land-cover mapping, food-security monitoring, and other high-stakes Earth-observation tasks. Yet the published work about these models does not give reviewers or users enough information to tell which model fits a given task. We argue that nobody knows what the current state of the art is in geospatial foundation models. The methods may be useful, but the GFM literature does not standardize evaluations, training and testing protocols, released weights, or pretraining controls well enough for anyone to compare or rank them. In a 152-paper audit, we find 46 cross-paper disagreements of at least 10 points for the same model, benchmark, and protocol; 94/126 papers with extractable pretraining data use a configuration no other paper uses; and 39% of GFM papers release no model weights. This lack of community standards can be solved. We propose six concrete expectations: named-license weight release, shared core evaluations, copied-versus-rerun baseline annotations, variance reporting, one shared evaluation harness, and data-vs-architecture-vs-algorithm controls. These gaps are a coordination failure, not a fault of any individual lab; the authors of this paper, like many others in the GFM community, have contributed to them. Rather than just critiquing the community, we aim to provide concrete steps toward a shared understanding of how to innovate GFMs.
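To make the "cross-paper disagreement" criterion from the abstract concrete, here is a minimal Python sketch of how such a check could be run over a table of reported scores. It is not the authors' audit code: the function name `find_disagreements`, the record fields, and all model names, benchmark names, and scores are hypothetical, and the 10-point threshold is taken only from the abstract's wording.

```python
from collections import defaultdict


def find_disagreements(reports, threshold=10.0):
    """Flag (model, benchmark, protocol) settings whose reported scores
    differ by at least `threshold` points across two or more papers.

    Hypothetical helper for illustration; not the paper's audit tooling.
    """
    groups = defaultdict(list)
    for r in reports:
        key = (r["model"], r["benchmark"], r["protocol"])
        groups[key].append((r["paper"], r["score"]))

    flagged = {}
    for key, entries in groups.items():
        # A disagreement needs at least two distinct papers reporting.
        if len({paper for paper, _ in entries}) < 2:
            continue
        scores = [score for _, score in entries]
        spread = max(scores) - min(scores)
        if spread >= threshold:
            flagged[key] = spread
    return flagged


# Made-up records: every name and number below is a placeholder.
reports = [
    {"paper": "P1", "model": "ModelA", "benchmark": "BenchX",
     "protocol": "linear-probe", "score": 71.0},
    {"paper": "P2", "model": "ModelA", "benchmark": "BenchX",
     "protocol": "linear-probe", "score": 58.5},
    {"paper": "P1", "model": "ModelB", "benchmark": "BenchX",
     "protocol": "fine-tune", "score": 80.0},
    {"paper": "P3", "model": "ModelB", "benchmark": "BenchX",
     "protocol": "fine-tune", "score": 83.0},
    {"paper": "P2", "model": "ModelA", "benchmark": "BenchX",
     "protocol": "fine-tune", "score": 90.0},
]

disagreements = find_disagreements(reports)
print(disagreements)
```

In this toy data only the ModelA / BenchX / linear-probe setting is flagged, because two different papers report scores 12.5 points apart for it. A setting reported by a single paper is never flagged, since there is nothing to disagree with.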
Similar Articles
SpatialBench: Is Your Spatial Foundation Model an All-Round Player?
SpatialBench is a comprehensive benchmark for evaluating spatial foundation models across diverse domains and tasks, revealing limitations in current models and introducing DA-Next-5M and DA-Next to advance spatial representation learning.
Do Foundation Model Embeddings Improve Cross-Country Crop Yield Generalisation? A Leave-One-Country-Out Evaluation in Sub-Saharan Africa
This paper evaluates whether geospatial foundation model embeddings like Prithvi-EO improve cross-country crop yield prediction in Sub-Saharan Africa compared to traditional Sentinel-2 features. The study finds that frozen embeddings do not significantly outperform spectral medians under rigorous Leave-One-Country-Out validation, suggesting country-level distribution shift is the primary bottleneck rather than feature representation quality.
Measuring Representation Robustness in Large Language Models for Geometry
Researchers introduce GeoRepEval, a framework to evaluate LLM robustness across equivalent geometric problem representations (Euclidean, coordinate, vector). Testing 11 LLMs on 158 geometry problems, they find accuracy gaps up to 14 percentage points based solely on representation choice, with vector formulations being a consistent failure point.
Assessing the Operational Viability of Foundation Models for Time Series Forecasting
This paper presents an applied evaluation of foundation models for time series forecasting compared to supervised approaches across four operational domains, and proposes a Complexity Router to selectively assign series to the optimal model class for balancing accuracy and inference cost.
World Models: A Comprehensive Survey of Architectures, Methodologies, Reasoning Paradigms, and Applications
A comprehensive survey of world models that provides a multi-axis taxonomy covering architectures, methodologies, reasoning strategies, and applications across AI domains, including key systems like Dreamer, MuZero, and Sora.