No One Knows the State of the Art in Geospatial Foundation Models

Hugging Face Daily Papers

Summary

This paper audits 152 papers on geospatial foundation models and finds a severe lack of standardization that makes it impossible to determine the state of the art. The authors propose six concrete expectations to improve reproducibility and comparability.

Geospatial foundation models (GFMs) have been proposed as generalizable backbones for disaster response, land-cover mapping, food-security monitoring, and other high-stakes Earth-observation tasks. Yet the published work about these models does not give reviewers or users enough information to tell which model fits a given task. We argue that nobody knows what the current state of the art is in geospatial foundation models. The methods may be useful, but the GFM literature does not standardize evaluations, training and testing protocols, released weights, or pretraining controls well enough for anyone to compare or rank them. In a 152-paper audit, we find 46 cross-paper disagreements of at least 10 points for the same model, benchmark, and protocol; 94/126 papers with extractable pretraining data use a configuration no other paper uses; and 39% of GFM papers release no model weights. This lack of community standards can be solved. We propose six concrete expectations: named-license weight release, shared core evaluations, copied-versus-rerun baseline annotations, variance reporting, one shared evaluation harness, and data-vs-architecture-vs-algorithm controls. These gaps are a coordination failure, not a fault of any individual lab; the authors of this paper, like many others in the GFM community, have contributed to them. Rather than just critiquing the community, we aim to provide concrete steps toward a shared understanding of how to innovate GFMs.
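
The cross-paper disagreement check described in the audit can be illustrated with a minimal Python sketch. It assumes reported results have already been extracted into (paper, model, benchmark, protocol, score) tuples, and it counts one disagreement per (model, benchmark, protocol) setting whose reported scores differ by at least 10 points. The function name, the record layout, and the per-setting counting rule are assumptions made for illustration, not the authors' actual audit procedure, and the sample records are invented.

```python
from collections import defaultdict


def find_disagreements(records, threshold=10.0):
    """Flag settings where papers report scores that differ by >= threshold.

    records: iterable of (paper, model, benchmark, protocol, score) tuples.
    Returns a dict mapping (model, benchmark, protocol) to the score spread.
    Hypothetical helper: the counting rule is an assumption, not the paper's.
    """
    # Group reported scores by the setting they claim to measure.
    by_setting = defaultdict(dict)
    for paper, model, benchmark, protocol, score in records:
        by_setting[(model, benchmark, protocol)][paper] = score

    flagged = {}
    for setting, scores in by_setting.items():
        if len(scores) < 2:
            continue  # a single report cannot disagree with itself
        spread = max(scores.values()) - min(scores.values())
        if spread >= threshold:
            flagged[setting] = spread
    return flagged


# Invented example records, for illustration only.
records = [
    ("paper-1", "model-a", "bench-x", "linear-probe", 71.0),
    ("paper-2", "model-a", "bench-x", "linear-probe", 83.5),
    ("paper-3", "model-a", "bench-x", "fine-tune", 85.0),
    ("paper-1", "model-b", "bench-x", "linear-probe", 64.0),
    ("paper-3", "model-b", "bench-x", "linear-probe", 66.0),
]

for setting, spread in find_disagreements(records).items():
    print(setting, spread)
```

With these invented records, only the model-a, bench-x, linear-probe setting is flagged, because the fine-tune result uses a different protocol and the two model-b reports differ by only 2 points.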

Cached at: 05/18/26, 10:28 PM

Source: https://huggingface.co/papers/2605.12678

Abstract

Geospatial foundation models lack standardized evaluation and reporting practices, creating inconsistency in performance comparisons and limiting reproducibility across studies.

Similar Articles

SpatialBench: Is Your Spatial Foundation Model an All-Round Player?

Hugging Face Daily Papers

SpatialBench is a comprehensive benchmark for evaluating spatial foundation models across diverse domains and tasks, revealing limitations in current models and introducing DA-Next-5M and DA-Next to advance spatial representation learning.

Do Foundation Model Embeddings Improve Cross-Country Crop Yield Generalisation? A Leave-One-Country-Out Evaluation in Sub-Saharan Africa

arXiv cs.LG

This paper evaluates whether geospatial foundation model embeddings like Prithvi-EO improve cross-country crop yield prediction in Sub-Saharan Africa compared to traditional Sentinel-2 features. The study finds that frozen embeddings do not significantly outperform spectral medians under rigorous Leave-One-Country-Out validation, suggesting country-level distribution shift is the primary bottleneck rather than feature representation quality.

Measuring Representation Robustness in Large Language Models for Geometry

arXiv cs.CL

Researchers introduce GeoRepEval, a framework to evaluate LLM robustness across equivalent geometric problem representations (Euclidean, coordinate, vector). Testing 11 LLMs on 158 geometry problems, they find accuracy gaps up to 14 percentage points based solely on representation choice, with vector formulations being a consistent failure point.