OlmoEarth v1.1: A more efficient family of models

Hugging Face Blog 05/19/26, 06:38 PM Models

earth-observation remote-sensing vision-model transformer efficient-inference open-source huggingface

Summary

OlmoEarth v1.1 is a new family of satellite imagery analysis models from Allen AI that reduces compute costs by up to 3x while maintaining performance, achieved by decreasing token sequence lengths in transformer-based models.

Source: https://huggingface.co/blog/allenai/olmoearth-v1-1

🧠 Models: https://huggingface.co/collections/allenai/olmoearth | 📄 Tech Report: https://allenai.org/papers/olmoearth_v1_1 | 💻 Code: https://github.com/allenai/olmoearth_pretrain

We released OlmoEarth (v1) in November 2025. Since then, partners have applied it across a wide range of tasks, from tracking mangrove change to classifying drivers of forest loss to producing country-scale crop-type maps in days, scaling deployments to national, continental, and global areas. Every release moves us closer to our mission: bringing state-of-the-art AI to organizations and communities working to protect people and our planet.

When OlmoEarth processes satellite imagery to make predictions across tens to hundreds of thousands of square kilometers, efficiency shapes what’s possible. Over the full lifecycle of running OlmoEarth – data export, preprocessing, inference, and post-processing – compute is by far the highest cost. A more efficient model means we can support more partners on the OlmoEarth Platform, and that anyone running OlmoEarth on their own can leverage this technology faster and at lower expense.

That’s why we built **OlmoEarth v1.1: a new family of models that cuts compute costs by up to 3x** while maintaining OlmoEarth v1’s performance on a mix of research benchmarks and tasks we’ve constructed with partners.

Increasing efficiency by decreasing sequence lengths

The OlmoEarth models are transformer-based models, one of the dominant architectures in machine learning today. To process remote sensing data, we first convert it into a sequence of tokens the model can ingest.

Two important levers control efficiency in transformer-based models: model size (this is why we release a family of models, so users can pick the size that fits their compute budget) and token sequence length. Compute costs scale quadratically with the token sequence length, so even small reductions can meaningfully cut the cost of running the model.

Figure caption (image not shown): MACs, or multiply-accumulate operations, estimate the computation needed for one model forward pass; lower MACs generally mean cheaper, faster inference. The y-axis is inverted because lower average rank is better. Labels show model family and size. All plotted points use the pasted MAC/rank values.
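To make the relationship between sequence length and MACs concrete, here is a rough sketch using the textbook approximation for one standard transformer encoder layer. It is not OlmoEarth's own accounting: the function names, the width `d_model = 768`, the MLP ratio of 4, and the sequence lengths are all illustrative assumptions, and the estimate ignores patch embedding, normalization, and any decoder.

```python
def layer_macs(n_tokens: int, d_model: int, mlp_ratio: int = 4) -> dict:
    """Approximate multiply-accumulates for one standard transformer encoder layer.

    Textbook estimate only; real models add costs that are not counted here.
    """
    # Q, K, V, and output projections: four (d_model x d_model) matmuls per token.
    projections = 4 * n_tokens * d_model * d_model
    # Two-layer MLP with hidden width mlp_ratio * d_model.
    mlp = 2 * n_tokens * d_model * (mlp_ratio * d_model)
    # Attention scores (Q @ K^T) plus the weighted sum over V:
    # this is the part that grows quadratically with n_tokens.
    attention = 2 * n_tokens * n_tokens * d_model
    return {"linear": projections + mlp, "quadratic": attention}


def total_macs(n_tokens: int, d_model: int, mlp_ratio: int = 4) -> int:
    """Sum of the linear and quadratic terms for one layer."""
    parts = layer_macs(n_tokens, d_model, mlp_ratio)
    return parts["linear"] + parts["quadratic"]


if __name__ == "__main__":
    d_model = 768  # hypothetical width, not taken from the article
    for n in (300, 900, 2700):  # hypothetical sequence lengths
        ratio = total_macs(n, d_model) / total_macs(n // 3, d_model)
        print(f"{n:>5} tokens -> {n // 3:>4} tokens: {ratio:.2f}x fewer MACs per layer")
```

In this simplified model, the terms that are linear in the token count shrink by exactly 3x when the sequence is cut to a third, while the attention term shrinks by 9x, so the per-layer ratio sits between the two and grows with sequence length. It is a way to reason about the trend, not a prediction of the measured savings reported for OlmoEarth v1.1.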

Designing the token

This raises an important question for transformer-based remote sensing models: what should a token represent?

Take Sentinel-2 imagery, a common modality we process. A Sentinel-2 input is a tensor with a height H and width W (the pixels along the latitudinal and longitudinal axes), a temporal dimension T, and 12 Sentinel-2 channels, giving a shape of [H, W, T, D=12].

Currently, we split the data into *resolution-based patches*. Concretely, this means that we pick some spatial patch size p and split our overall Sentinel-2 image into patches of size p x p.

For each patch, we create a token per timestep per resolution. So a Sentinel-2 input with 2 timesteps yields 6 tokens per patch (2 timesteps x 3 resolutions, 10m, 20m, and 60m).

In total, a [H, W, T, D=12] Sentinel-2 input will yield H/p x W/p x T x 3 tokens.
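The count above can be written as a small helper. This is only a sketch of the arithmetic in the text: the function name, the divisibility check, and the example tile size are assumptions made for illustration.

```python
def sentinel2_token_count(height: int, width: int, timesteps: int,
                          patch_size: int, resolutions: int = 3) -> int:
    """Tokens for a [H, W, T, 12] Sentinel-2 input: H/p x W/p x T x resolutions."""
    if height % patch_size or width % patch_size:
        # Assumption for this sketch: inputs tile evenly into p x p patches.
        raise ValueError("height and width must be divisible by patch_size")
    patches = (height // patch_size) * (width // patch_size)
    return patches * timesteps * resolutions


if __name__ == "__main__":
    # A single p x p patch with 2 timesteps: the 6-token example from the text.
    print(sentinel2_token_count(8, 8, 2, patch_size=8))     # 6
    # Hypothetical 64 x 64 tile with 12 timesteps and patch size 8.
    print(sentinel2_token_count(64, 64, 12, patch_size=8))  # 2304
```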

Using a unique token per resolution is a common technique when processing Sentinel-2 data: Galileo and SatMAE both take this approach, and SatMAE shows significantly better results when doing it. However, it is not universal: CROMA is a model that only uses a single token for all bands, regardless of resolution. Because token counts compound multiplicatively, collapsing resolutions into a single token produces three times fewer tokens and material savings across pretraining, fine-tuning, and inference.
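The two tokenization schemes can be compared with a minimal NumPy sketch. Several things here are assumptions rather than OlmoEarth's implementation: the band order in `BANDS`, the grouping in `GROUPS` (based on the native resolutions of the Sentinel-2 bands), the helper names, and the random matrices standing in for learned patch-embedding weights.

```python
import numpy as np

# Assumed channel order for the 12-band input (hypothetical, for illustration).
BANDS = ["B01", "B02", "B03", "B04", "B05", "B06",
         "B07", "B08", "B8A", "B09", "B11", "B12"]
# Bands grouped by native Sentinel-2 resolution (assumed grouping).
GROUPS = {
    "10m": ["B02", "B03", "B04", "B08"],
    "20m": ["B05", "B06", "B07", "B8A", "B11", "B12"],
    "60m": ["B01", "B09"],
}


def patchify(x: np.ndarray, p: int) -> np.ndarray:
    """[H, W, T, D] -> [H/p, W/p, T, p*p*D]: flatten each p x p patch per timestep."""
    h, w, t, d = x.shape
    x = x.reshape(h // p, p, w // p, p, t, d)
    x = x.transpose(0, 2, 4, 1, 3, 5)  # [H/p, W/p, T, p, p, D]
    return x.reshape(h // p, w // p, t, p * p * d)


def tokenize(x, p, band_groups, d_model, rng):
    """One token per patch, per timestep, per band group. Returns [n_tokens, d_model]."""
    tokens = []
    for names in band_groups:
        idx = [BANDS.index(name) for name in names]
        patches = patchify(x[..., idx], p)
        # Random projection as a stand-in for a learned embedding layer.
        proj = rng.standard_normal((patches.shape[-1], d_model)) * 0.02
        tokens.append(patches @ proj)       # [H/p, W/p, T, d_model]
    stacked = np.stack(tokens, axis=3)      # [H/p, W/p, T, n_groups, d_model]
    return stacked.reshape(-1, d_model)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((16, 16, 2, 12))                      # [H, W, T, D=12]
    per_resolution = tokenize(x, 4, list(GROUPS.values()), 32, rng)
    merged = tokenize(x, 4, [BANDS], 32, rng)
    print(per_resolution.shape, merged.shape)                     # (96, 32) (32, 32)
```

With a 16 x 16 input, 2 timesteps, and patch size 4, the per-resolution scheme yields 4 x 4 x 2 x 3 = 96 tokens, while the single-token scheme yields 32. The sketch only shows the change in sequence length; it says nothing about how the merged tokens affect model quality.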

Naively combining the tokens in this way leads to significant performance drops, including a 10 percentage point drop on m-eurosat kNN (a common benchmark task for remote sensing models). We hypothesize that separating Sentinel-2 bands into different tokens makes it easier for OlmoEarth to model important cross-band relationships.

Merging tokens without impacting performance required us to modify our pre-training regimen. We describe those changes in detail in our paper.

For developers

The result is a model family that does more with less. At every size, OlmoEarth v1.1 runs up to three times cheaper than OlmoEarth v1, making frequent, planet-scale map refreshes more affordable for every team running OlmoEarth. If you’re using a model from the original OlmoEarth family, try OlmoEarth v1.1. It provides similar performance to OlmoEarth v1 while requiring one third of the compute, though we have seen some regressions (see our technical report for more details). If it works for your task, you should see a significant speedup during fine-tuning and inference.

For researchers

Pretrained remote sensing models have many degrees of freedom, which makes them hard to study. When performance shifts, is it the architecture, the dataset, or the pre-training algorithm?

We train OlmoEarth v1.1 on the same dataset as OlmoEarth v1, so differences between the two can be attributed to methodological changes rather than to the data. We hope this advances the scientific understanding of how to pretrain models for remote sensing.

Get started

Check out the OlmoEarth v1.1 weights and training code, including the weights for our Base, Tiny, and Nano models.
