@HuggingPapers: Geometric Action Model for Robot Policy Learning Repurposes a geometric foundation model as one backbone for perception…

X AI KOLs Following Papers

Summary

The Geometric Action Model (GAM), a 1.4B-parameter model, repurposes a geometric foundation model as a single backbone for perception, prediction, and action in robot policy learning. It reports 85.5% on LIBERO-Plus at 6.9 ms inference, 55× faster than baselines.

Geometric Action Model for Robot Policy Learning

Repurposes a geometric foundation model as one backbone for perception, prediction, and action.

1.4B parameters. 6.9 ms inference. 85.5% on LIBERO-Plus. 55× faster than baselines. https://t.co/wNYlFaghX0

Cached at: 06/17/26, 11:52 AM

Similar Articles

Geometric Action Model for Robot Policy Learning

Hugging Face Daily Papers

The Geometric Action Model (GAM) repurposes a pretrained geometric foundation model (GFM) as a unified backbone for language-conditioned robot manipulation, achieving higher accuracy, robustness, and efficiency than existing foundation-model-scale baselines across simulation and real-world benchmarks.

@artemZholus: thanks! in the second paper (https://arxiv.org/abs/2605.06388) we used your (and RAE's) recipe and it worked.

X AI KOLs Following

This paper systematically compares reconstruction-based and semantic latent spaces for action-conditioned latent diffusion world models in robotics. It finds that semantic encoders like V-JEPA 2.1 generally outperform reconstruction encoders on policy-relevant metrics, advocating for semantic latent spaces as a stronger foundation for robotics world models.

Revisiting Articulated Parts Perception in Robot Manipulation

Hugging Face Daily Papers

This paper introduces Geometric Primary Structure (GPS), a new representation for articulated parts perception in robot manipulation, enabling efficient VR-based annotation and achieving a 73% success rate without fine-tuning.