@HuggingPapers: Geometric Action Model for Robot Policy Learning Repurposes a geometric foundation model as one backbone for perception…
Summary
Geometric Action Model repurposes a geometric foundation model for robot policy learning, achieving 85.5% on LIBERO-Plus with 6.9 ms inference, 55× faster than baselines.
Geometric Action Model for Robot Policy Learning
Repurposes a geometric foundation model as one backbone for perception, prediction, and action.
1.4B parameters. 6.9 ms inference. 85.5% on LIBERO-Plus. 55× faster than baselines. https://t.co/wNYlFaghX0
Similar Articles
Geometric Action Model for Robot Policy Learning
The Geometric Action Model (GAM) repurposes a pretrained geometric foundation model (GFM) as a unified backbone for language-conditioned robot manipulation, achieving higher accuracy, robustness, and efficiency than existing foundation-model-scale baselines across simulation and real-world benchmarks.
PoLAR: Factorizing Extent and Mode in Latent Actions for Robot Policy Learning
PoLAR introduces a geometrically structured latent action representation in hyperbolic space that separates transition extent from mode, improving robotic policy learning performance.
LaWAM: Latent World Action Models for Efficient Dynamics-Aware Robot Policies
LaWAM enables efficient robot control by predicting compact latent visual subgoals instead of expensive video generation, achieving state-of-the-art success rates with up to 24× lower latency than pixel-space world action models.
@artemZholus: thanks! in the second paper (https://arxiv.org/abs/2605.06388) we used your (and RAE's) recipe and it worked.
This paper systematically compares reconstruction-based and semantic latent spaces for action-conditioned latent diffusion world models in robotics. It finds that semantic encoders like V-JEPA 2.1 generally outperform reconstruction encoders on policy-relevant metrics, advocating for semantic latent spaces as a stronger foundation for robotics world models.
Revisiting Articulated Parts Perception in Robot Manipulation
This paper introduces Geometric Primary Structure (GPS), a new representation for articulated parts perception in robot manipulation, enabling efficient VR-based annotation and achieving a 73% success rate without fine-tuning.