Tag
The Geometric Action Model (GAM) repurposes a pretrained geometric foundation model (GFM) as a unified backbone for language-conditioned robot manipulation, achieving higher accuracy, robustness, and efficiency than existing foundation-model-scale baselines across simulation and real-world benchmarks.
SpatialClaw is a training-free framework that uses code as an action interface to enable flexible, stateful spatial reasoning in vision-language models, achieving superior performance across diverse 3D/4D spatial reasoning tasks.
MolmoAct 2 is an open robotics model that reasons in 3D space before taking actions, developed by the Allen Institute for Artificial Intelligence.