This paper introduces World-Language-Action (WLA) models, embodied foundation models that take text, images, and robot states as input and jointly predict textual subtasks, subgoal images, and robot actions, achieving state-of-the-art results on multi-task and long-horizon learning in both simulated and real-world environments.
AllenAI has released open-source MolmoAct2 models for robot control, along with multiple versions fine-tuned for different tasks, the full datasets, and the training code.
Google DeepMind partnered with Boston Dynamics to integrate Gemini Robotics embodied reasoning models into Boston Dynamics' Spot robot, improving its environmental understanding, object identification, and command following for tasks such as tidying rooms.