Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models

Hugging Face Daily Papers 06/09/26, 12:00 AM Papers

embodied foundation-model reinforcement-learning robotics vision-language benchmark open-source

Summary

Embodied-R1.5 is a unified embodied foundation model that achieves state-of-the-art performance on 16 out of 24 embodied vision-language benchmarks using multi-task balanced reinforcement learning. It introduces a Planner-Grounder-Corrector closed-loop framework for long-horizon tasks and is open-sourced to facilitate future research.

We introduce Embodied-R1.5, a unified Embodied Foundation Model (EFM) that integrates comprehensive embodied reasoning capabilities, spanning embodied cognition, task planning, correction, and pointing, within a single architecture toward general physical intelligence. Leveraging three automated data construction pipelines to significantly expand the data coverage of critical capabilities, we build a large-scale data system of over 15B tokens, and design a multi-task balanced RL recipe to alleviate heterogeneous task conflicts. We further introduce a Planner-Grounder-Corrector (PGC) closed-loop framework that enables a single model to autonomously execute and self-correct over long-horizon tasks. With only 8B parameters, Embodied-R1.5 achieves SOTA on 16 out of 24 embodied VLM benchmarks, surpassing leading models like Gemini-Robotics-ER-1.5 and GPT-5.4. Benefiting from the internalized embodied capabilities, Embodied-R1.5 can be fine-tuned into a VLA with only a small amount of data, outperforming leading VLA models like π_{0.5} across 4 popular manipulation benchmark suites. We further conduct extensive zero-shot real-robot experiments, validating performance in instruction following, affordance grounding, articulated object manipulation, and long-horizon complex tasks, demonstrating strong generalization to the physical world. We open-source model weights, datasets, training code, and EmbodiedEvalKit, an evaluation framework tailored for embodied tasks, to facilitate future research in EFMs.

Original Article


Cached at: 06/11/26, 01:40 PM


Source: https://huggingface.co/papers/2606.11324

Abstract

Embodied-R1.5 is a unified embodied foundation model that integrates embodied reasoning capabilities and achieves state-of-the-art performance on embodied vision-language benchmarks through a multi-task balanced reinforcement learning approach.

We introduce Embodied-R1.5, a unified Embodied Foundation Model (EFM) that integrates comprehensive embodied reasoning capabilities, spanning embodied cognition, task planning, correction, and pointing, within a single architecture toward general physical intelligence. Leveraging three automated data construction pipelines to significantly expand the data coverage of critical capabilities, we build a large-scale data system of over 15B tokens, and design a multi-task balanced RL recipe to alleviate heterogeneous task conflicts. We further introduce a Planner-Grounder-Corrector (PGC) closed-loop framework that enables a single model to autonomously execute and self-correct over long-horizon tasks. With only 8B parameters, Embodied-R1.5 achieves SOTA on 16 out of 24 embodied VLM benchmarks, surpassing leading models like Gemini-Robotics-ER-1.5 and GPT-5.4. Benefiting from the internalized embodied capabilities, Embodied-R1.5 can be fine-tuned into a VLA with only a small amount of data, outperforming leading VLA models like π_{0.5} across 4 popular manipulation benchmark suites. We further conduct extensive zero-shot real-robot experiments, validating performance in instruction following, affordance grounding, articulated object manipulation, and long-horizon complex tasks, demonstrating strong generalization to the physical world. We open-source model weights, datasets, training code, and EmbodiedEvalKit, an evaluation framework tailored for embodied tasks, to facilitate future research in EFMs.
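The abstract names the Planner-Grounder-Corrector loop but does not describe its interface, so the following is only a minimal sketch of what a plan, ground, execute, verify, and retry cycle could look like. Every name here (`run_pgc`, `toy_plan`, `toy_ground`, `ToyWorld`, `max_retries`) is hypothetical, the pixel coordinates are made up, and the toy callables stand in for what would presumably be differently prompted calls to the same model; none of it is taken from the paper's code.

```python
from typing import Callable, List, Tuple

Point = Tuple[int, int]  # assumed: a grounded 2D image point for the action


def run_pgc(task: str,
            plan: Callable[[str], List[str]],
            ground: Callable[[str], Point],
            execute: Callable[[str, Point], None],
            verify: Callable[[str], bool],
            max_retries: int = 2):
    """Hypothetical closed loop: returns (success, log).

    Each log entry is (subtask, attempt_index, verified_ok).
    """
    log = []
    for subtask in plan(task):                   # Planner: task -> subtasks
        for attempt in range(max_retries + 1):
            point = ground(subtask)              # Grounder: subtask -> point
            execute(subtask, point)              # low-level controller (stub)
            ok = verify(subtask)                 # Corrector: did it work?
            log.append((subtask, attempt, ok))
            if ok:
                break                            # move on to the next subtask
        else:
            return False, log                    # retries exhausted
    return True, log


# Toy stand-ins so the loop can be run without a model or a robot.
def toy_plan(task: str) -> List[str]:
    return ["pick up cup", "place cup on shelf"]


def toy_ground(subtask: str) -> Point:
    return (120, 80) if "pick" in subtask else (300, 40)


class ToyWorld:
    def __init__(self) -> None:
        self.attempts = {}
        self.done = set()

    def execute(self, subtask: str, point: Point) -> None:
        n = self.attempts.get(subtask, 0) + 1
        self.attempts[subtask] = n
        # The first grasp attempt slips; every other action succeeds.
        if not (subtask == "pick up cup" and n == 1):
            self.done.add(subtask)

    def verify(self, subtask: str) -> bool:
        return subtask in self.done


world = ToyWorld()
success, log = run_pgc("put the cup on the shelf",
                       toy_plan, toy_ground, world.execute, world.verify)
```

In this toy run the first grasp fails verification, the loop re-grounds and retries, and the second subtask then succeeds on its first attempt. Bounding the retries is a design choice of the sketch, not something the abstract specifies.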


Get this paper in your agent:

hf papers read 2606.11324

Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash


Similar Articles

tencent/HY-Embodied-0.5

PhysBrain 1.0 Technical Report

Embodied-BenchClaw: An Autonomous Multi-Agent System for Embodied Spatial Intelligence Benchmark Construction

EasyVideoR1: Easier RL for Video Understanding

ESI-Bench: Towards Embodied Spatial Intelligence that Closes the Perception-Action Loop
