World Model for Robot Learning: A Comprehensive Survey
Summary
This comprehensive survey reviews the literature on world models for robot learning, covering their roles in policy learning, planning, and simulation. It highlights key paradigms, benchmarks, and future directions for predictive modeling in embodied agents.
Cached at: 05/13/26, 04:10 AM
Source: https://huggingface.co/papers/2605.00080
Abstract
World models, as predictive representations of environmental dynamics, have become essential for robot learning, supporting policy learning, planning, and simulation across various embodied applications.
World models, which are predictive representations of how environments evolve under actions, have become a central component of robot learning. They support policy learning, planning, simulation, evaluation, data generation, and have advanced rapidly with the rise of foundation models and large-scale video generation. However, the literature remains fragmented across architectures, functional roles, and embodied application domains. To address this gap, we present a comprehensive review of world models from a robot-learning perspective. We examine how world models are coupled with robot policies, how they serve as learned simulators for reinforcement learning and evaluation, and how robotic video world models have progressed from imagination-based generation to controllable, structured, and foundation-scale formulations. We further connect these ideas to navigation and autonomous driving, and summarize representative datasets, benchmarks, and evaluation protocols. Overall, this survey systematically reviews the rapidly growing literature on world models for robot learning, clarifies key paradigms and applications, and highlights major challenges and future directions for predictive modeling in embodied agents. To facilitate continued access to newly emerging works, benchmarks, and resources, we will maintain and regularly update the accompanying GitHub repository alongside this survey.
Similar Articles
World Action Models: The Next Frontier in Embodied AI
This survey paper introduces World Action Models (WAMs), a unified framework for embodied AI that integrates predictive state modeling with action generation. It provides a taxonomy of existing methods, analyzes the data ecosystem, and outlines evaluation protocols for this emerging paradigm.
A Case for Robot Learning
The article discusses the challenges of programming robots due to Moravec's paradox and proposes robot learning as a solution to enable embodied intelligence.
Learning Visual Feature-Based World Models via Residual Latent Action
This paper introduces RLA-WM, a visual feature-based world model that leverages residual latent actions and flow matching to efficiently predict future visual states. The method outperforms existing video-diffusion and feature-based approaches while enabling novel robot learning techniques from offline, actionless demonstration videos.
World Models Can Change Everything (20 minute read)
The article discusses the potential paradigm-shifting impact of world models on AI, highlighting investments by Yann LeCun and Fei-Fei Li in this technology as a successor to the current LLM paradigm.
Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence
Agent-World introduces a self-evolving training framework for general agent intelligence that autonomously discovers real-world environments and tasks via the Model Context Protocol, enabling continuous learning. Agent-World-8B and 14B models outperform strong proprietary models across 23 challenging agent benchmarks.