Learning Transferable Dynamics Priors from Action to World Modeling
Summary
This paper introduces A2World, a diffusion-based world model pretrained on large-scale robot manipulation data to learn transferable dynamics priors. The model can be adapted into a real-world simulator (A2World-sim) for policy evaluation, or into a video-action joint prediction model (A2World-policy) that predicts actions under visual and instruction conditioning, demonstrating benefits for both simulator-centric and policy-centric robot learning.
Cached at: 06/30/26, 07:34 AM
Source: https://huggingface.co/papers/2606.29501
Abstract
Action-conditioned world modeling enables transferable dynamics priors for robot learning through pretraining on large-scale manipulation data, supporting both simulator-based policy evaluation and video-action prediction.
We study action-conditioned world modeling as a scalable way to learn transferable dynamics priors for robot learning. By pretraining a model to predict how actions drive visual scene evolution, the resulting world model captures reusable interaction dynamics beyond appearance-level video generation. Concretely, we pretrain a multi-view interactive base diffusion world model, A2World, on large-scale robot manipulation data with real action annotations. We validate the learned dynamics priors from two complementary perspectives. First, we adapt A2World into a task- or scene-specialized real-world simulator, A2World-sim, whose long-horizon rollouts support simulator-based policy evaluation and scalable what-if analysis by replacing real-robot rollouts with world model rollouts. Second, starting from the same pretrained weights, we adapt A2World into a video-action joint prediction model, A2World-policy, that predicts actions under visual and instruction conditioning. Experiments across simulation benchmarks and real-robot settings demonstrate that action-conditioned world model pretraining yields transferable dynamics priors that benefit both simulator-centric and policy-centric robot learning.
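The idea of replacing real-robot rollouts with world model rollouts for policy evaluation can be illustrated with a minimal sketch. This is not the paper's implementation: the one-dimensional state, the additive `toy_world_model`, the two example policies, and the success tolerance are all hypothetical stand-ins chosen so that the example runs with the standard library alone. A real action-conditioned world model would predict future visual observations rather than a scalar.

```python
from typing import Callable, List

State = float
Action = float
WorldModel = Callable[[State, Action], State]
Policy = Callable[[State], Action]


def toy_world_model(state: State, action: Action) -> State:
    """Hypothetical stand-in for a learned action-conditioned model.

    Maps (current state, action) to a predicted next state.
    """
    return state + action


def rollout(world_model: WorldModel, policy: Policy,
            init_state: State, horizon: int) -> List[State]:
    """Roll the policy forward inside the world model, never touching a robot."""
    states = [init_state]
    for _ in range(horizon):
        action = policy(states[-1])
        states.append(world_model(states[-1], action))
    return states


def success_rate(world_model: WorldModel, policy: Policy,
                 init_states: List[State], horizon: int,
                 tol: float = 0.01) -> float:
    """Fraction of rollouts whose final state is within tol of the goal (0.0)."""
    successes = 0
    for s0 in init_states:
        final_state = rollout(world_model, policy, s0, horizon)[-1]
        if abs(final_state) < tol:
            successes += 1
    return successes / len(init_states)


def proportional_policy(state: State) -> Action:
    """Hypothetical policy that moves halfway toward the goal at each step."""
    return -0.5 * state


def idle_policy(state: State) -> Action:
    """Hypothetical policy that never acts."""
    return 0.0


if __name__ == "__main__":
    starts = [1.0, -2.0, 0.5]
    print(success_rate(toy_world_model, proportional_policy, starts, horizon=10))
    print(success_rate(toy_world_model, idle_policy, starts, horizon=10))
```

Swapping in a different `policy` or a different set of initial states is the kind of what-if comparison that simulator-based evaluation allows; in this sketch, the quality of the ranking depends entirely on how faithful the world model is to the real dynamics.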
Similar Articles
Transfer from simulation to real world through learning deep inverse dynamics model
This paper proposes a method to bridge the simulation-to-real-world gap in robotics by learning a deep inverse dynamics model that maps desired next states (from simulation) to appropriate real-world actions. The approach is evaluated against baselines like output error control and Gaussian dynamics adaptation.
World Pilot: Steering Vision-Language-Action Models with World-Action Priors
World Pilot enhances Vision-Language-Action models by incorporating dynamic scene evolution and trajectory priors from a World-Action Model, achieving state-of-the-art zero-shot performance on manipulation tasks.
LaWAM: Latent World Action Models for Efficient Dynamics-Aware Robot Policies
LaWAM enables efficient robot control by predicting compact latent visual subgoals instead of expensive video generation, achieving state-of-the-art success rates with up to 24x lower latency than pixel-space world action models.
The DAWN of World-Action Interactive Models
This paper introduces DAWN, a latent generative baseline for World-Action Interactive Models (WAIMs) that jointly models scene evolution and action generation through recursive refinement, achieving strong long-horizon planning in autonomous driving scenarios.
Beyond Next-Observation Prediction: Agent-Authored World Modeling for Sequential Decision Making
This paper introduces Agent-Authored World Modeling (AAWM), a training procedure that constructs world-model supervision based on the policy's own decision needs rather than next-observation prediction, aligning the learning objective with the dynamics required for effective decision-making.