NVIDIA OmniDreams: Real-Time Generative World Model for Closed-Loop Autonomous Vehicle Simulation
Summary
NVIDIA presents OmniDreams, a generative world model built from the Cosmos diffusion model for real-time action-conditioned video generation, enabling closed-loop simulation for autonomous driving policy evaluation in complex unseen scenarios.
Cached at: 06/03/26, 03:35 AM
Source: https://huggingface.co/papers/2606.03159
Abstract
OmniDreams, a foundation generative world model trained from the Cosmos diffusion model, enables real-time action-conditioned video generation for autonomous driving policy evaluation in complex, unseen scenarios.
As autonomous vehicle capabilities advance, the safe evaluation of driving policies in long-tail scenarios remains a critical bottleneck. In closed-loop simulation, the driving policy model actively interacts with the environment, where its actions dynamically update the simulator state and directly influence the next set of generated sensor observations. While recent reconstruction-based neural simulators offer photorealism, they are fundamentally constrained by their initial captured data and struggle to generalize to highly dynamic or novel scenes. To overcome these limitations, we introduce OmniDreams, a foundation generative world model mid- and post-trained from the Cosmos diffusion model to autoregressively generate action-conditioned videos in real time. By leveraging the rich visual priors of Cosmos and mid- and post-training on 21k hours of driving scenarios, OmniDreams synthesizes complex, unobserved phenomena that are hard for traditional simulators to capture, such as extreme weather and unpredictable dynamic agent behaviors. Crucially, it autoregressively conditions its photorealistic sensor generation on past frames, the current simulator state, and immediate driving actions. Deployed in a closed-loop system with the Alpamayo 1 policy model and AlpaSim orchestrator, OmniDreams acts as a highly responsive, reactive environment, providing a scalable and comprehensive solution for training and evaluating next-generation autonomous driving policies. We additionally show preliminary results indicating that a world-action model (WAM) post-trained from OmniDreams achieves strong performance on the Physical AI Autonomous Vehicles NuRec dataset, surpassing the VLA-based Alpamayo 1.5 research policy model while using only 1/5 the total parameters. These results highlight the potential for a real-time world model like OmniDreams to also serve as a backbone for policy architectures.
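The closed-loop pattern the abstract describes (the policy acts, the action updates the simulator state, and the next observation is generated from past frames, the current state, and the action) can be illustrated with a toy Python sketch. Everything here is hypothetical and for illustration only: `SimState`, `toy_policy`, `step_state`, `toy_world_model`, and `run_closed_loop` are invented names, the "frames" are plain dictionaries rather than video, and none of this reflects the actual OmniDreams, Alpamayo, or AlpaSim interfaces.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class SimState:
    """Hypothetical simulator state: ego position (m) and speed (m/s)."""
    x: float = 0.0
    speed: float = 0.0


def toy_policy(frame):
    """Stand-in policy: accelerate at 1 m/s^2 until the observed speed reaches 10 m/s."""
    return 1.0 if frame["speed"] < 10.0 else 0.0


def step_state(state, accel, dt=0.1):
    """The action updates the simulator state (simple Euler integration)."""
    speed = state.speed + accel * dt
    return SimState(x=state.x + speed * dt, speed=speed)


def toy_world_model(past_frames, state, action):
    """Stand-in for the generative model: builds the next 'frame' from
    past frames, the current simulator state, and the immediate action."""
    return {
        "index": past_frames[-1]["index"] + 1,
        "x": state.x,
        "speed": state.speed,
        "action": action,
    }


def run_closed_loop(num_steps, context_len=4):
    """Run policy and world model in a loop; each output feeds the next step."""
    state = SimState()
    # Bounded window of past frames used as autoregressive context.
    frames = deque(
        [{"index": 0, "x": 0.0, "speed": 0.0, "action": 0.0}],
        maxlen=context_len,
    )
    log = []
    for _ in range(num_steps):
        action = toy_policy(frames[-1])                       # policy sees latest frame
        state = step_state(state, action)                     # action changes the state
        frame = toy_world_model(list(frames), state, action)  # next observation
        frames.append(frame)
        log.append(frame)
    return log


if __name__ == "__main__":
    for frame in run_closed_loop(3):
        print(frame)
```

The point of the sketch is the data flow: each generated frame is both the policy's next input and part of the context for the following generation step, which is what distinguishes closed-loop simulation from replaying a fixed recording.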
Similar Articles
Welcome NVIDIA Cosmos 3: The First Open Omni-model for Physical AI Reasoning and Action
NVIDIA Cosmos 3 is an open omni-model for physical AI that unifies world generation, reasoning, and action generation into a single model, available on Hugging Face with various resources.
nvidia/Cosmos3-Nano
NVIDIA releases Cosmos3-Nano, an omnimodal world model for Physical AI that generates video, image, audio, and action commands from text, image, video, and action inputs, targeting robotics, autonomous driving, and smart space applications.
Decart’s new world model can simulate hours of photorealistic driving — with some caveats
Decart unveils Oasis 3, an interactive world model that generates photorealistic driving environments in real time, available via API. Targeting autonomous vehicle simulation and other physical AI applications, the model leverages Decart's optimization stack for cost efficiency.
Nvidia Cosmos 3
NVIDIA has open-sourced Cosmos 3, a frontier foundation model for physical AI that unifies reasoning, world generation, and action generation within a single Mixture-of-Transformers architecture, releasing model checkpoints, datasets, and training scripts for robotics, autonomous vehicles, and warehouse monitoring.
nvidia/Cosmos3-Super
NVIDIA released Cosmos3, a collection of omnimodal world foundation models for Physical AI, capable of generating video, image, audio, and action commands from various inputs, with versions for different tasks like policy learning and image-to-video generation.