StressDream: Steering Video World Models for Robust Policy Evaluation and Improvement

Hugging Face Daily Papers, 05/29/26, 12:00 AM

video-world-models diffusion policy-evaluation robotics autonomous-driving noise-optimization

Summary

StressDream enhances video world models by steering diffusion-based imaginations toward high-impact yet plausible outcomes through optimized noise initialization with semantic and plausibility objectives, enabling robust policy evaluation and improvement.
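One way to write the optimization summarized above, offered as a sketch under our own assumptions rather than the paper's stated formulation: let $z \in \mathbb{R}^d$ be the initial diffusion noise, $G_\theta(z, a)$ the video the world model imagines for action sequence $a$, $c$ the text describing the target event, $\mathcal{L}_{\text{sem}}$ a VLM-based loss that is low when the video shows that event, and $\lambda > 0$ a weight. These symbols and the specific penalty $\mathcal{R}$ are illustrative choices.

$$
z^{\star} = \arg\min_{z \in \mathbb{R}^{d}} \; \mathcal{L}_{\text{sem}}\big(G_{\theta}(z, a),\, c\big) + \lambda\, \mathcal{R}(z),
\qquad
\mathcal{R}(z) = \Big(\frac{\lVert z \rVert_2^{2}}{d} - 1\Big)^{2}
$$

This assumed penalty uses the fact that for $z \sim \mathcal{N}(0, I_d)$ we have $\mathbb{E}\lVert z \rVert_2^2 = d$, and $\lVert z \rVert_2^2 / d$ concentrates near 1 when $d$ is large, so noise far from that shell is atypical for the sampler. Penalizing the Gaussian negative log-density $\tfrac{1}{2}\lVert z \rVert_2^2$ instead would pull $z$ toward the origin, which is itself an atypical region in high dimensions.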

Video world models (WMs) have shown promise for policy evaluation and improvement by imagining realistic future observations conditioned on ego-robot actions. While WMs can model distributions over futures, policy evaluation and improvement typically rely on nominal imaginations, which can miss high-impact outcomes of robot actions unless prohibitively many samples are drawn. To enable robust policy evaluation and improvement over WM imaginations, we propose StressDream, which steers imaginations toward high-impact yet plausible outcomes specified at inference time by optimizing the initial noise of diffusion-based WMs. However, optimizing high-dimensional noise is challenging: the optimization must reason about nuanced, scene-dependent target events in generated videos while avoiding out-of-distribution (OOD) noise that yields implausible imaginations. We address this with two complementary objectives: a semantic objective with a Vision-Language Model that provides informative gradients by reasoning about the generated video, and a plausibility objective that prevents the optimized noise from drifting OOD. With state-of-the-art video world models for autonomous driving and robotic manipulation, we show that StressDream effectively steers imaginations toward high-impact yet plausible outcomes specified by text at inference time, such as task failures, enabling robust policy evaluation and improvement by identifying actions whose plausible futures include undesirable outcomes. Video results are available at https://junwon.me/StressDream/.
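The trade-off between steering the noise and keeping it in-distribution can be illustrated with a small toy. This is not StressDream's implementation: the world model and the VLM are replaced by a hypothetical linear score `semantic_score(z, target_dir)`, and the plausibility term is an assumed shell penalty lam * (‖z‖²/d − 1)² that keeps the noise near the radius typical of N(0, I_d) samples. All function names and hyperparameters (`lam`, `lr`, `steps`) are illustrative.

```python
import math
import random


def sample_noise(d, rng):
    """Draw initial diffusion noise z ~ N(0, I_d)."""
    return [rng.gauss(0.0, 1.0) for _ in range(d)]


def unit_vector(d, rng):
    """Random unit vector (hypothetical): stands in for the direction in noise
    space that makes the target event more likely."""
    v = sample_noise(d, rng)
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def semantic_score(z, target_dir):
    """Toy semantic score: higher means the imagined rollout is judged more
    likely to contain the target event. In StressDream a VLM reasoning over the
    generated video plays this role; here it is only a dot product."""
    return sum(zi * ui for zi, ui in zip(z, target_dir))


def shell_ratio(z):
    """||z||^2 / d, which is close to 1 for typical N(0, I_d) samples."""
    return sum(zi * zi for zi in z) / len(z)


def optimize_noise(z0, target_dir, lam, lr=0.1, steps=300):
    """Gradient descent on  -semantic_score(z) + lam * (shell_ratio(z) - 1)**2."""
    z = list(z0)
    d = len(z)
    for _ in range(steps):
        # Gradient of lam * (r - 1)^2 with r = ||z||^2 / d is lam * 4 * (r - 1) * z / d.
        coef = lam * 4.0 * (shell_ratio(z) - 1.0) / d
        # Gradient of -semantic_score(z) is -target_dir.
        z = [zi - lr * (-ui + coef * zi) for zi, ui in zip(z, target_dir)]
    return z


if __name__ == "__main__":
    rng = random.Random(0)
    d = 64
    z0 = sample_noise(d, rng)
    u = unit_vector(d, rng)
    for lam in (0.0, 50.0):
        z = optimize_noise(z0, u, lam=lam)
        print(f"lam={lam:5.1f}  score {semantic_score(z0, u):+.2f} -> "
              f"{semantic_score(z, u):+.2f}  ||z||^2/d {shell_ratio(z0):.2f} -> "
              f"{shell_ratio(z):.2f}")
```

Running it compares lam = 0 with lam = 50 from the same starting noise: without the penalty the score rises but ‖z‖²/d drifts far above 1, while with the penalty the score still rises and the noise stays near the typical shell. Because the toy score is linear, the penalized noise also gradually aligns with `target_dir`; a real VLM objective over generated video is nonlinear, so this is only a caricature of the trade-off the abstract describes.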

Cached at: 06/02/26, 03:37 PM

Source: https://huggingface.co/papers/2606.00267

Similar Articles

Dream.exe: Can Video Generation Models Dream Executable Robot Manipulation?

WorldReasonBench: Human-Aligned Stress Testing of Video Generators as Future World-State Predictors

Steering Without Breaking: Mechanistically Informed Interventions for Discrete Diffusion Language Models

Scaling World-Model Reinforcement Learning Through Diffusion Policy Optimization

NVIDIA OmniDreams: Real-Time Generative World Model for Closed-Loop Autonomous Vehicle Simulation
