StressDream: Steering Video World Models for Robust Policy Evaluation and Improvement
Summary
StressDream enhances video world models by steering diffusion-based imaginations toward high-impact yet plausible outcomes through optimized noise initialization with semantic and plausibility objectives, enabling robust policy evaluation and improvement.
Source: https://huggingface.co/papers/2606.00267
Abstract
Video world models (WMs) have shown promise for policy evaluation and improvement by imagining realistic future observations conditioned on ego-robot actions. While WMs can model distributions over futures, policy evaluation and improvement typically rely on nominal imaginations, which can miss high-impact outcomes of robot actions unless prohibitively many samples are drawn. To enable robust policy evaluation and improvement over WM imaginations, we propose StressDream, which steers imaginations toward high-impact yet plausible outcomes specified at inference time by optimizing the initial noise of diffusion-based WMs. However, optimizing high-dimensional noise is challenging: the optimization must reason about nuanced, scene-dependent target events in generated videos while avoiding out-of-distribution (OOD) noise that yields implausible imaginations. We address this with two complementary objectives: a semantic objective with a Vision-Language Model that provides informative gradients by reasoning about the generated video, and a plausibility objective that prevents the optimized noise from drifting OOD. With state-of-the-art video world models for autonomous driving and robotic manipulation, we show that StressDream effectively steers imaginations toward high-impact yet plausible outcomes specified by text at inference time, such as task failures, enabling robust policy evaluation and improvement by identifying actions whose plausible futures include undesirable outcomes. Video results are available at https://junwon.me/StressDream/.
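The two-objective idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not give the form of either objective, so everything below is an assumption made for illustration. A linear function `semantic_score` stands in for the Vision-Language Model score, a standard-Gaussian negative log-density `plausibility_penalty` stands in for the plausibility objective, and `steer_noise`, `worst_case_failure`, the weight `lam`, and the per-action vectors are hypothetical names and values. No diffusion model is involved; the sketch only shows how a penalty term keeps the optimized noise from drifting arbitrarily far while a target score is increased.

```python
import random


def semantic_score(z, w):
    """Toy stand-in for a VLM score: higher means the target event is more evident."""
    return sum(wi * zi for wi, zi in zip(w, z))


def plausibility_penalty(z):
    """Negative log-density of a standard Gaussian, up to an additive constant."""
    return 0.5 * sum(zi * zi for zi in z)


def steer_noise(z0, w, lam=1.0, lr=0.1, steps=200):
    """Gradient ascent on semantic_score(z, w) - lam * plausibility_penalty(z)."""
    z = list(z0)
    for _ in range(steps):
        # Analytic gradient of the toy objective with respect to z: w - lam * z.
        z = [zi + lr * (wi - lam * zi) for zi, wi in zip(z, w)]
    return z


def worst_case_failure(w, lam=1.0):
    """Score reached after steering from zero noise (hypothetical helper)."""
    z = steer_noise([0.0] * len(w), w, lam=lam)
    return semantic_score(z, w)


rng = random.Random(0)
dim = 8
# Assumed direction in noise space that makes the target event more evident.
w = [1.0 if i < 2 else 0.0 for i in range(dim)]
z0 = [rng.gauss(0.0, 1.0) for _ in range(dim)]

z_steered = steer_noise(z0, w, lam=1.0)  # both objectives
z_unreg = steer_noise(z0, w, lam=0.0)    # semantic objective only

# Assumed per-action sensitivities: how strongly noise can push toward failure.
actions = {
    "cautious": [0.2] + [0.0] * (dim - 1),
    "aggressive": [1.0, 1.0] + [0.0] * (dim - 2),
}
threshold = 1.0
flagged = sorted(a for a, wa in actions.items() if worst_case_failure(wa) > threshold)

print("penalty with plausibility term:", plausibility_penalty(z_steered))
print("penalty without plausibility term:", plausibility_penalty(z_unreg))
print("flagged actions:", flagged)
```

In this toy setting, dropping the penalty (`lam=0.0`) reaches a higher score only by moving the noise far from the prior, which loosely mirrors the OOD drift the abstract warns about. The final loop mimics the evaluation use case under the same assumptions: an action is flagged when steering can reach a failure score above a chosen threshold.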
Similar Articles
Dream.exe: Can Video Generation Models Dream Executable Robot Manipulation?
Dream.exe proposes an evaluation framework that uses robotic manipulation tasks to assess video generation models' understanding of physical reality, finding that visual quality does not predict executable motion accuracy.
WorldReasonBench: Human-Aligned Stress Testing of Video Generators as Future World-State Predictors
This paper introduces WorldReasonBench and WorldRewardBench, new benchmarks designed to evaluate video generation models' ability to reason about world-state evolution and physical consistency. The research highlights a gap between visual plausibility and true logical reasoning in current commercial video generators.
Steering Without Breaking: Mechanistically Informed Interventions for Discrete Diffusion Language Models
This paper introduces a novel adaptive scheduler for steering discrete diffusion language models using sparse autoencoders, demonstrating that targeting interventions based on when specific attributes commit improves control quality and strength over uniform methods.
Scaling World-Model Reinforcement Learning Through Diffusion Policy Optimization
Proposes Model-Based Diffusion Policy Optimization (MBDPO), a framework that unifies search and policy optimization in world models using diffusion policy representations, achieving consistent scaling behavior and superior performance across offline and online reinforcement learning tasks.
NVIDIA OmniDreams: Real-Time Generative World Model for Closed-Loop Autonomous Vehicle Simulation
NVIDIA presents OmniDreams, a generative world model built from the Cosmos diffusion model for real-time action-conditioned video generation, enabling closed-loop simulation for autonomous driving policy evaluation in complex unseen scenarios.