BadWorld: Adversarial Attacks on World Models
Summary
BadWorld is a label-free adversarial framework that reveals structural vulnerabilities in visual world models by generating imperceptible perturbations that cause catastrophic failures in future rollouts.
Source: https://huggingface.co/papers/2606.16519
Abstract
Visual world models (VWMs) synthesize interactive, action-conditioned rollouts from a single context image. However, it remains an open question how robust these models are to adversarial perturbations. Standard adversarial attacks fail to assess this vulnerability because attackers lack ground-truth future videos and cannot predict subsequent user controls. We introduce BadWorld, a label-free adversarial framework tailored for autoregressive VWMs that systematically overcomes both constraints. First, to bypass the need for future supervision, we propose a self-supervised velocity attack that directly disrupts the early denoising dynamics of the model. Second, to ensure the attack generalizes across unpredictable user actions, we formulate a trajectory-adaptive bi-level optimization that actively mines hard control sequences to forge control-agnostic perturbations. Evaluated on representative VWMs with continuous and discrete controls, BadWorld exposes severe structural fragility. Visually indistinguishable adversarial images reliably trigger catastrophic degradation in future rollouts, leading to incomplete denoising, structural collapse, and control inconsistency. These findings reveal critical risks for deploying VWMs in safety-critical systems while highlighting a practical mechanism for privacy protection.
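The two ideas in the abstract (a label-free objective on the model's own velocity prediction, and an outer perturbation update driven by mined hard controls) can be illustrated with a toy sketch. This is not the paper's implementation. Everything below is an assumption made for illustration: the tanh "velocity" function stands in for a real denoiser, the loss is a plain squared deviation from the clean prediction, "hard" is taken to mean the candidate control under which the current perturbation is least effective, and the names `velocity`, `loss_and_grad`, `attack`, `eps`, and `alpha` are hypothetical.

```python
import numpy as np

def velocity(W, U, x, a):
    # Hypothetical stand-in for a world model's velocity prediction,
    # conditioned on a context image x and a control vector a.
    return np.tanh(W @ x + U @ a)

def loss_and_grad(W, U, x, delta, a):
    # Label-free objective: squared deviation of the perturbed velocity
    # from the model's own clean prediction (no future frames needed).
    v_clean = velocity(W, U, x, a)
    v_adv = velocity(W, U, x + delta, a)
    diff = v_adv - v_clean
    loss = float(diff @ diff)
    # Analytic gradient w.r.t. delta via the chain rule through tanh.
    grad = W.T @ (2.0 * diff * (1.0 - v_adv ** 2))
    return loss, grad

def attack(W, U, x, actions, eps=0.03, alpha=0.005, steps=50, seed=0):
    rng = np.random.default_rng(seed)
    # Random start inside the L-infinity ball (the gradient is zero at delta = 0).
    delta = rng.uniform(-eps, eps, size=x.shape)
    best_delta, best_worst = delta.copy(), -np.inf
    for _ in range(steps + 1):
        results = [loss_and_grad(W, U, x, delta, a) for a in actions]
        losses = [r[0] for r in results]
        # Inner step (assumed): mine the control where the attack is weakest.
        hard = int(np.argmin(losses))
        # Keep the perturbation with the best worst-case loss seen so far.
        if losses[hard] > best_worst:
            best_worst, best_delta = losses[hard], delta.copy()
        # Outer step: signed gradient ascent on the hard control, then project
        # back onto the eps-ball so the perturbation stays small.
        delta = np.clip(delta + alpha * np.sign(results[hard][1]), -eps, eps)
    return best_delta, best_worst

# Toy usage with random weights, a random "image", and five candidate controls.
rng = np.random.default_rng(1)
W = rng.normal(size=(8, 16))
U = rng.normal(size=(8, 4))
x = rng.uniform(0.0, 1.0, size=16)
actions = [rng.normal(size=4) for _ in range(5)]
delta, worst = attack(W, U, x, actions)
```

Here `worst` is the smallest velocity deviation across the candidate controls, so raising it is one simple way to express "control-agnostic" in code. A real attack would differentiate through the actual model with an autodiff framework, and would also keep `x + delta` within the valid pixel range, which this sketch omits.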
Similar Articles
Baba in Wonderland: Online Self-Supervised Dynamics Discovery for Executable World Models
Introduces Alice, a closed-loop system that learns executable world models online under prior misalignment by treating failed candidate updates as structural signal, achieving improved performance on a variant of Baba Is You with semantically remapped labels.
stable-worldmodel-v1: Reproducible World Modeling Research and Evaluation
Stable-Worldmodel (SWM) is a modular and standardized research framework for developing and evaluating world models, designed to improve reproducibility and support robustness and continual learning research.
When Background Matters: Breaking Medical Vision Language Models by Transferable Attack
MedFocusLeak introduces the first transferable black-box adversarial attack on medical vision-language models, using imperceptible background perturbations to mislead clinical diagnoses across six imaging modalities.
Adversarial attacks on neural network policies
OpenAI researchers demonstrate that adversarial attacks, previously studied in computer vision, are also effective against neural network policies in reinforcement learning, showing significant performance degradation even with small imperceptible perturbations in white-box and black-box settings.
ActWorld: From Explorable to Interactive World Model via Action-Aware Memory
ActWorld proposes a chunk-autoregressive world model with hierarchical action-aware memory to support object interaction alongside navigation, addressing data and memory bottlenecks in existing interactive world models.