BadWorld: Adversarial Attacks on World Models

Hugging Face Daily Papers 06/15/26, 12:00 AM Papers

adversarial-attacks world-models robustness perturbations safety-critical self-supervised

Summary

BadWorld is a label-free adversarial framework that reveals structural vulnerabilities in visual world models by generating imperceptible perturbations that cause catastrophic failures in future rollouts.

Visual world models (VWMs) synthesize interactive, action-conditioned rollouts from a single context image. However, it remains an open question how robust these models are to adversarial perturbations. Standard adversarial attacks fail to assess this vulnerability because attackers lack ground-truth future videos and cannot predict subsequent user controls. We introduce BadWorld, a label-free adversarial framework tailored for autoregressive VWMs that systematically overcomes both constraints. First, to bypass the need for future supervision, we propose a self-supervised velocity attack that directly disrupts the early denoising dynamics of the model. Second, to ensure the attack generalizes across unpredictable user actions, we formulate a trajectory-adaptive bi-level optimization that actively mines hard control sequences to forge control-agnostic perturbations. Evaluated on representative VWMs with continuous and discrete controls, BadWorld exposes severe structural fragility. Visually indistinguishable adversarial images reliably trigger catastrophic degradation in future rollouts, leading to incomplete denoising, structural collapse, and control inconsistency. These findings reveal critical risks for deploying VWMs in safety-critical systems while highlighting a practical mechanism for privacy protection.
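As a rough illustration of the label-free idea, the sketch below runs a projected sign-gradient attack that pushes a model's velocity prediction at early denoising steps away from its own clean prediction, so no future video is needed as a target. Everything in it is an assumption made for illustration and not the authors' implementation: `toy_velocity` is a tiny hypothetical stand-in for a VWM denoiser, and the squared-distance objective, the L-infinity budget `eps`, the timestep values, and the finite-difference gradients are placeholders.

```python
import math
import random


def toy_velocity(context, noisy, t):
    """Hypothetical stand-in for a VWM denoiser's velocity prediction."""
    n = len(context)
    out = []
    for i in range(n):
        # Fixed toy weights; a real model would be a large neural network.
        mix = sum(math.sin(0.7 * (i + 1) * (j + 1)) * context[j] for j in range(n))
        out.append(math.tanh(mix + t * noisy[i]))
    return out


def velocity_gap(delta, context, noisy_samples, timesteps):
    """Label-free objective: squared distance between perturbed and clean velocities."""
    adv = [c + d for c, d in zip(context, delta)]
    total = 0.0
    for noisy in noisy_samples:
        for t in timesteps:
            clean_v = toy_velocity(context, noisy, t)
            adv_v = toy_velocity(adv, noisy, t)
            total += sum((a - b) ** 2 for a, b in zip(adv_v, clean_v))
    return total


def pgd_velocity_attack(context, eps=0.03, step=0.01, iters=20, seed=0):
    """Sign-gradient ascent on the velocity gap, projected onto an L-inf ball."""
    rng = random.Random(seed)
    n = len(context)
    noisy_samples = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(4)]
    timesteps = [0.9, 0.8]  # assumed "early" steps; this toy treats t=1 as pure noise
    delta = [rng.uniform(-eps, eps) for _ in range(n)]  # random start inside the budget
    h = 1e-4  # finite-difference step; a real attack would use autograd
    for _ in range(iters):
        base = velocity_gap(delta, context, noisy_samples, timesteps)
        grad = []
        for i in range(n):
            bumped = list(delta)
            bumped[i] += h
            grad.append((velocity_gap(bumped, context, noisy_samples, timesteps) - base) / h)
        # Ascent step followed by projection back into [-eps, eps].
        delta = [
            max(-eps, min(eps, d + step * ((g > 0) - (g < 0))))
            for d, g in zip(delta, grad)
        ]
    return delta, velocity_gap(delta, context, noisy_samples, timesteps)


if __name__ == "__main__":
    perturbation, gap = pgd_velocity_attack([0.2, 0.5, 0.8, 0.4])
    print("max |delta|:", max(abs(d) for d in perturbation), "velocity gap:", gap)
```

The clean model's own output acts as the reference here, which is one plausible reading of "self-supervised"; the paper may define its objective differently.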

Original Article

Cached at: 06/16/26, 11:33 AM

Source: https://huggingface.co/papers/2606.16519

Abstract

Visual world models (VWMs) synthesize interactive, action-conditioned rollouts from a single context image. However, it remains an open question how robust these models are to adversarial perturbations. Standard adversarial attacks fail to assess this vulnerability because attackers lack ground-truth future videos and cannot predict subsequent user controls. We introduce BadWorld, a label-free adversarial framework tailored for autoregressive VWMs that systematically overcomes both constraints. First, to bypass the need for future supervision, we propose a self-supervised velocity attack that directly disrupts the early denoising dynamics of the model. Second, to ensure the attack generalizes across unpredictable user actions, we formulate a trajectory-adaptive bi-level optimization that actively mines hard control sequences to forge control-agnostic perturbations. Evaluated on representative VWMs with continuous and discrete controls, BadWorld exposes severe structural fragility. Visually indistinguishable adversarial images reliably trigger catastrophic degradation in future rollouts, leading to incomplete denoising, structural collapse, and control inconsistency. These findings reveal critical risks for deploying VWMs in safety-critical systems while highlighting a practical mechanism for privacy protection.
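The bi-level idea of mining hard control sequences can be sketched as an alternating loop: an inner step samples action sequences and keeps those on which the current perturbation does the least damage, and an outer step updates the perturbation to raise the loss on exactly those sequences. This is a minimal sketch under stated assumptions, not the paper's algorithm: `toy_rollout_feature`, the four-action control set, the interpretation of "hard" as "lowest attack loss", and all hyperparameters are hypothetical.

```python
import math
import random

ACTIONS = (0, 1, 2, 3)  # hypothetical discrete controls, e.g. four movement keys


def toy_rollout_feature(context, actions):
    """Hypothetical stand-in for an action-conditioned model output."""
    n = len(context)
    state = list(context)
    for step, a in enumerate(actions):
        state = [
            math.tanh(state[i] + 0.5 * math.cos((a + 1) * (i + 1) + step) * state[(i + 1) % n])
            for i in range(n)
        ]
    return state


def attack_loss(delta, context, actions):
    """Squared deviation of the perturbed output from the clean one for one action sequence."""
    adv = [c + d for c, d in zip(context, delta)]
    clean_out = toy_rollout_feature(context, actions)
    adv_out = toy_rollout_feature(adv, actions)
    return sum((a - b) ** 2 for a, b in zip(adv_out, clean_out))


def mine_hard_actions(delta, context, rng, pool=16, keep=4, horizon=3):
    """Inner level: sample control sequences and keep those where the attack is weakest."""
    candidates = [tuple(rng.choice(ACTIONS) for _ in range(horizon)) for _ in range(pool)]
    candidates.sort(key=lambda seq: attack_loss(delta, context, seq))
    return candidates[:keep]


def bilevel_attack(context, eps=0.03, step=0.01, iters=15, seed=0):
    """Outer level: sign-gradient ascent on the mean loss over the mined sequences."""
    rng = random.Random(seed)
    n = len(context)
    delta = [rng.uniform(-eps, eps) for _ in range(n)]
    h = 1e-4  # finite-difference step; a real attack would use autograd

    for _ in range(iters):
        hard = mine_hard_actions(delta, context, rng)

        def objective(d):
            return sum(attack_loss(d, context, seq) for seq in hard) / len(hard)

        base = objective(delta)
        grad = []
        for i in range(n):
            bumped = list(delta)
            bumped[i] += h
            grad.append((objective(bumped) - base) / h)
        # Ascent step, then projection back into the L-inf budget.
        delta = [
            max(-eps, min(eps, d + step * ((g > 0) - (g < 0))))
            for d, g in zip(delta, grad)
        ]
    return delta


if __name__ == "__main__":
    ctx = [0.2, 0.5, 0.8, 0.4]
    delta = bilevel_attack(ctx)
    print("loss on an unseen sequence:", attack_loss(delta, ctx, (3, 0, 2)))
```

Optimizing against the weakest cases is a standard way to approximate a max-min objective; whether the paper mines sequences by sampling, gradient search, or another procedure is not stated in the abstract.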

Get this paper in your agent:

hf papers read 2606.16519

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Similar Articles

Baba in Wonderland: Online Self-Supervised Dynamics Discovery for Executable World Models

stable-worldmodel-v1: Reproducible World Modeling Research and Evaluation

When Background Matters: Breaking Medical Vision Language Models by Transferable Attack

Adversarial attacks on neural network policies

ActWorld: From Explorable to Interactive World Model via Action-Aware Memory
