Holo-World: Unified Camera, Object and Weather Control for Video World Model

Hugging Face Daily Papers

Summary

Holo-World is a unified controllable video world model that generates videos from a single image with explicit control over camera motion, object motion, and weather. It introduces HoloStateData, a state video dataset, together with a Unified Scene Adapter and Scene-Weather Decomposed CFG that preserve scene structure while transferring the scene to a target weather state.

Video world models are moving toward preserving an observed world under controllable camera and object motion while allowing its environmental state to change. Yet these controls remain isolated, and weather generation typically relies on a source video or reconstructed scene that already specifies future structure. We study a first-frame-anchored source-to-state setting, where the model starts from a single image and follows explicit camera and object controls and an optional weather instruction, then generates a video that either preserves the source world or transfers it to a target weather state. To address these challenges, we first build HoloStateData, a state video dataset that turns diverse videos into unified control samples for camera, object, and weather supervision. Second, we introduce Holo-World, a unified controllable video world model that jointly controls the scene from a single image. Its Unified Scene Adapter factorizes world preservation and weather transfer into distinct parameter subspaces, using rendered background, geometry buffers, and object controls to maintain controlled scene structure while modeling weather-dependent appearance and particle effects. Additionally, Scene-Weather Decomposed CFG guides scene and weather residuals separately, strengthening target weather effects without over-amplifying the full condition. Quantitative and qualitative experiments demonstrate that Holo-World maintains precise camera and object control with consistent scene structure while transferring scenes into diverse target weather states, outperforming video-to-video weather editing baselines on weather-state generation. Our project page is available at https://xiangchenyin.github.io/Holo-World/.

Cached at: 06/20/26, 02:29 PM

Source: https://huggingface.co/papers/2606.20083

Abstract

A unified controllable video world model generates videos from a single image while preserving scene structure and transferring to target weather states through specialized parameterization and conditioning techniques.

Video world models are moving toward preserving an observed world under controllable camera and object motion while allowing its environmental state to change. Yet these controls remain isolated, and weather generation typically relies on a source video or reconstructed scene that already specifies future structure. We study a first-frame-anchored source-to-state setting, where the model starts from a single image and follows explicit camera and object controls and an optional weather instruction, then generates a video that either preserves the source world or transfers it to a target weather state. To address these challenges, we first build HoloStateData, a state video dataset that turns diverse videos into unified control samples for camera, object, and weather supervision. Second, we introduce Holo-World, a unified controllable video world model that jointly controls the scene from a single image. Its Unified Scene Adapter factorizes world preservation and weather transfer into distinct parameter subspaces, using rendered background, geometry buffers, and object controls to maintain controlled scene structure while modeling weather-dependent appearance and particle effects. Additionally, Scene-Weather Decomposed CFG guides scene and weather residuals separately, strengthening target weather effects without over-amplifying the full condition. Quantitative and qualitative experiments demonstrate that Holo-World maintains precise camera and object control with consistent scene structure while transferring scenes into diverse target weather states, outperforming video-to-video weather editing baselines on weather-state generation. Our project page is available at https://xiangchenyin.github.io/Holo-World/.
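
The abstract says that Scene-Weather Decomposed CFG guides scene and weather residuals separately but does not give a formula. One plausible reading is a two-scale variant of classifier-free guidance, in which the weather instruction gets its own guidance scale on top of the scene controls. The sketch below illustrates that reading only; the function name, the argument names, and the exact decomposition are assumptions and may differ from the paper's method.

```python
from typing import List, Sequence


def decomposed_cfg(
    eps_uncond: Sequence[float],
    eps_scene: Sequence[float],
    eps_full: Sequence[float],
    scene_scale: float,
    weather_scale: float,
) -> List[float]:
    """Combine three noise predictions using separate guidance scales.

    Hypothetical inputs (not taken from the paper):
      eps_uncond: prediction with no conditioning
      eps_scene:  prediction with scene controls only
      eps_full:   prediction with scene controls plus the weather instruction
    """
    if not (len(eps_uncond) == len(eps_scene) == len(eps_full)):
        raise ValueError("all predictions must have the same length")

    guided = []
    for u, s, f in zip(eps_uncond, eps_scene, eps_full):
        scene_residual = s - u    # contribution of the scene controls
        weather_residual = f - s  # extra contribution of the weather instruction
        guided.append(u + scene_scale * scene_residual
                      + weather_scale * weather_residual)
    return guided


# With both scales at 1 the result is just the fully conditioned prediction.
plain = decomposed_cfg([0.0, 1.0], [1.0, 1.0], [1.5, 3.0], 1.0, 1.0)

# Raising only the weather scale amplifies the weather residual
# and leaves the scene term untouched.
stronger_weather = decomposed_cfg([0.0, 1.0], [1.0, 1.0], [1.5, 3.0], 1.0, 2.0)
```

Under this reading, standard single-scale guidance would multiply the whole difference between the fully conditioned and unconditioned predictions by one factor, whereas the split lets the weather term be strengthened on its own, which is consistent with the abstract's remark about not over-amplifying the full condition.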

Similar Articles

MultiWorld: Scalable Multi-Agent Multi-View Video World Models

Hugging Face Daily Papers

MultiWorld is a unified framework for multi-agent multi-view video world modeling that achieves accurate control of multiple agents while maintaining multi-view consistency through a Multi-Agent Condition Module and Global State Encoder.