OmniDirector: General Multi-Shot Camera Cloning without Cross-Paired Data

Hugging Face Daily Papers 06/11/26, 12:00 AM

Summary

A unified framework for camera motion cloning using grid motion videos and multimodal diffusion transformers, enabling director-level control without cross-paired data.

Cloning camera motion from reference videos is an important task in video generation, because a reference video specifies camera movement intuitively and precisely. Existing methods either use parametric camera representations directly, which cannot handle multi-shot generation, or synthesize cross-paired data, which is scarce and therefore leads to poor performance when cloning complicated camera motion. To address these issues, we introduce a general camera motion representation that encodes cameras as grid motion videos. This camera grid represents the camera parameters visually and supports combining diverse trajectories for multi-shot video generation. Building on this representation, we propose OmniDirector, a unified framework trained on a million-scale dataset of camera grid-video pairs that coordinates characters, actions, and cameras to provide director-level control for multimodal diffusion transformers. Furthermore, we design a novel hierarchical prompt expansion agent that integrates the different control signals by reasoning about how they relate and then systematically describing both camera motion and visual content. Extensive experiments demonstrate the superior performance and outstanding controllability of our framework. Project page: https://ymlinfeng.github.io/OmniDirector.github.io/
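The abstract describes the camera grid only at a high level. As a hedged illustration of the general idea, and not the authors' implementation, one way to turn per-frame camera parameters into a visual signal is to project a fixed 3D lattice of points through each frame's camera and rasterize the result; a multi-shot sequence is then the concatenation of per-shot grid videos. Everything below is an assumption: the pinhole camera model, the yaw-only pose parameterisation (x, y, z, yaw), the lattice placement, and all function names (project_point, make_grid, render_frame, render_grid_video, render_multishot) are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical parameterisation: camera position (x, y, z) plus a yaw angle in radians.
Pose = Tuple[float, float, float, float]
Point = Tuple[float, float, float]
Frame = List[List[int]]  # binary image, indexed frame[row][col]


def project_point(p: Point, pose: Pose, focal: float, width: int, height: int) -> Optional[Tuple[float, float]]:
    """Pinhole projection of world point p into a camera that looks along +z when yaw == 0."""
    cx, cy, cz, yaw = pose
    dx, dy, dz = p[0] - cx, p[1] - cy, p[2] - cz
    c, s = math.cos(yaw), math.sin(yaw)
    # World -> camera: transpose of the camera-to-world rotation about the y axis.
    xc = c * dx - s * dz
    yc = dy
    zc = s * dx + c * dz
    if zc <= 1e-6:  # point is behind the camera, so it is not drawn
        return None
    return focal * xc / zc + width / 2, focal * yc / zc + height / 2


def make_grid(n: int = 3, spacing: float = 1.0, depth: float = 5.0) -> List[Point]:
    """A (2n+1) x (2n+1) lattice of points on the plane z = depth."""
    return [(i * spacing, j * spacing, depth) for i in range(-n, n + 1) for j in range(-n, n + 1)]


def render_frame(grid: Sequence[Point], pose: Pose, focal: float = 32.0,
                 width: int = 64, height: int = 64) -> Frame:
    """Rasterise the projected lattice points for one camera pose."""
    frame = [[0] * width for _ in range(height)]
    for p in grid:
        uv = project_point(p, pose, focal, width, height)
        if uv is None:
            continue
        u, v = int(round(uv[0])), int(round(uv[1]))
        if 0 <= u < width and 0 <= v < height:
            frame[v][u] = 1
    return frame


def render_grid_video(trajectory: Sequence[Pose], grid: Optional[Sequence[Point]] = None, **kw) -> List[Frame]:
    """One frame per camera pose: the trajectory rendered as a grid motion video."""
    grid = make_grid() if grid is None else grid
    return [render_frame(grid, pose, **kw) for pose in trajectory]


def render_multishot(shots: Sequence[Sequence[Pose]], **kw) -> List[Frame]:
    """Concatenate per-shot grid videos; each shot has its own, independent trajectory."""
    video: List[Frame] = []
    for trajectory in shots:
        video.extend(render_grid_video(trajectory, **kw))
    return video


# Usage: a dolly-in shot followed by a yaw (pan) shot, 8 frames each.
dolly_in = [(0.0, 0.0, 0.5 * t, 0.0) for t in range(8)]
pan = [(0.0, 0.0, 0.0, math.radians(3 * t)) for t in range(8)]
video = render_multishot([dolly_in, pan])
print(len(video), len(video[0]), len(video[0][0]))  # 16 64 64
```

In this sketch a dolly-in shows up as lattice points spreading apart and a pan as the lattice sliding sideways, so each shot's motion becomes an ordinary video that can be placed next to others. The abstract does not say whether OmniDirector draws a lattice, uses a different visual encoding, or handles intrinsics this way.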

Cached at: 06/15/26, 09:04 AM

Paper page - OmniDirector: General Multi-Shot Camera Cloning without Cross-Paired Data

Source: https://huggingface.co/papers/2606.13432

Abstract

A unified framework for camera motion cloning that uses grid motion videos as its camera representation and builds on multimodal diffusion transformers for more controllable video generation.

Similar Articles

OmniHumanoid: Streaming Cross-Embodiment Video Generation with Paired-Free Adaptation

Track2View: 4D-Consistent Camera-Controlled Video Generation via Paired 3D Point Tracks

Pantheon360: Taming Digital Twin Generation via 3D-Aware 360° Video Diffusion

TrackCraft3R: Repurposing Video Diffusion Transformers for Dense 3D Tracking

AnyMo: Scaling Any-Modality Conditional Motion Generation with Masked Modeling