zai-org/SCAIL-2 · Hugging Face

Reddit r/LocalLLaMA Models

Summary

SCAIL-2 is an open-source model for end-to-end controlled character animation that animates a reference character with a driving video, supporting character replacement and multi-character scenarios without intermediate pose representations.


Cached at: 06/10/26, 12:21 AM


Source: https://huggingface.co/zai-org/SCAIL-2

SCAIL-2: Unifying Controlled Character Animation with End-to-end In-Context Conditioning

SCAIL-2 is an open-source model for end-to-end controlled character animation. It animates a reference character with a driving video, and also supports character replacement and multi-character scenarios without relying on intermediate pose representations.


🔎 Overview

Prior approaches to character animation depend heavily on intermediate representations such as skeleton maps or inpainting masks. These intermediates are ambiguous under complex motion, restrict driving sources to human movements, and limit the reach of replacement and multi-character animation.

SCAIL-2 removes this dependence and achieves end-to-end driving. Using several off-the-shelf models (SCAIL-Preview, Wan-Animate, MoCha), 60K motion pairs were synthesized, and the model was trained on them through a Unified Motion Transfer Interface with dedicated masking channels and a RoPE design. Combined with this unification, the reverse driving training recipe lets the model learn capabilities beyond its teacher models, yielding emergent abilities such as:

  • Cross-identity character replacement
  • Animal-driving scenarios
  • Zero-shot support for advanced control intermediates like SAM3D-Body mesh rendering


📦 Model

| Item | Detail |
| --- | --- |
| Resolutions | End-to-end driving supports both 512p and 704p; pose-driven and replacement perform better at 704p |
| Constraints | H and W must both be divisible by 32 (e.g. 704×1280) |
| Training | Mixed resolutions and fps |
| Bundled modules | Wan VAE and T5 are integrated into the checkpoint for convenience |

File layout after download:

SCAIL-2/
├── Wan2.1_VAE.pth
├── model
│   ├── 1
│   │   └── fsdp2_rank_0000_checkpoint.pt
│   └── latest
└── umt5-xxl
    └── ...
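
The layout above can be sanity-checked after download. The following is a minimal sketch, not part of the official tooling: the names `EXPECTED_PATHS` and `missing_entries` are hypothetical, and the check only confirms that each listed entry exists. It assumes nothing about whether `model/latest` is a file or a directory, and it does not inspect the contents of `umt5-xxl`, which the listing elides.

```python
from pathlib import Path

# Entries taken from the file layout listed above (relative to the SCAIL-2 root).
# The contents of umt5-xxl are elided in the listing, so only the directory is checked.
EXPECTED_PATHS = (
    "Wan2.1_VAE.pth",
    "model/1/fsdp2_rank_0000_checkpoint.pt",
    "model/latest",
    "umt5-xxl",
)


def missing_entries(root):
    """Return the expected entries that do not exist under `root` (hypothetical helper)."""
    root = Path(root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


if __name__ == "__main__":
    # "SCAIL-2" is assumed to be the local download directory.
    missing = missing_entries("SCAIL-2")
    if missing:
        print("Missing entries:", ", ".join(missing))
    else:
        print("All expected entries are present.")
```

An empty list means every entry from the listing was found. The local directory could come from, for example, `huggingface_hub.snapshot_download(repo_id="zai-org/SCAIL-2")`, though the project repository remains the authoritative source for download instructions.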

🚀 Usage

Inference code, environment setup, and detailed instructions are provided in the project repository. Please refer to the Project Page and the code repo for how to run the model.
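
When preparing inputs, the constraint from the Model section (H and W both divisible by 32) can be checked before anything is passed to the inference code. The sketch below is illustrative only: `is_valid_resolution` and `snap_resolution` are hypothetical helpers, not functions from the project repository, and rounding down to the nearest multiple is an assumed policy chosen to avoid upscaling.

```python
MULTIPLE = 32  # H and W must both be divisible by 32, per the Model section


def is_valid_resolution(height, width, multiple=MULTIPLE):
    """Check that both dimensions are positive and divisible by `multiple`."""
    return (
        height > 0
        and width > 0
        and height % multiple == 0
        and width % multiple == 0
    )


def snap_resolution(height, width, multiple=MULTIPLE):
    """Round each dimension down to a multiple of `multiple` (at least one multiple)."""
    def snap(value):
        return max(multiple, (value // multiple) * multiple)

    return snap(height), snap(width)


if __name__ == "__main__":
    print(is_valid_resolution(704, 1280))  # the example resolution from the table
    print(snap_resolution(720, 1280))      # 720 is not divisible by 32
```

Under this policy a 720×1280 input is snapped to 704×1280, which matches the example resolution given in the table.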

📄 Citation

@article{yan2025scail,
  title={SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations},
  author={Yan, Wenhao and Ye, Sheng and Yang, Zhuoyi and Teng, Jiayan and Dong, ZhenHui and Wen, Kairui and Gu, Xiaotao and Liu, Yong-Jin and Tang, Jie},
  journal={arXiv preprint arXiv:2512.05905},
  year={2025}
}

Similar Articles

SCAIL-2: Unifying Controlled Character Animation with End-to-end In-Context Conditioning

Hugging Face Daily Papers

SCAIL-2 is a framework that achieves end-to-end controlled character animation by directly transferring motion from driving videos without intermediate representations, using unified task decomposition, synthetic data (MotionPair-60K), and novel conditioning techniques like in-context mask conditioning and Bias-Aware DPO.