zai-org/SCAIL-2 · Hugging Face
Cached at: 06/10/26, 12:21 AM
Source: https://huggingface.co/zai-org/SCAIL-2
SCAIL-2: Unifying Controlled Character Animation with End-to-end In-Context Conditioning
SCAIL-2 is an open-source model for end-to-end controlled character animation. It animates a reference character with a driving video, and also supports character replacement and multi-character scenarios without relying on intermediate pose representations.

🔎 Overview
Prior approaches to character animation depend heavily on intermediate representations such as skeleton maps or inpainting masks. These intermediates are ambiguous under complex motion, restrict driving sources to human movements, and limit the reach of replacement and multi-character animation.
SCAIL-2 removes this dependence and achieves end-to-end driving. 60K motion pairs were synthesized with several off-the-shelf models (SCAIL-Preview, Wan-Animate, MoCha) and used to train the model through a Unified Motion Transfer Interface with dedicated masking channels and a RoPE design. Combined with this unified interface, a reverse-driving training recipe lets the model learn capabilities beyond those of its teacher models, yielding emergent abilities such as:
- Cross-identity character replacement
- Animal-driving scenarios
- Zero-shot support for advanced control intermediates like SAM3D-Body mesh rendering

📦 Model
| Item | Detail |
|---|---|
| Resolutions | End-to-end driving supports both 512p and 704p; pose-driven and replacement modes perform better at 704p |
| Constraints | Height (H) and width (W) must both be divisible by 32 (e.g. 704×1280) |
| Training | Mixed resolutions and fps |
| Bundled modules | Wan VAE and T5 are integrated into the checkpoint for convenience |

File layout after download:
SCAIL-2/
├── Wan2.1_VAE.pth
├── model
│   ├── 1
│   │   └── fsdp2_rank_0000_checkpoint.pt
│   └── latest
└── umt5-xxl
    └── ...
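As a quick sanity check after downloading (for example with `huggingface_hub.snapshot_download(repo_id="zai-org/SCAIL-2")`), the layout above can be verified programmatically. The sketch below is a hypothetical helper, not part of the SCAIL-2 code. It assumes that `model/latest` is a small text file holding the name of the checkpoint folder (here `1`), a convention used by some training frameworks but not stated in this card.

```python
from __future__ import annotations

from pathlib import Path


# Hypothetical helper (not part of the SCAIL-2 code): checks the directory
# layout listed in the model card and returns the model checkpoint path.
def resolve_scail2_checkpoint(root: str | Path) -> Path:
    root = Path(root)
    # Top-level entries the model card lists for the download.
    required = [root / "Wan2.1_VAE.pth", root / "model" / "latest", root / "umt5-xxl"]
    missing = [str(p) for p in required if not p.exists()]
    if missing:
        raise FileNotFoundError(f"missing from checkpoint directory: {missing}")

    # Assumption: model/latest is a text file naming the checkpoint folder ("1").
    tag = (root / "model" / "latest").read_text().strip()
    ckpt = root / "model" / tag / "fsdp2_rank_0000_checkpoint.pt"
    if not ckpt.is_file():
        raise FileNotFoundError(f"expected checkpoint file at {ckpt}")
    return ckpt
```

Called as `resolve_scail2_checkpoint("SCAIL-2")`, it returns `SCAIL-2/model/1/fsdp2_rank_0000_checkpoint.pt` when the layout matches and raises `FileNotFoundError` naming what is missing otherwise.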
🚀 Usage
Inference code, environment setup, and detailed instructions are provided in the project repository. Please refer to the Project Page and the code repository for how to run the model.
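Inputs must respect the constraint from the Model section that H and W are both divisible by 32. The sketch below shows one way to choose a valid size for a driving video before inference. `fit_resolution` and `snap` are hypothetical helpers, not part of the SCAIL-2 code, and reading "512p"/"704p" as the shorter side of the frame is an assumption.

```python
from __future__ import annotations

import math


def snap(x: float, multiple: int = 32) -> int:
    """Round x to the nearest multiple of `multiple` (never below `multiple`)."""
    return max(multiple, math.floor(x / multiple + 0.5) * multiple)


# Hypothetical helper (not part of the SCAIL-2 code). Assumption: "512p"/"704p"
# refers to the shorter side of the frame.
def fit_resolution(h: int, w: int, short_side: int = 704, multiple: int = 32) -> tuple[int, int]:
    """Scale (h, w) so the shorter side is about `short_side`, then snap both sides."""
    if h <= 0 or w <= 0:
        raise ValueError("height and width must be positive")
    if short_side % multiple:
        raise ValueError(f"short_side must be divisible by {multiple}")
    scale = short_side / min(h, w)
    # Aspect ratio is preserved up to the final snapping step.
    return snap(h * scale, multiple), snap(w * scale, multiple)
```

For a 720×1280 source this policy yields 704×1248 rather than the 704×1280 quoted above; rounding each side down independently (720 to 704, 1280 stays 1280) or cropping is an equally valid choice, and which policy the official pipeline uses is not stated here.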
📄 Citation
@article{yan2025scail,
title={SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations},
author={Yan, Wenhao and Ye, Sheng and Yang, Zhuoyi and Teng, Jiayan and Dong, ZhenHui and Wen, Kairui and Gu, Xiaotao and Liu, Yong-Jin and Tang, Jie},
journal={arXiv preprint arXiv:2512.05905},
year={2025}
}