Incantation is an interactive video world model that uses natural language as its action interface, enabling fine-grained multi-entity control and cross-entity generalization. It achieves high performance and real-time streaming through novel attention and distillation techniques.