Tag
minWM is a full-stack open-source framework that converts bidirectional video diffusion models into real-time interactive video world models, offering camera control, low-latency rollout, and a modular architecture.
Geo-Align is a reinforcement learning framework for camera-controlled video re-rendering that improves generalization through scale-aware perceptual rewards and metric 3D estimation for camera trajectory extraction.
SANA-WM is a 2.6B-parameter open-source world model that generates high-fidelity 720p minute-scale videos with precise camera control, achieving industrial-level quality while significantly reducing computational requirements.