Tag
LooseControlVideo is a framework for intuitive 3D spatial control in text-to-video generation that uses sparse oriented 3D boxes as proxies. It fine-tunes a Wan 2.2 backbone and reports better trajectory accuracy and occlusion handling than existing methods on multiple benchmarks.