CoInteract introduces an end-to-end Diffusion Transformer framework that jointly models RGB appearance and HOI geometry to generate physically plausible human-object interaction videos with stable hands and faces and zero inference overhead.
Overworld releases Waypoint-1.5, a real-time video world model designed for everyday GPUs, featuring improved visual fidelity and new 360p and 720p tiers for broader hardware accessibility.
Lyra 2.0 is NVIDIA's framework for generating persistent, explorable 3D worlds from a single image, combining long-range video synthesis with explicit 3D reconstruction while addressing spatial forgetting and temporal drifting through novel training techniques.
Google DeepMind's Project Genie is a unified world model that generates and interacts with diverse video games by treating them as conditional video prediction tasks.