In his talk at Sequoia AI Ascent, Dr. Jim Fan presents a roadmap for achieving Physical AGI that parallels the success of LLMs, introducing concepts such as video world models, World Action Models (WAM), and the Dexterity Scaling Law, and sharing predictions for the near future.
The article discusses the potential paradigm-shifting impact of world models on AI, highlighting investments by Yann LeCun and Fei-Fei Li in this technology as a successor to the current LLM paradigm.
An indie developer building a voice-first learning game for kids asks whether interactive world models will be production-ready within 12–18 months or if pre-rendered assets plus real-time avatars are the better near-term path.
Cortex 2.0 introduces a plan-and-act control framework that generates trajectories in visual latent space, enabling reliable long-horizon robotic manipulation in complex industrial environments and outperforming reactive Vision-Language-Action models.
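The report itself ships no code; as a rough mental model only, a plan-then-act loop over visual latents can be sketched as below. Every function here (encode, plan_latent_trajectory, decode_action, step_fn) is an illustrative stand-in, not Cortex 2.0's API.

```python
import numpy as np

# Minimal sketch of a plan-then-act loop over visual latents.
# All components are stand-ins; Cortex 2.0's actual modules are not public.

def encode(image):
    """Stand-in visual encoder: project a flattened image to a small latent."""
    rng = np.random.default_rng(0)                      # fixed projection for consistency
    proj = rng.standard_normal((image.size, 8)) / np.sqrt(image.size)
    return image.reshape(-1) @ proj

def plan_latent_trajectory(z_start, z_goal, horizon):
    """Stand-in planner: linearly interpolate latent waypoints toward the goal.
    A learned world model would generate these waypoints instead."""
    return [z_start + (z_goal - z_start) * (t + 1) / horizon for t in range(horizon)]

def decode_action(z_t, z_next):
    """Stand-in inverse-dynamics head: action = direction of latent change."""
    delta = z_next - z_t
    return delta / (np.linalg.norm(delta) + 1e-8)

def plan_and_act(obs, goal_image, step_fn, horizon=8):
    """Plan a full latent trajectory once, then execute it step by step,
    re-encoding after every step so execution can track drift."""
    z, z_goal = encode(obs), encode(goal_image)
    for z_next in plan_latent_trajectory(z, z_goal, horizon):
        obs = step_fn(decode_action(z, z_next))
        z = encode(obs)
    return obs
```

The contrast with a reactive Vision-Language-Action policy is that the trajectory is committed to up front in latent space rather than re-deciding the next action from scratch at every timestep.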
MultiWorld is a unified framework for multi-agent, multi-view video world modeling that achieves accurate control of multiple agents while maintaining multi-view consistency through a Multi-Agent Condition Module and a Global State Encoder.
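The summary names the two modules but not their internals; the sketch below shows one plausible way per-agent controls and a shared global state could be fused into a single conditioning vector for every view's predictor. Class names mirror the paper's terminology, but the shapes and pooling are assumptions, not MultiWorld's implementation.

```python
import torch
import torch.nn as nn

class MultiAgentCondition(nn.Module):
    """Embed each agent's control signal and pool them into one conditioning vector."""
    def __init__(self, action_dim, dim=64):
        super().__init__()
        self.embed = nn.Linear(action_dim, dim)
    def forward(self, agent_actions):                 # (num_agents, action_dim)
        return self.embed(agent_actions).mean(dim=0)  # (dim,)

class GlobalStateEncoder(nn.Module):
    """Summarize all camera views into one shared state so views stay consistent."""
    def __init__(self, view_dim, dim=64):
        super().__init__()
        self.embed = nn.Linear(view_dim, dim)
    def forward(self, view_feats):                    # (num_views, view_dim)
        return self.embed(view_feats).mean(dim=0)     # (dim,)

cond = MultiAgentCondition(action_dim=6)
glob = GlobalStateEncoder(view_dim=128)
actions = torch.randn(3, 6)       # three agents' control inputs
views = torch.randn(4, 128)       # features from four camera views
conditioning = torch.cat([cond(actions), glob(views)])  # fed to every view's predictor
print(conditioning.shape)         # torch.Size([128])
```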
Researchers introduce Zero-shot World Models (ZWM), an approach that achieves visual competence comparable to state-of-the-art models while trained on minimal data (a single child's visual experience) and without task-specific training. This work demonstrates a path toward more data-efficient AI systems that match human developmental learning efficiency.
Overworld releases Waypoint-1.5, a real-time video world model designed for everyday GPUs, featuring improved visual fidelity and new 360p and 720p tiers for broader hardware accessibility.
LeWorldModel introduces a stable, end-to-end Joint-Embedding Predictive Architecture that trains directly from pixels with minimal hyperparameters and provable anti-collapse guarantees. It achieves significant speedups in planning compared to foundation models while maintaining competitive performance on robotic manipulation tasks.
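For readers unfamiliar with the JEPA idea, the toy sketch below shows the general pattern: predict a target view's embedding from a context view in latent space (no pixel reconstruction loss), here with an EMA target encoder as one common anti-collapse device. The architecture, hyperparameters, and anti-collapse mechanism are illustrative, not LeWorldModel's actual design or guarantees.

```python
import torch
import torch.nn as nn

class SmallEncoder(nn.Module):
    """Tiny convolutional encoder mapping an image to a latent vector."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, dim),
        )
    def forward(self, x):
        return self.net(x)

context_enc = SmallEncoder()
target_enc = SmallEncoder()                      # updated by EMA, not by gradients
target_enc.load_state_dict(context_enc.state_dict())
predictor = nn.Linear(64, 64)
opt = torch.optim.AdamW(
    list(context_enc.parameters()) + list(predictor.parameters()), lr=1e-3)

def jepa_step(context_view, target_view, ema=0.99):
    """One training step: predict the target embedding in latent space."""
    with torch.no_grad():
        target_z = target_enc(target_view)
    pred_z = predictor(context_enc(context_view))
    loss = nn.functional.mse_loss(pred_z, target_z)
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                        # EMA update of the target encoder
        for p_t, p_c in zip(target_enc.parameters(), context_enc.parameters()):
            p_t.mul_(ema).add_(p_c, alpha=1 - ema)
    return loss.item()
```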
Google has launched Project Genie, an experimental prototype for Google AI Ultra subscribers that allows users to create, explore, and remix infinite interactive worlds using Genie 3.
DeepMind announces Genie 3, a general-purpose world model capable of generating interactive environments from text prompts at 24fps in 720p with improved consistency and real-time interactivity compared to previous versions.
OpenAI's technical report on Sora describes a video generation model that unifies diverse visual data through visual patches, enabling large-scale training of generative models that produce high-definition videos up to one minute long across variable durations, aspect ratios, and resolutions.
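The patch idea is the report's unification trick: any video, whatever its duration, aspect ratio, or resolution, becomes a variable-length sequence of spacetime patch tokens. Below is a minimal sketch of patchification applied to raw pixels; Sora actually patchifies a learned latent representation, and the patch sizes here are arbitrary.

```python
import numpy as np

def to_spacetime_patches(video, pt=4, ph=16, pw=16):
    """Cut a video of shape (T, H, W, C) into spacetime patches and flatten
    each patch into one token row."""
    T, H, W, C = video.shape
    video = video[: T - T % pt, : H - H % ph, : W - W % pw]   # trim to multiples
    t, h, w = video.shape[0] // pt, video.shape[1] // ph, video.shape[2] // pw
    patches = (video
               .reshape(t, pt, h, ph, w, pw, C)
               .transpose(0, 2, 4, 1, 3, 5, 6)
               .reshape(t * h * w, pt * ph * pw * C))
    return patches   # one row per patch token, regardless of duration or aspect ratio

tokens = to_spacetime_patches(np.zeros((16, 240, 320, 3), dtype=np.float32))
print(tokens.shape)  # (1200, 3072): 4 time blocks x 15 x 20 spatial patches
```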