Tag
A survey paper on World Action Models, covering recent advances in action models and world models.
This survey provides a comprehensive overview of World Action Models (WAMs), predictive-action systems that generate future states for decision-making, and organizes existing work by required outputs and design choices.
ImageWAM proposes replacing video generation with pretrained image editing models in world action models for robot control, achieving superior performance while reducing FLOPs to 1/6 and latency to 1/4 of video-based approaches.
Light-WAM is a lightweight world action model for efficient robot manipulation that uses a compact video backbone and downsampled latent space for future-video supervision, achieving high performance with low inference latency.
Flash-WAM introduces a modality-aware distillation method for world action models that compresses diffusion to a single step per modality, achieving real-time inference with a 23x speedup.
Curated GitHub list of Vision-Language-Action and World Action Models research for robotics foundation models.
In his talk at Sequoia AI Ascent, Dr. Jim Fan presents a roadmap toward Physical AGI that parallels the success of LLMs. He introduces concepts such as video world models, World Action Models (WAMs), and the Dexterity Scaling Law, and shares predictions for the near future.
This paper introduces FFDC, a lightweight verifier for World Action Models that enables adaptive action chunk sizes by checking consistency between predicted and actual observations, improving efficiency and robustness in robotic manipulation.
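The consistency check described for FFDC can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the mean-squared-error metric, and the fixed threshold are all assumptions chosen for illustration. The idea shown is only that a predicted action chunk keeps executing while predicted and actual observations agree, and is cut short as soon as they diverge, so the effective chunk size adapts to how reliable the prediction is.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def mean_squared_error(a: Vector, b: Vector) -> float:
    """Mean squared difference between two equal-length observation vectors."""
    if not a or len(a) != len(b):
        raise ValueError("observations must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def run_chunk_with_verifier(
    actions: Sequence[float],
    predicted_obs: Sequence[Vector],
    step_env: Callable[[float], Vector],  # hypothetical environment step function
    threshold: float,                     # assumed fixed tolerance on the error
) -> int:
    """Execute a predicted action chunk until prediction and reality diverge.

    Returns the number of actions actually executed, which is the
    effective (adaptive) chunk size for this round.
    """
    executed = 0
    for action, predicted in zip(actions, predicted_obs):
        actual = step_env(action)
        executed += 1
        # Stop and hand control back to the planner once the model's
        # predicted observation no longer matches what was observed.
        if mean_squared_error(predicted, actual) > threshold:
            break
    return executed
```

With an accurate prediction the whole chunk runs without replanning. After a disturbance, the loop stops at the first inconsistent step and the caller would request a new chunk from the model.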