AHA-WAM is an asynchronous world-action model that uses dual Diffusion Transformers to decouple world prediction from action execution, enabling efficient long-horizon planning and real-time control. It achieves state-of-the-art performance on robotic manipulation tasks, with up to 92.8% success on RoboTwin and 78.3% on real-world tasks, while running closed-loop control at 24.17 Hz.
Proposes CTRL-STEER, a closed-loop framework for adaptively steering vision-language-action models with time-varying control signals, achieving a better trade-off between concept regulation and task success without retraining.