Tag
AEGIS uses activation-probe early warning to switch to a stronger policy before failures compound in long-horizon robot manipulation, recovering twice as many failures as budget-matched escalation.
AHA-WAM is an asynchronous world-action model that uses dual Diffusion Transformers to decouple world prediction from action execution, achieving efficient long-horizon planning and real-time control. It achieves state-of-the-art performance on robotic manipulation tasks with up to 92.8% success on RoboTwin and 78.3% on real-world tasks, while reaching 24.17 Hz closed-loop control.
TBD-VLA introduces a discrete vision-language-action framework that combines block diffusion with autoregressive generation to achieve efficient temporal action modeling and faster inference, significantly outperforming prior VLA approaches in simulation and real-world manipulation tasks.
Dream.exe proposes an evaluation framework that uses robotic manipulation tasks to assess video generation models' understanding of physical reality, finding that visual quality does not predict executable motion accuracy.
AFUN proposes an affordance foundation model that predicts functional masks and 3D motion curves from RGB-D observations and language descriptions, enabling generalizable robot manipulation across diverse environments. The model outperforms baselines on multiple benchmarks and can be deployed for real-world tasks without fine-tuning.
RoboSemanticBench is a benchmark that diagnoses semantic grounding in action prediction for vision-language-action models, revealing that while robots can grasp objects, they fail to select semantically correct targets based on instruction semantics.
IntentVLA is a history-conditioned visual-language-action framework that improves robot imitation learning stability by encoding short-horizon intents from visual observations, addressing challenges from partial observability and ambiguous observations. It also introduces AliasBench, an ambiguity-aware benchmark for evaluating such methods.
RoboEvolve is a framework that co-evolves a VLM planner and VGM simulator for robotic manipulation, achieving data efficiency with only 500 unlabeled seed images and robust continual learning.
Google DeepMind introduces Gemini Robotics On-Device, an efficient VLA model optimized to run locally on robotic devices, enabling low-latency operation and offline capability while maintaining strong dexterous manipulation and task generalization. The model can be fine-tuned with as few as 50-100 demonstrations and comes with an SDK for developers.
OpenAI presents Hindsight Experience Replay (HER), a technique enabling sample-efficient reinforcement learning from sparse binary rewards without complex reward engineering. It is demonstrated on robotic arm manipulation tasks including pushing, sliding, and pick-and-place, and validated on physical robots.