Introduces the Target Viewpoint Reproduction (TVR) task and the TVRBench benchmark for evaluating foundation models' ability to actively adjust 3D viewpoints to match target images. Experiments reveal significant limitations in current open- and closed-source models; a unified post-training framework boosts success rates from ~12% to ~51%.
SleepWalk is a three-tier benchmark for evaluating vision-language models' ability to predict spatially coherent trajectories in 3D environments from textual instructions and visual observations, revealing systematic failures in grounded spatial reasoning under occlusions and multi-step instructions.
DeepMind introduces SIMA 2, an upgraded AI agent integrated with Gemini that can reason, converse, and self-improve within virtual 3D worlds, marking a significant step toward AGI and embodied AI.