3d-environments

#3d-environments

Where to Look: Can Foundation Models Reach a Target Viewpoint Through Active Exploration?

Hugging Face Daily Papers · 2026-05-31

Introduces the Target Viewpoint Reproduction (TVR) task and the TVRBench benchmark for evaluating foundation models' ability to actively adjust 3D viewpoints to match target images. Experiments reveal significant limitations in current open- and closed-source models; a unified post-training framework raises success rates from ~12% to ~51%.

#3d-environments

SleepWalk: A Three-Tier Benchmark for Stress-Testing Instruction-Guided Vision-Language Navigation

Hugging Face Daily Papers · 2026-05-11

SleepWalk is a three-tier benchmark for evaluating vision-language models' ability to predict spatially coherent trajectories in 3D environments from textual instructions and visual observations, revealing systematic failures in grounded spatial reasoning under occlusions and multi-step instructions.

#3d-environments

SIMA 2: An Agent that Plays, Reasons, and Learns With You in Virtual 3D Worlds

Google DeepMind Blog · 2025-11-13

DeepMind introduces SIMA 2, an upgraded AI agent integrated with Gemini that can reason, converse, and self-improve within virtual 3D worlds. The summary describes it as a significant step toward AGI and embodied AI.
