Tag
This paper presents a comprehensive taxonomy of 3D vision research, covering geometric representations, datasets, learning paradigms, and applications in reconstruction, generation, and video modeling.
This paper introduces a post-training framework that uses 3D priors from SAM3D to improve semantic correspondence in 2D foundation features, targeting failure modes such as left-right confusion and repeated parts. The method relies on instance-specific 3D reconstruction and requires neither pose annotations nor spherical geometry shortcuts.
SpatialBench is a benchmark for evaluating spatial foundation models across diverse domains and tasks. It reveals limitations in current models, and the work also introduces DA-Next-5M and DA-Next to advance spatial representation learning.
A Zhejiang University researcher shared a PhD guide on GitHub that covers the entire research lifecycle, from topic selection to rebuttals, and is tailored to 3D vision.
Meta AI and Oxford VGG released VGGT-Omega, a foundation model for 3D vision, along with a project page and a GitHub repository.