Tag
This paper presents a comprehensive taxonomy of 3D vision research, covering geometric representations, datasets, learning paradigms, and applications in reconstruction, generation, and video modeling.
BrickAnything is an autoregressive framework that generates physically buildable brick structures from diverse 3D representations using point clouds and structure-aware tree tokenization, ensuring geometric fidelity and structural stability.
NEO is a neural framework that predicts the low-frequency Laplace-Beltrami eigenspace from point clouds, achieving near-linear scaling and strong zero-shot generalization through a mass-aware neural operator and Rayleigh-Ritz refinement.
WorldString is a neural architecture that models object state manifolds from point clouds or RGB-D video streams, serving as a foundational component for physical world models with differentiable structure for policy learning integration.
This paper introduces a theoretical framework for analyzing the generalization error of canonization methods for symmetric data, proving that the covering number grows polynomially under Hilbert curve serialization but exponentially under lexicographic sorting.
RigidFormer is a mesh-free, object-centric Transformer that learns rigid-body dynamics from point clouds, outperforming mesh-based baselines in speed and scalability on multi-object contact dynamics.
OpenAI introduces Point-E, a system that generates 3D point clouds from text prompts in 1-2 minutes on a single GPU by combining text-to-image and image-to-3D diffusion models. The method is substantially faster than prior methods, and the authors release pre-trained models and code.
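
The canonization entry above contrasts Hilbert curve serialization with lexicographic sorting as ways to put an unordered point set into a fixed order. The following is a minimal sketch of that idea, not the paper's implementation. It assumes 2D points with coordinates in [0, 1) and a hypothetical grid resolution `n`, and the function names `hilbert_index`, `canonize_hilbert`, and `canonize_lex` are illustrative. The index computation follows the standard textbook conversion from grid cell to Hilbert curve position.

```python
import random


def hilbert_index(n, x, y):
    """Position of integer cell (x, y) along the Hilbert curve
    on an n x n grid, where n is a power of two."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect so the next level sees a canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def canonize_hilbert(points, n=16):
    """Order points by the Hilbert index of their grid cell.
    Assumes coordinates in [0, 1); n is an illustrative resolution."""
    def key(p):
        cx = min(int(p[0] * n), n - 1)
        cy = min(int(p[1] * n), n - 1)
        # Tie-break on raw coordinates so the order is deterministic.
        return (hilbert_index(n, cx, cy), p)
    return sorted(points, key=key)


def canonize_lex(points):
    """Baseline: lexicographic sort on (x, y)."""
    return sorted(points)


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [(rng.random(), rng.random()) for _ in range(8)]
    shuffled = pts[:]
    rng.shuffle(shuffled)
    # Both canonizations map any permutation of the set to the same sequence.
    print(canonize_hilbert(pts) == canonize_hilbert(shuffled))
    print(canonize_lex(pts) == canonize_lex(shuffled))
```

Both functions are permutation-invariant. They differ in locality: consecutive Hilbert indices always belong to adjacent grid cells, whereas a lexicographic order can jump across the domain whenever the first coordinate changes.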