PAIWorld augments diffusion-transformer world models with geometric awareness and cross-view attention to improve multi-view 3D consistency in robotic manipulation tasks, and reports state-of-the-art results on the benchmarks it evaluates.
BA-T is an iterative Transformer architecture for two-view bundle adjustment. It improves 3D reconstruction accuracy and cross-view consistency with a lightweight design whose decoder uses only 16% of the parameters of a conventional decoder, yet it matches or surpasses larger models.