Unified Panoramic Geometry Estimation via Multi-View Foundation Models

Hugging Face Daily Papers 05/25/26, 12:00 AM Papers

panoramic-geometry depth-estimation surface-normals multi-view foundation-models computer-vision cubemap

Summary

PaGeR adapts the multi-view perspective foundation model Depth Anything 3 to predict scale-invariant and metric depth, surface normals, and sky segmentation from a single equirectangular image, using a fixed cubemap representation that keeps VRAM and runtime constant. The paper also releases the ZüriPano and PanoInfinigen datasets.

Geometry estimation from perspective images has greatly advanced, maturing to the point where off-the-shelf foundation models are able to reconstruct 3D scene structure not only from multi-view imagery, but even from a single view. A natural extension is 3D reconstruction from panoramas, with the exciting prospect of recovering a full 360-degree scene from a single panoramic image. In this work, we introduce PaGeR (Panoramic Geometry Reconstruction), a framework to lift powerful 3D foundation models designed for perspective imagery to the panorama domain. Our strategy is to start from a pre-trained transformer for 3D reconstruction and turn it into a unified high-performance model that predicts scale-invariant depth, metric depth, surface normals, and sky masks from both perspective and omnidirectional images, in a single forward pass. By keeping architectural changes to a minimum and mixing perspective and panoramic images during training, PaGeR retains the rich 3D prior of the underlying foundation model while learning to also estimate geometrically consistent 360-degree scenes from single panoramas. We extensively test our method in both indoor and outdoor environments and find that it delivers state-of-the-art performance and excellent zero-shot performance across a wide range of scenes.
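To make the idea of recovering a full 360-degree scene from one panorama concrete, the following is a minimal sketch of lifting an equirectangular depth map to a 3D point cloud. It is not PaGeR's code. The axis convention (+Y up, +Z forward at the image centre), the function names, and the assumption that each depth value is the distance along the viewing ray (rather than a planar z-depth) are all choices made for this illustration.

```python
import numpy as np

def equirect_ray_directions(height, width):
    """Unit viewing rays for every pixel of an equirectangular image.

    Assumed convention (not taken from the paper): +Y is up, +Z is the
    forward direction at the image centre, longitude grows to the right.
    """
    # Pixel centres mapped to longitude in (-pi, pi) and latitude in (-pi/2, pi/2).
    lon = ((np.arange(width) + 0.5) / width - 0.5) * 2.0 * np.pi
    lat = (0.5 - (np.arange(height) + 0.5) / height) * np.pi
    lon, lat = np.meshgrid(lon, lat)  # both of shape (height, width)
    return np.stack(
        [np.cos(lat) * np.sin(lon), np.sin(lat), np.cos(lat) * np.cos(lon)],
        axis=-1,
    )

def depth_to_points(depth):
    """Lift a (H, W) map of per-pixel ray lengths to (H, W, 3) points."""
    h, w = depth.shape
    return depth[..., None] * equirect_ray_directions(h, w)

# Toy input: constant ray length 2, i.e. the camera sits inside a sphere.
depth = np.full((256, 512), 2.0)
points = depth_to_points(depth)
```

With this toy input every point lies on a sphere of radius 2 around the camera, which is a quick sanity check for the ray directions.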

Original Article

Cached at: 05/29/26, 03:00 AM

Source: https://huggingface.co/papers/2605.26368

TL;DR: PaGeR turns a perspective 3D foundation model into a single-pass 360° geometry estimator: from one equirectangular image it predicts scale-invariant depth, metric depth (in metres), surface normals, and sky segmentation at full panoramic resolution.
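The difference between scale-invariant and metric depth can be illustrated with a small sketch. A scale-invariant map is correct only up to an unknown global factor, and a common evaluation convention (assumed here, since this page does not describe PaGeR's protocol) is to fit that factor by least squares against metric depth on valid pixels. Sky pixels are excluded because they have no finite depth. The function name and the toy data are hypothetical.

```python
import numpy as np

def least_squares_scale(pred, metric, valid):
    """Scale s minimising ||s * pred - metric||^2 over the valid pixels."""
    p = pred[valid].astype(np.float64)
    m = metric[valid].astype(np.float64)
    return float(np.dot(p, m) / np.dot(p, p))

rng = np.random.default_rng(0)
relative = rng.uniform(0.1, 1.0, size=(64, 128))  # toy scale-invariant depth
metric = 3.5 * relative                           # toy metric depth in metres
sky = np.zeros(relative.shape, dtype=bool)
sky[:8] = True                                    # toy sky mask: top rows
metric[sky] = np.inf                              # sky has no finite depth

# Fit the global scale on non-sky pixels only.
scale = least_squares_scale(relative, metric, ~sky)
```

In this toy setup the fitted scale matches the factor of 3.5 used to build the metric map, up to floating-point error.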

We introduce PaGeR (Panoramic Geometry Reconstruction), which lifts a multi-view perspective foundation model (Depth Anything 3) to the panoramic domain via a fixed 6×504×504 cubemap (six faces of 504×504 pixels), so VRAM and runtime stay constant regardless of input resolution. A single forward pass returns scale-invariant and metric depth, world-frame normals, and a sky mask. We also release two new datasets: ZüriPano (real-world evaluation) and PanoInfinigen (synthetic training).
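The fixed cubemap can be sketched as follows: whatever the panorama's resolution, it is resampled into six 504×504 faces, so the tensor handed to the network always has the same size. This is a minimal NumPy illustration rather than PaGeR's implementation. The face order, the axis convention (+Y up, +Z forward), the nearest-neighbour sampling, and both function names are assumptions of this sketch.

```python
import numpy as np

def cube_face_directions(face_size):
    """Unit rays of shape (6, face_size, face_size, 3).

    Assumed face order: +X, -X, +Y, -Y, +Z, -Z (+Y up, +Z forward).
    """
    # Pixel centres mapped to [-1, 1] on the face plane.
    t = (np.arange(face_size) + 0.5) / face_size * 2.0 - 1.0
    u, v = np.meshgrid(t, -t)  # u grows to the right, v grows upwards
    one = np.ones_like(u)
    faces = [
        np.stack([one, v, -u], axis=-1),   # +X
        np.stack([-one, v, u], axis=-1),   # -X
        np.stack([u, one, -v], axis=-1),   # +Y (up)
        np.stack([u, -one, v], axis=-1),   # -Y (down)
        np.stack([u, v, one], axis=-1),    # +Z (forward)
        np.stack([-u, v, -one], axis=-1),  # -Z (back)
    ]
    dirs = np.stack(faces, axis=0)
    return dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)

def equirect_to_cubemap(pano, face_size=504):
    """Resample an equirectangular image (H, W, ...) into six cube faces."""
    h, w = pano.shape[:2]
    d = cube_face_directions(face_size)
    lon = np.arctan2(d[..., 0], d[..., 2])          # 0 at the forward direction
    lat = np.arcsin(np.clip(d[..., 1], -1.0, 1.0))  # positive towards +Y (up)
    # Nearest-neighbour lookup; columns wrap around, rows are clamped.
    col = ((lon / (2.0 * np.pi) + 0.5) * w).astype(int) % w
    row = np.clip(((0.5 - lat / np.pi) * h).astype(int), 0, h - 1)
    return pano[row, col]

# Two panoramas of different resolution produce cubemaps of the same shape.
small = equirect_to_cubemap(np.zeros((512, 1024, 3), dtype=np.uint8))
large = equirect_to_cubemap(np.zeros((1024, 2048, 3), dtype=np.uint8))
```

Both `small` and `large` have shape `(6, 504, 504, 3)`, which is the property that keeps memory use and runtime independent of the input resolution. A production pipeline would more likely use bilinear interpolation than nearest-neighbour lookup.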

🔗 Project page: https://pager360.github.io · 🤗 Demo: https://huggingface.co/spaces/prs-eth/PaGeR · Collection (models + datasets): https://huggingface.co/collections/prs-eth/pager-697241d06b3733a6f18e4d39 · Code: https://github.com/prs-eth/PaGeR

Happy to answer any questions!

Similar Articles

PanoWorld: Real-World Panoramic Generation

Enhancing In-context Panoramic Generation via Geometric-aware Pretraining

PanoWorld: Towards Spatial Supersensing in 360° Panorama World

One Scene, Two Depths: Probing Geometric Ambiguity in Monocular Foundation Models

MoVerse: Real-Time Video World Modeling with Panoramic Gaussian Scaffold
