Unified Panoramic Geometry Estimation via Multi-View Foundation Models

Hugging Face Daily Papers

Summary

PaGeR adapts the multi-view perspective foundation model Depth Anything 3 to predict scale-invariant and metric depth, surface normals, and sky segmentation from a single equirectangular image, using a fixed cubemap representation that keeps VRAM and runtime constant. The paper also releases the ZüriPano and PanoInfinigen datasets.

Geometry estimation from perspective images has greatly advanced, maturing to the point where off-the-shelf foundation models are able to reconstruct 3D scene structure not only from multi-view imagery, but even from a single view. A natural extension is 3D reconstruction from panoramas, with the exciting prospect of recovering a full 360-degree scene from a single panoramic image. In this work, we introduce PaGeR (Panoramic Geometry Reconstruction), a framework to lift powerful 3D foundation models designed for perspective imagery to the panorama domain. Our strategy is to start from a pre-trained transformer for 3D reconstruction and turn it into a unified high-performance model that predicts scale-invariant depth, metric depth, surface normals, and sky masks from both perspective and omnidirectional images, in a single forward pass. By keeping architectural changes to a minimum and mixing perspective and panoramic images during training, PaGeR retains the rich 3D prior of the underlying foundation model while learning to also estimate geometrically consistent 360-degree scenes from single panoramas. We extensively test our method in both indoor and outdoor environments and find that it delivers state-of-the-art performance and excellent zero-shot performance across a wide range of scenes.

Cached at: 05/29/26, 03:00 AM

Source: https://huggingface.co/papers/2605.26368

TL;DR: PaGeR turns a perspective 3D foundation model into a single-pass 360° geometry estimator. From one equirectangular image it predicts scale-invariant depth, metric depth (in metres), surface normals, and sky segmentation at full panoramic resolution.
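For readers less familiar with the two depth outputs: a scale-invariant depth map is correct only up to one unknown global factor, while metric depth is expressed in metres. The sketch below illustrates that relationship with the common least-squares scale alignment. It is an assumption for illustration, not taken from the paper; `align_scale` and `abs_rel` are hypothetical helper names, and the depth values are made up.

```python
def align_scale(pred, target):
    """Least-squares scale s that minimises sum((s * pred - target) ** 2)."""
    num = sum(p * t for p, t in zip(pred, target))
    den = sum(p * p for p in pred)
    if den == 0.0:
        raise ValueError("prediction is all zeros; scale is undefined")
    return num / den


def abs_rel(pred, target):
    """Mean absolute relative error, a common depth metric."""
    return sum(abs(p - t) / t for p, t in zip(pred, target)) / len(target)


# Made-up example: metric depths in metres and a prediction that is
# correct up to a global factor.
metric_depth = [2.0, 4.0, 10.0]
scale_invariant_pred = [0.2, 0.4, 1.0]

s = align_scale(scale_invariant_pred, metric_depth)
aligned = [s * p for p in scale_invariant_pred]

# Without alignment the error is large; after alignment it is close to zero.
print(abs_rel(scale_invariant_pred, metric_depth))
print(abs_rel(aligned, metric_depth))
```

A metric-depth output needs no such alignment step, which is why the two predictions are usually evaluated separately.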

We introduce PaGeR (Panoramic Geometry Reconstruction), which lifts a multi-view perspective foundation model (Depth Anything 3) to the panoramic domain via a fixed 6×504×504 cubemap, so VRAM and runtime stay constant regardless of input resolution. A single forward pass returns scale-invariant and metric depth, world-frame normals, and a sky mask. We also release two new datasets: ZüriPano (real-world evaluation) and PanoInfinigen (synthetic training).
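To illustrate why a fixed cubemap decouples compute from input resolution, here is a minimal sketch of the sampling geometry in plain Python. The face order, axis convention (x right, y up, z forward), and function names are assumptions for illustration, not PaGeR's actual implementation: each cubemap pixel is turned into a ray, and the ray is mapped to a position in the equirectangular image.

```python
import math

# Assumed face order; the real model may use a different one.
FACES = ("front", "right", "back", "left", "up", "down")


def face_pixel_to_direction(face, row, col, face_size):
    """Unnormalised 3D ray through the centre of one cubemap pixel."""
    a = 2.0 * (col + 0.5) / face_size - 1.0  # -1 (left) .. +1 (right)
    b = 1.0 - 2.0 * (row + 0.5) / face_size  # +1 (top)  .. -1 (bottom)
    directions = {
        "front": (a, b, 1.0),
        "right": (1.0, b, -a),
        "back": (-a, b, -1.0),
        "left": (-1.0, b, a),
        "up": (a, 1.0, -b),
        "down": (a, -1.0, b),
    }
    return directions[face]


def direction_to_equirect(direction, width, height):
    """Map a ray to continuous (u, v) pixel coordinates in the panorama."""
    x, y, z = direction
    lon = math.atan2(x, z)                 # -pi..pi, 0 means forward
    lat = math.atan2(y, math.hypot(x, z))  # -pi/2..pi/2, positive is up
    u = ((lon / (2.0 * math.pi) + 0.5) * width) % width  # wrap at the seam
    v = (0.5 - lat / math.pi) * height     # a sampler would clamp to height - 1
    return u, v


def build_sampling_grid(face_size, width, height):
    """One (u, v) lookup per cubemap pixel, for all six faces."""
    grid = []
    for face in FACES:
        for row in range(face_size):
            for col in range(face_size):
                ray = face_pixel_to_direction(face, row, col, face_size)
                grid.append(direction_to_equirect(ray, width, height))
    return grid


# The grid size depends only on face_size, not on the panorama resolution.
for width, height in ((2048, 1024), (8192, 4096)):
    print(width, height, len(build_sampling_grid(8, width, height)))
```

With the 504-pixel faces quoted above, the grid would always hold 6 × 504 × 504 = 1,524,096 samples, whatever the size of the input panorama. A real pipeline would interpolate the image at each (u, v), typically bilinearly.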

🔗 Project page: https://pager360.github.io · 🤗 Demo: https://huggingface.co/spaces/prs-eth/PaGeR · Collection (models + datasets): https://huggingface.co/collections/prs-eth/pager-697241d06b3733a6f18e4d39 · Code: https://github.com/prs-eth/PaGeR

Happy to answer any questions!

Similar Articles

PanoWorld: Towards Spatial Supersensing in 360° Panorama World

Hugging Face Daily Papers

PanoWorld introduces spherical spatial cross-attention for panoramic reasoning, addressing limitations of MLLMs in 360-degree spatial understanding. It builds a large-scale pipeline for geometry-aware supervision and proposes a diagnostic benchmark, achieving state-of-the-art results on multiple benchmarks.

TencentARC/Pixal3D

Hugging Face Models Trending

Pixal3D is a high-fidelity single-image-to-3D model by TencentARC and Microsoft that explicitly lifts pixel features into 3D via back-projection for near-reconstruction-level geometry and PBR textures. The model has been accepted to SIGGRAPH 2026, and inference code and a demo are available.