Unified Panoramic Geometry Estimation via Multi-View Foundation Models
Summary
PaGeR adapts the multi-view perspective foundation model Depth Anything 3 to predict scale-invariant and metric depth, surface normals, and sky segmentation from a single equirectangular image. It uses a fixed cubemap representation that keeps VRAM and runtime constant regardless of input resolution. The paper also releases the ZüriPano and PanoInfinigen datasets.
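To make concrete what depth predicted "from a single equirectangular image" encodes, the sketch below unprojects an equirectangular depth map into 3D points. Each pixel corresponds to one ray direction on the unit sphere, and depth scales that ray. This is a minimal illustration, not PaGeR's code. The function name, the longitude/latitude layout, the y-up axis convention, and the assumption that depth is Euclidean distance along the ray (rather than planar z-depth) are all hypothetical.

```python
import numpy as np


def equirect_depth_to_points(depth: np.ndarray) -> np.ndarray:
    """Unproject an (H, W) equirectangular depth map to (H, W, 3) points.

    Assumed convention (hypothetical, not taken from PaGeR): longitude spans
    [-pi, pi) from left to right, latitude spans [pi/2, -pi/2] from top to
    bottom, both sampled at pixel centres; y points up and z points forward
    at longitude 0. Depth is treated as distance along the viewing ray.
    """
    h, w = depth.shape
    # Longitude and latitude of each pixel centre
    lon = (np.arange(w) + 0.5) / w * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (np.arange(h) + 0.5) / h * np.pi
    lon, lat = np.meshgrid(lon, lat)  # both have shape (H, W)
    # Unit ray directions on the sphere
    dirs = np.stack(
        [np.cos(lat) * np.sin(lon), np.sin(lat), np.cos(lat) * np.cos(lon)],
        axis=-1,
    )
    # Scale every unit ray by its depth value
    return dirs * depth[..., None]


# Usage: a constant-depth panorama becomes a sphere of radius 2
points = equirect_depth_to_points(np.full((256, 512), 2.0))
```

With metric depth in metres the resulting points are in metres as well. With scale-invariant depth they are correct only up to one global scale factor.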
Cached at: 05/29/26, 03:00 AM
Source: https://huggingface.co/papers/2605.26368
TL;DR: PaGeR turns a perspective 3D foundation model into a single-pass 360° geometry estimator. From one equirectangular image it predicts scale-invariant depth, metric depth (in metres), surface normals, and sky segmentation at full panoramic resolution.
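Scale-invariant depth is only determined up to an unknown factor, so comparing it with metric ground truth usually means fitting that factor first. The sketch below fits a closed-form least-squares scale, s = Σ p·g / Σ p², over valid pixels, and can optionally drop sky pixels, whose depth is undefined. It assumes the prediction is defined up to a single global scale; some models instead predict affine-invariant depth, which also needs a shift. The function `align_scale` and its masking rules are hypothetical and are not PaGeR's evaluation protocol.

```python
from typing import Optional

import numpy as np


def align_scale(
    pred: np.ndarray, gt: np.ndarray, sky: Optional[np.ndarray] = None
) -> float:
    """Least-squares scale s minimising sum((s * pred - gt) ** 2).

    Hypothetical helper: only pixels with finite, positive ground truth and a
    finite prediction are used; if a boolean sky mask is given, sky pixels
    are excluded as well.
    """
    valid = np.isfinite(gt) & (gt > 0) & np.isfinite(pred)
    if sky is not None:
        valid &= ~sky
    p = pred[valid].astype(np.float64)
    g = gt[valid].astype(np.float64)
    # Closed-form solution of the 1D least-squares problem
    return float(p @ g / (p @ p))


# Usage: rescale a scale-invariant prediction before computing metric errors
pred = np.array([1.0, 2.0, 4.0])
gt = np.array([3.0, 6.0, 12.0])
scaled = align_scale(pred, gt) * pred
```

A closed-form scale keeps the comparison deterministic. Median-ratio scaling is a common, more outlier-robust alternative.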
We introduce PaGeR (Panoramic Geometry Reconstruction), which lifts a multi-view perspective foundation model (Depth Anything 3) to the panoramic domain via a fixed 6×504×504 cubemap, so VRAM and runtime stay constant regardless of input resolution. A single forward pass returns scale-invariant and metric depth, world-frame surface normals, and a sky mask. We also release two new datasets: ZüriPano (real-world, for evaluation) and PanoInfinigen (synthetic, for training).
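The constant-cost claim follows from resampling every panorama into the same six 504×504 faces before the model sees it. The sketch below shows that resampling step. It is not PaGeR's projection code. The face order, the (forward, right, up) axis convention, nearest-neighbour sampling, and the name `equirect_to_cubemap` are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical face layout: (forward, right, up) unit vectors per face, in a
# frame where y is up and z is forward (longitude 0).
FACES = {
    "front": ((0, 0, 1), (1, 0, 0), (0, 1, 0)),
    "right": ((1, 0, 0), (0, 0, -1), (0, 1, 0)),
    "back": ((0, 0, -1), (-1, 0, 0), (0, 1, 0)),
    "left": ((-1, 0, 0), (0, 0, 1), (0, 1, 0)),
    "up": ((0, 1, 0), (1, 0, 0), (0, 0, -1)),
    "down": ((0, -1, 0), (1, 0, 0), (0, 0, 1)),
}


def equirect_to_cubemap(img: np.ndarray, face_size: int = 504) -> np.ndarray:
    """Resample an (H, W[, C]) equirectangular image into (6, S, S[, C]) faces.

    Nearest-neighbour sampling keeps the sketch short. The output size depends
    only on face_size, never on H or W.
    """
    h, w = img.shape[:2]
    # Face-plane coordinates of pixel centres in [-1, 1]; v is +1 on the top row
    t = (np.arange(face_size) + 0.5) / face_size * 2.0 - 1.0
    u, v = np.meshgrid(t, -t)
    faces = []
    for fwd, right, up in FACES.values():
        fwd, right, up = (np.asarray(a, dtype=np.float64) for a in (fwd, right, up))
        # Ray direction through each face pixel, shape (S, S, 3)
        d = fwd + u[..., None] * right + v[..., None] * up
        x, y, z = d[..., 0], d[..., 1], d[..., 2]
        lon = np.arctan2(x, z)  # in [-pi, pi]
        lat = np.arctan2(y, np.hypot(x, z))  # in [-pi/2, pi/2]
        # Map spherical angles to equirectangular pixel indices
        col = np.floor((lon + np.pi) / (2.0 * np.pi) * w).astype(int) % w
        row = np.clip(np.floor((np.pi / 2.0 - lat) / np.pi * h).astype(int), 0, h - 1)
        faces.append(img[row, col])
    return np.stack(faces)


# Usage: different input resolutions give the same cubemap tensor shape
small = equirect_to_cubemap(np.zeros((512, 1024, 3)))
large = equirect_to_cubemap(np.zeros((1024, 2048, 3)))
```

Both calls return a (6, 504, 504, 3) array, which is why the downstream model's memory and runtime do not grow with panorama resolution. A practical pipeline would likely use bilinear sampling and would also need the inverse mapping to return predictions to full equirectangular resolution.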
🔗 Project page: https://pager360.github.io
🤗 Demo: https://huggingface.co/spaces/prs-eth/PaGeR
Collection (models + datasets): https://huggingface.co/collections/prs-eth/pager-697241d06b3733a6f18e4d39
Code: https://github.com/prs-eth/PaGeR
Happy to answer any questions!
Similar Articles
PanoWorld: Towards Spatial Supersensing in 360° Panorama World
PanoWorld introduces spherical spatial cross-attention for panoramic reasoning, addressing limitations of MLLMs in 360-degree spatial understanding. It builds a large-scale pipeline for geometry-aware supervision and proposes a diagnostic benchmark, achieving state-of-the-art results on multiple benchmarks.
GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction
GenRecon introduces a method for 3D scene reconstruction that integrates generative 3D priors with multi-view image conditioning, achieving high-fidelity, editable mesh reconstructions of indoor environments and outperforming existing methods by 16%.
Pantheon360: Taming Digital Twin Generation via 3D-Aware 360° Video Diffusion
Pantheon360 introduces a 3D-aware 360° video diffusion framework that uses an explicit 3D cache to enforce geometric consistency, enabling high-fidelity digital twin generation from sparse 360° inputs.
TencentARC/Pixal3D
Pixal3D is a high-fidelity single-image-to-3D model by TencentARC and Microsoft, which explicitly lifts pixel features into 3D via back-projection for near-reconstruction-level geometry and PBR textures. The model has been accepted to SIGGRAPH 2026, and inference code and a demo are available.
Geometry-Aware Representation Denoising for Robust Multi-view 3D Reconstruction
Introduces GARD, a diffusion-based framework that operates in the feature space of a feed-forward 3D reconstructor to jointly recover scene geometry and high-quality imagery from degraded inputs.