GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction

Hugging Face Daily Papers

Summary

GenRecon introduces a method for 3D scene reconstruction that integrates generative 3D priors with multi-view image conditioning, achieving high-fidelity, editable mesh reconstructions of indoor environments and outperforming existing methods by 16%.

We introduce a new approach to high-fidelity 3D scene reconstruction from multi-view RGB images that tightly couples reconstruction with a strong generative 3D prior. We cast scene reconstruction as conditional 3D generation over a set of spatially-localized, overlapping chunks that together tile the scene, scaling generation to large scene extents. Crucially, we inherit the fidelity and completeness of state-of-the-art generative shape models -- we use Trellis.2 as an example -- which we generalize to the scene level. To this end, we propose a projection-based conditioning mechanism that lifts posed multi-view image features into a coherent 3D representation aligned with the generative model, independent of view ordering and spatially anchored to the scene, yielding high-fidelity, multi-view consistent generated geometry. This enables lifting the strong object-level prior of Trellis.2 to multi-view, scene-scale generation, producing faithful, editable PBR mesh reconstructions of indoor environments. As a result, we obtain high-fidelity results that outperform cutting-edge reconstruction methods by 16%.
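The two mechanisms named in the abstract, overlapping chunks that tile the scene and projection-based lifting of posed image features, can be illustrated with a small sketch. This is not the authors' implementation: the names `chunk_origins`, `View`, and `lift_point` are hypothetical, the pinhole camera model, nearest-pixel sampling, and plain averaging over views are assumptions made for illustration, and occlusion is ignored. Averaging is used here only because it makes the result independent of view ordering, which is the property the abstract describes.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def chunk_origins(extent: float, chunk: float, overlap: float) -> List[float]:
    """Start positions of overlapping 1D chunks that together cover [0, extent]."""
    stride = chunk - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than the chunk size")
    origins = [0.0]
    # Keep stepping until the last chunk reaches the end of the extent.
    while origins[-1] + chunk < extent:
        origins.append(origins[-1] + stride)
    return origins


@dataclass
class View:
    """A posed image with a per-pixel feature map (hypothetical container)."""
    R: Sequence[Sequence[float]]  # world-to-camera rotation, 3x3
    t: Sequence[float]            # world-to-camera translation, length 3
    fx: float                     # focal length in pixels (x)
    fy: float                     # focal length in pixels (y)
    cx: float                     # principal point (x)
    cy: float                     # principal point (y)
    feats: Sequence[Sequence[Sequence[float]]]  # H x W x C features


def lift_point(p: Vec3, views: Sequence[View]) -> Optional[List[float]]:
    """Average the features of every view whose image contains the projection of p.

    Returns None when no view sees the point. Occlusion is not handled.
    """
    total: Optional[List[float]] = None
    count = 0
    for v in views:
        # Transform the world-space point into camera coordinates.
        x, y, z = (
            sum(v.R[i][j] * p[j] for j in range(3)) + v.t[i] for i in range(3)
        )
        if z <= 0:
            continue  # point is behind the camera
        # Pinhole projection followed by nearest-pixel lookup.
        col = int(round(v.fx * x / z + v.cx))
        row = int(round(v.fy * y / z + v.cy))
        height, width = len(v.feats), len(v.feats[0])
        if not (0 <= row < height and 0 <= col < width):
            continue  # projection falls outside the image
        f = v.feats[row][col]
        total = list(f) if total is None else [a + b for a, b in zip(total, f)]
        count += 1
    if total is None:
        return None
    return [a / count for a in total]
```

In a scene-scale setting, a function like `lift_point` would be evaluated at the voxel centers of each chunk produced by `chunk_origins` (applied per axis), so the conditioning features are anchored to scene coordinates rather than to any particular camera.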

Cached at: 05/25/26, 02:35 AM

Source: https://huggingface.co/papers/2605.23888

Abstract

A novel method for 3D scene reconstruction that integrates generative 3D priors with multi-view image conditioning to produce high-fidelity, editable mesh reconstructions of indoor environments.

We introduce a new approach to high-fidelity 3D scene reconstruction from multi-view RGB images that tightly couples reconstruction with a strong generative 3D prior. We cast scene reconstruction as conditional 3D generation over a set of spatially-localized, overlapping chunks that together tile the scene, scaling generation to large scene extents. Crucially, we inherit the fidelity and completeness of state-of-the-art generative shape models -- we use Trellis.2 as an example -- which we generalize to the scene level. To this end, we propose a projection-based conditioning mechanism that lifts posed multi-view image features into a coherent 3D representation aligned with the generative model, independent of view ordering and spatially anchored to the scene, yielding high-fidelity, multi-view consistent generated geometry. This enables lifting the strong object-level prior of Trellis.2 to multi-view, scene-scale generation, producing faithful, editable PBR mesh reconstructions of indoor environments. As a result, we obtain high-fidelity results that outperform cutting-edge reconstruction methods by 16%.

Get this paper in your agent:

hf papers read 2605.23888

Don’t have the latest CLI? Install it with: curl -LsSf https://hf.co/cli/install.sh | bash

Similar Articles

Unified Panoramic Geometry Estimation via Multi-View Foundation Models

Hugging Face Daily Papers

PaGeR adapts the multi-view perspective foundation model Depth Anything 3 to predict scale-invariant and metric depth, surface normals, and sky segmentation from a single equirectangular image, using a fixed cubemap representation that keeps VRAM and runtime constant. The paper also releases the ZüriPano and PanoInfinigen datasets.

Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image

Hugging Face Daily Papers

Sat3DGen introduces a geometry-first approach for generating street-level 3D scenes from a single satellite image, achieving improved geometric accuracy and photorealism through novel constraints and training strategies. The method demonstrates significant improvements over prior work on the VIGOR-OOD benchmark.