Geometry-Aware Representation Denoising for Robust Multi-view 3D Reconstruction
Summary
Introduces GARD, a diffusion-based framework that operates in the feature space of a feed-forward 3D reconstructor to jointly recover scene geometry and high-quality imagery from degraded inputs.
Cached at: 05/27/26, 02:47 AM
Source: https://huggingface.co/papers/2605.26230
Abstract
A novel diffusion-based framework for multi-view 3D reconstruction that restores both scene geometry and high-quality imagery from degraded inputs by operating in the feature space of a 3D reconstructor.
Multi-view 3D reconstruction has achieved remarkable progress with the advent of feed-forward 3D reconstruction models. However, these models are typically trained and evaluated under ideal, degradation-free imaging conditions, whereas real-world observations often contain degradations that differ significantly from such settings. Improving robustness for multi-view 3D reconstruction under degraded conditions therefore remains an important challenge. We present Geometry-Aware Representation Denoising (GARD), a novel framework that performs diffusion-based multi-view restoration directly in the feature space of a feed-forward 3D reconstruction model. This design exploits the geometry-aware feature representations of the 3D reconstructor to effectively recover accurate scene geometry. Furthermore, by employing an additional RGB image decoder, the refined representations can also be used to restore high-quality RGB images, thereby enabling the simultaneous recovery of 3D scene geometry and high-quality imagery. Comprehensive experiments on the Depth Anything 3 (DA3) benchmark demonstrate the effectiveness of the proposed GARD framework.
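The data flow the abstract describes (encode views into features, denoise in that feature space, then read both geometry and RGB from the refined features) can be illustrated with a toy sketch. This is not GARD's implementation: the abstract does not specify the architecture, noise schedule, or conditioning. Everything below is an assumption made for illustration, including the patch-based `toy_encode` and `toy_rgb_decoder`, the `toy_depth_head`, the linear DDPM-style schedule, and the use of the true noise as a stand-in for a learned denoiser's prediction.

```python
import numpy as np

def toy_encode(images, patch=4):
    """Hypothetical stand-in for a reconstructor encoder: (V,H,W,3) -> (V,N,D) tokens."""
    v, h, w, c = images.shape
    x = images.reshape(v, h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)
    return x.reshape(v, (h // patch) * (w // patch), patch * patch * c)

def toy_rgb_decoder(tokens, h, w, patch=4):
    """Hypothetical RGB decoder: exact inverse of toy_encode, (V,N,D) -> (V,H,W,3)."""
    v, _, d = tokens.shape
    c = d // (patch * patch)
    x = tokens.reshape(v, h // patch, w // patch, patch, patch, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)
    return x.reshape(v, h, w, c)

def toy_depth_head(tokens):
    """Hypothetical geometry head: one scalar per token, (V,N,D) -> (V,N)."""
    return tokens.mean(axis=-1)

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fraction for an assumed linear DDPM-style schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def forward_noise(x0, eps, t, alpha_bar):
    """Noise clean features x0 to step t."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def predict_x0(xt, eps_hat, t, alpha_bar):
    """Recover a clean-feature estimate from noisy features and a noise estimate."""
    return (xt - np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alpha_bar[t])

def run_demo(seed=0):
    rng = np.random.default_rng(seed)
    images = rng.random((2, 16, 16, 3))          # two toy views
    feats = toy_encode(images)                   # features of the clean views
    alpha_bar = make_alpha_bar()
    t = 500
    eps = rng.standard_normal(feats.shape)
    noisy = forward_noise(feats, eps, t, alpha_bar)
    # A learned denoiser would predict eps from (noisy, t, degraded-view features).
    # Here the true eps is used as an oracle, so the recovery is exact by construction.
    refined = predict_x0(noisy, eps, t, alpha_bar)
    rgb = toy_rgb_decoder(refined, 16, 16)       # imagery from refined features
    depth = toy_depth_head(refined)              # geometry from the same features
    return images, feats, rgb, depth
```

The point of the sketch is the shared representation: both output heads consume the same refined features, so a single denoising pass in feature space can serve geometry and image restoration together. In a real system the oracle noise would be replaced by a trained network.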
Similar Articles
AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model
AnyRecon proposes a scalable framework for 3D reconstruction from arbitrary sparse inputs using a video diffusion model with persistent scene memory and geometry-aware conditioning.
GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction
GenRecon introduces a method for 3D scene reconstruction that integrates generative 3D priors with multi-view image conditioning, achieving high-fidelity, editable mesh reconstructions of indoor environments and outperforming existing methods by 16%.
Geometry-Aware Tabular Diffusion
Introduces Geometry-Aware Tabular Diffusion (GATD), which augments tabular diffusion denoisers with explicit pairwise geometric features. Achieves state-of-the-art performance on ten benchmarks while using significantly fewer parameters.
Unified Panoramic Geometry Estimation via Multi-View Foundation Models
PaGeR adapts the multi-view perspective foundation model Depth Anything 3 to predict scale-invariant and metric depth, surface normals, and sky segmentation from a single equirectangular image, using a fixed cubemap representation that keeps VRAM and runtime constant. The paper also releases the ZüriPano and PanoInfinigen datasets.
VidSplat: Gaussian Splatting Reconstruction with Geometry-Guided Video Diffusion Priors
VidSplat is a training-free generative reconstruction framework that uses video diffusion priors to recover complete 3D scenes from sparse inputs by synthesizing novel views.