SurGe: Improved Surface Geometry in Point Maps

Hugging Face Daily Papers 05/29/26, 12:00 AM Papers

Summary

SurGe introduces a Neighborhood Attention Decoder and a reformulated scale-invariant gradient matching loss to improve local surface geometry accuracy in feedforward 3D reconstruction, particularly for thin structures. It achieves state-of-the-art average rank on zero-shot monocular geometry benchmarks, with better local point map and normal metrics.

Recent feedforward 3D reconstruction methods predict point maps and estimate global 3D geometry remarkably well. However, their predictions still exhibit inaccurate local surface geometry, which is clearly visible qualitatively but only weakly reflected in common metrics. To make these errors more explicit in evaluation, we introduce a point map normal metric that evaluates the local surface orientation induced by neighboring 3D predictions. To reduce these errors, we propose two complementary components: a point gradient matching loss that supervises depth-normalized 3D finite differences, and a Neighborhood Attention Decoder (NAD) that progressively upsamples features and uses Neighborhood Attention for local feature mixing. Across eight zero-shot monocular geometry benchmarks, our model, SurGe, achieves the best average rank for global point map AbsRel and consistently improves local point map and point map normal evaluations.

Original Article

Cached at: 06/01/26, 11:22 PM

Paper page - SurGe: Improved Surface Geometry in Point Maps

Source: https://huggingface.co/papers/2605.31577

We improve local accuracy in feedforward 3D reconstruction. Current point map models suffer from bending and oscillating artifacts on thin structures (chair legs, street lamps, etc.). These artifacts are easy to spot visually but are not well captured by pointwise metrics like AbsRel.

We use a Neighborhood Attention Decoder (NAD). Like DPT-style heads, it decodes point maps progressively across scales, but it replaces conv-based local mixing with neighborhood attention and window-matched RoPE in ViT-like blocks.

This gives content-dependent local mixing without full self-attention at pixel resolution. In practice, it helps with thin structures and discontinuities, while also avoiding the patch artifacts we see with plain ViT/MLP decoders.
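To make "content-dependent local mixing" concrete, here is a minimal NumPy sketch of single-head neighborhood attention on a 2D feature map. It is an illustration only, not the NAD implementation: the function name `neighborhood_attention` is hypothetical, queries, keys and values are all taken to be the raw features (no learned projections), and RoPE and upsampling are omitted. Windows are shifted at the image border so that every pixel attends to exactly `kernel_size * kernel_size` neighbors.

```python
import numpy as np

def neighborhood_attention(x, kernel_size=3):
    """Single-head neighborhood attention on an (H, W, C) feature map.

    Illustrative assumptions: queries, keys and values are the raw features
    (no learned projections) and no positional encoding is applied.
    """
    H, W, C = x.shape
    k = kernel_size
    assert k % 2 == 1 and k <= H and k <= W
    out = np.empty_like(x, dtype=np.float64)
    scale = 1.0 / np.sqrt(C)
    for i in range(H):
        # Shift the window at the border so it always covers k rows.
        r0 = min(max(i - k // 2, 0), H - k)
        for j in range(W):
            c0 = min(max(j - k // 2, 0), W - k)
            keys = x[r0:r0 + k, c0:c0 + k].reshape(-1, C)  # (k*k, C)
            logits = keys @ x[i, j] * scale                # (k*k,)
            logits = logits - logits.max()                 # stable softmax
            weights = np.exp(logits)
            weights /= weights.sum()
            out[i, j] = weights @ keys                     # weighted local mix
    return out
```

Each pixel attends to k*k neighbors rather than to all H*W positions, so the cost grows linearly with the number of pixels. The mixing weights depend on the features themselves, whereas a convolution applies the same fixed kernel everywhere.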

We also reformulate scale-invariant gradient matching for point maps. This family of losses worked best for us when the main global error is relative. Our version keeps the pairwise scale-invariant behavior, but is directly applicable to 3D points instead of scalar depth only.
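One way such a loss could look is sketched below. The exact formulation is not given in this text, so the details are assumptions: the function name `point_gradient_matching_loss` is hypothetical, differences are taken between horizontally and vertically adjacent pixels only, each difference is divided by the mean depth (z) of its point pair, and the penalty is L1. Multiplying a point map by a global scale factor scales both the difference and the pair depth, so the ratio is unchanged up to the small `eps` stabilizer.

```python
import numpy as np

def point_gradient_matching_loss(pred, gt, eps=1e-6):
    """L1 loss on depth-normalized finite differences of (H, W, 3) point maps.

    Assumption: each difference between two neighboring points is divided by
    the mean depth (z) of that pair, which cancels a global scale factor.
    """
    def normalized_diffs(p, axis):
        a = p[1:] if axis == 0 else p[:, 1:]
        b = p[:-1] if axis == 0 else p[:, :-1]
        pair_depth = 0.5 * (a[..., 2] + b[..., 2])
        return (a - b) / (pair_depth[..., None] + eps)

    loss = 0.0
    for axis in (0, 1):
        diff = normalized_diffs(pred, axis) - normalized_diffs(gt, axis)
        loss += np.abs(diff).mean()
    return loss / 2.0
```

Because the normalization is applied per pair, no global alignment between prediction and ground truth is needed before the loss is computed.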

For evaluation, we suggest a point map normal mean angular error as a complementary metric alongside global and local AbsRel. We compute normals from neighboring predicted 3D points and report the angular difference to the GT. Empirically, this matches our qualitative impression better.
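A minimal sketch of such a metric follows. It assumes normals come from the cross product of forward differences along image columns and rows; the neighborhood actually used in the paper is not specified here, and the function names `point_map_normals` and `normal_mean_angular_error` are hypothetical. Masking of invalid pixels is omitted.

```python
import numpy as np

def point_map_normals(points, eps=1e-12):
    """Unit normals from forward differences of an (H, W, 3) point map.

    Returns an (H-1, W-1, 3) array. Forward differences are an assumption,
    not necessarily the neighborhood used in the paper.
    """
    du = points[:-1, 1:] - points[:-1, :-1]  # step along image columns
    dv = points[1:, :-1] - points[:-1, :-1]  # step along image rows
    n = np.cross(du, dv)
    norm = np.linalg.norm(n, axis=-1, keepdims=True)
    return n / np.maximum(norm, eps)

def normal_mean_angular_error(pred, gt):
    """Mean angle in degrees between normals of predicted and GT point maps."""
    cos = np.sum(point_map_normals(pred) * point_map_normals(gt), axis=-1)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))).mean())
```

As a quick check, a fronto-parallel plane compared against a plane tilted so that z grows one unit per unit of x gives a 45 degree error. The angle does not change if either point map is multiplied by a positive global scale, so this metric measures surface orientation independently of scale alignment.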

On zero-shot monocular geometry benchmarks, SurGe gets the best average rank for global point map AbsRel among state-of-the-art methods. More importantly, it improves local point map and point map normal metrics, suggesting better local surface geometry. This matches what we see qualitatively.

Similar Articles

Surflo: Consistent 3D Surface Flow Model with Global State

World Tracing: Generative Pixel-Aligned Geometry Beyond the Visible

Beyond 3D VQAs: Injecting 3D Spatial Priors into Vision-Language Models for Enhanced Geometric Reasoning

Geometry Matters: 3D Foundation Priors for Learning Semantic Correspondence

Geometry-Aware Representation Denoising for Robust Multi-view 3D Reconstruction
