SurGe: Improved Surface Geometry in Point Maps

Hugging Face Daily Papers

Summary

SurGe introduces a Neighborhood Attention Decoder and a reformulated scale-invariant gradient matching loss to improve local surface geometry accuracy in feedforward 3D reconstruction, particularly for thin structures. It achieves state-of-the-art average rank on zero-shot monocular geometry benchmarks, with better local point map and normal metrics.

Recent feedforward 3D reconstruction methods predict point maps and estimate global 3D geometry remarkably well. However, their predictions still exhibit inaccurate local surface geometry, which is clearly visible qualitatively but only weakly reflected in common metrics. To make these errors more explicit in evaluation, we introduce a point map normal metric that evaluates the local surface orientation induced by neighboring 3D predictions. To reduce these errors, we propose two complementary components: a point gradient matching loss that supervises depth-normalized 3D finite differences, and a Neighborhood Attention Decoder (NAD) that progressively upsamples features and uses Neighborhood Attention for local feature mixing. Across eight zero-shot monocular geometry benchmarks, our model, SurGe, achieves the best average rank for global point map AbsRel and consistently improves local point map and point map normal evaluations.

Cached at: 06/01/26, 11:22 PM

Source: https://huggingface.co/papers/2605.31577

We improve local accuracy in feedforward 3D reconstruction. Current point map models struggle with bending and oscillating artifacts on thin structures (chair legs, street lamps, etc.). These errors are easy to spot visually but are not well captured by pointwise metrics such as AbsRel.
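To make the point about pointwise metrics concrete, here is a small, hypothetical Python sketch. It assumes a common point map AbsRel definition (the mean of ||p - p_gt|| / ||p_gt|| with no alignment step), which may differ from the paper's exact protocol, and a made-up toy scene: a flat wall at 2 m whose prediction zig-zags by ±2 cm between neighboring 1 cm-spaced columns.

```python
import math


def point_map_absrel(pred, gt):
    """Mean relative 3D error ||p - p_gt|| / ||p_gt|| over all pixels.

    Assumed definition for illustration only; the paper's variant
    (alignment, masking, local windows) may differ.
    """
    total, count = 0.0, 0
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            total += math.dist(p, g) / math.hypot(*g)
            count += 1
    return total / count


# Hypothetical toy scene: flat wall at z = 2 m on a 5x5 grid with 1 cm spacing.
# The prediction alternates +/-2 cm in depth from column to column.
gt = [[(0.01 * j, 0.01 * i, 2.0) for j in range(5)] for i in range(5)]
pred = [[(0.01 * j, 0.01 * i, 2.0 + 0.02 * (-1) ** j) for j in range(5)]
        for i in range(5)]

absrel = point_map_absrel(pred, gt)

# Tilt of the predicted surface between two horizontal neighbors,
# relative to the flat ground-truth wall: atan(|dz| / dx).
dz = pred[0][0][2] - pred[0][1][2]  # 2.02 - 1.98 = 0.04
dx = pred[0][1][0] - pred[0][0][0]  # 0.01
tilt_deg = math.degrees(math.atan2(abs(dz), dx))

print(f"AbsRel: {absrel:.4f}")            # about 0.01, i.e. roughly 1%
print(f"local tilt: {tilt_deg:.1f} deg")  # about 76 deg
```

The relative error stays near 1%, while the predicted surface between neighbors is tilted by roughly 76° away from the true wall orientation, which is the kind of local error a pointwise metric barely registers.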

We use a Neighborhood Attention Decoder (NAD). Like DPT-style heads, it decodes point maps progressively across scales, but it replaces conv-based local mixing with neighborhood attention and window-matched RoPE in ViT-like blocks.
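For readers unfamiliar with neighborhood attention, the following is a minimal, single-head sketch in plain Python. It only illustrates the local mixing step: each query pixel attends to the k×k keys around it, with the window shifted inward at image borders (the convention of the original Neighborhood Attention formulation). Learned query/key/value projections, multiple heads, window-matched RoPE, and the multi-scale upsampling of the actual NAD are deliberately omitted; the function name and arguments are hypothetical.

```python
import math


def neighborhood_attention(q, k, v, kernel=3):
    """Single-head 2D neighborhood attention over H x W grids of D-dim vectors.

    q, k, v: nested lists of shape [H][W][D]. Each query pixel attends only to
    the kernel x kernel keys around it. Near borders the window is shifted
    inward so every query still sees exactly kernel * kernel keys.
    Learned projections, multiple heads, and RoPE are omitted in this sketch.
    """
    H, W, D = len(q), len(q[0]), len(q[0][0])
    assert kernel % 2 == 1 and kernel <= min(H, W), "odd kernel that fits the grid"
    r = kernel // 2
    scale = 1.0 / math.sqrt(D)
    out = [[None] * W for _ in range(H)]
    for i in range(H):
        i0 = min(max(i - r, 0), H - kernel)  # clamp window start inside the grid
        for j in range(W):
            j0 = min(max(j - r, 0), W - kernel)
            coords = [(a, b) for a in range(i0, i0 + kernel)
                      for b in range(j0, j0 + kernel)]
            # Scaled dot-product logits between this query and its neighbors.
            logits = [scale * sum(x * y for x, y in zip(q[i][j], k[a][b]))
                      for a, b in coords]
            m = max(logits)  # subtract the max for numerical stability
            weights = [math.exp(s - m) for s in logits]
            z = sum(weights)
            # Softmax-weighted sum of the neighboring value vectors.
            out[i][j] = [sum(w * v[a][b][d] for w, (a, b) in zip(weights, coords)) / z
                         for d in range(D)]
    return out


# Toy usage: with all-zero queries and keys the softmax is uniform, so each
# output is the mean of the values inside its (possibly shifted) window.
H = W = 5
zeros = [[[0.0] for _ in range(W)] for _ in range(H)]
v = [[[float(i * W + j)] for j in range(W)] for i in range(H)]
out = neighborhood_attention(zeros, zeros, v, kernel=3)
print(out[0][0], out[2][2], out[4][4])  # [6.0] [12.0] [18.0]
```

With real features the weights depend on query/key similarity, which is what makes the mixing content-dependent, unlike a fixed convolution kernel that applies the same weights everywhere.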

This gives content-dependent local mixing without full self-attention at pixel-resolution. In practice, it helps with thin structures and discontinuities, while also avoiding the patch artifacts we see with plain ViT/MLP decoders.
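The efficiency argument can be made concrete by counting query-key pairs per attention layer. The sketch below is simple arithmetic under assumed numbers (a 512×512 output and a 7×7 neighborhood); the paper's actual resolutions and window sizes may differ, and real cost also depends on channels, heads, and kernel implementations.

```python
def attention_pairs(height, width, kernel=None):
    """Number of query-key pairs for one attention layer on a height x width grid.

    kernel=None means full self-attention (every pixel attends to every pixel);
    otherwise each pixel attends to a kernel x kernel neighborhood.
    """
    n = height * width
    return n * n if kernel is None else n * kernel * kernel


# Assumed example sizes, not taken from the paper.
full = attention_pairs(512, 512)             # 68,719,476,736 pairs
local = attention_pairs(512, 512, kernel=7)  # 12,845,056 pairs
print(f"full: {full:,}  neighborhood: {local:,}  ratio: {full / local:,.0f}x")
```

Full self-attention grows quadratically with the number of pixels, while neighborhood attention grows linearly for a fixed window, which is why it stays practical at pixel resolution.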

We also reformulate scale-invariant gradient matching for point maps. This family of losses worked best for us when the main global error is relative. Our version keeps the pairwise scale-invariant behavior but applies directly to 3D points rather than only to scalar depth.
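The sketch below illustrates the general idea of a scale-invariant gradient matching loss on point maps; it is not the paper's exact formulation. As an assumption, it borrows the DeMoN-style normalization (dividing a finite difference by the sum of the two depths), applies it to full 3D points, compares predicted and ground-truth gradients with an L1 penalty, and averages over several pixel spacings. The function names, the spacings (1, 2, 4), and the choice of denominator are all hypothetical.

```python
def point_gradients(points, spacing):
    """Depth-normalized 3D finite differences at a given pixel spacing.

    points: [H][W] grid of (x, y, z) tuples with z > 0.
    For each pixel pair (a, b) that is `spacing` apart horizontally or
    vertically, returns (p_a - p_b) / (z_a + z_b). Scaling every point by
    s > 0 scales numerator and denominator alike, so the result is unchanged.
    """
    H, W = len(points), len(points[0])
    grads = []
    for i in range(H):
        for j in range(W):
            for di, dj in ((0, spacing), (spacing, 0)):
                if i + di < H and j + dj < W:
                    a, b = points[i + di][j + dj], points[i][j]
                    denom = a[2] + b[2]  # hypothetical DeMoN-style normalizer
                    grads.append(tuple((ca - cb) / denom for ca, cb in zip(a, b)))
    return grads


def point_gradient_matching_loss(pred, gt, spacings=(1, 2, 4)):
    """Mean L1 distance between predicted and GT point gradients over spacings."""
    total, count = 0.0, 0
    for h in spacings:
        for gp, gg in zip(point_gradients(pred, h), point_gradients(gt, h)):
            total += sum(abs(x - y) for x, y in zip(gp, gg))
            count += 1
    return total / count


# Toy usage (hypothetical data): a tilted plane as ground truth.
gt = [[(0.1 * j, 0.1 * i, 2.0 + 0.05 * j) for j in range(5)] for i in range(5)]
scaled = [[tuple(3.0 * c for c in p) for p in row] for row in gt]
zigzag = [[(x, y, z + 0.02 * (-1) ** j) for j, (x, y, z) in enumerate(row)]
          for row in gt]
print(point_gradient_matching_loss(scaled, gt))  # ~0: global scale is ignored
print(point_gradient_matching_loss(zigzag, gt))  # > 0: local oscillation is penalized
```

Because each difference is normalized by depths from the same pair, multiplying the whole prediction by s > 0 leaves the loss unchanged, while a zig-zag between neighbors is penalized directly.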

For evaluation, we suggest the point map normal mean angular error as a complementary metric alongside global and local AbsRel. We compute normals from neighboring predicted 3D points and report the angular difference to the ground-truth normals. Empirically, this metric matches our qualitative impression better than AbsRel alone.
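A minimal sketch of such a metric in plain Python follows. It assumes normals are taken from forward differences to the right and lower neighbors (n = normalize(dX × dY)) and that the angular error is averaged over all pixels without masking; the paper may use a different neighborhood, orientation handling, or valid-pixel mask, and the helper names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalize(v):
    n = math.sqrt(sum(c * c for c in v))  # assumes non-degenerate neighbors
    return tuple(c / n for c in v)


def point_map_normals(points):
    """Per-pixel normals from forward differences of neighboring 3D points.

    Returns an (H-1) x (W-1) grid with n = normalize(dX x dY), where dX and dY
    are the differences to the right and lower neighbors.
    """
    H, W = len(points), len(points[0])
    return [[_normalize(_cross(_sub(points[i][j + 1], points[i][j]),
                               _sub(points[i + 1][j], points[i][j])))
             for j in range(W - 1)] for i in range(H - 1)]


def normal_mean_angular_error(pred, gt):
    """Mean angle (degrees) between normals induced by predicted and GT point maps."""
    angles = []
    for row_p, row_g in zip(point_map_normals(pred), point_map_normals(gt)):
        for n_p, n_g in zip(row_p, row_g):
            cos = sum(a * b for a, b in zip(n_p, n_g))
            angles.append(math.degrees(math.acos(max(-1.0, min(1.0, cos)))))
    return sum(angles) / len(angles)


# Toy check (hypothetical data): flat wall vs. a prediction that zig-zags by
# +/-2 cm between 1 cm-spaced columns.
gt = [[(0.01 * j, 0.01 * i, 2.0) for j in range(5)] for i in range(5)]
pred = [[(0.01 * j, 0.01 * i, 2.0 + 0.02 * (-1) ** j) for j in range(5)]
        for i in range(5)]
print(round(normal_mean_angular_error(pred, gt), 1))  # 76.0
```

Because a normal depends only on the direction between neighboring points, a small oscillation that barely changes per-point distances can still produce a large angular error, and a global rescaling of the prediction produces none.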

On zero-shot monocular geometry benchmarks, SurGe achieves the best average rank for global point map AbsRel among state-of-the-art methods. More importantly, it improves the local point map and point map normal metrics, suggesting better local surface geometry, which matches what we see qualitatively.
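For readers unfamiliar with the ranking protocol, the sketch below shows one common way to compute an average rank across benchmarks: rank methods per benchmark by error (1 = best, ties share the better rank), then average. The benchmark names, method names, and AbsRel values are invented purely for illustration and are not results from the paper; the tie handling is also an assumption.

```python
def average_ranks(scores):
    """scores: {benchmark: {method: error}}; lower error is better.

    Ranks methods per benchmark (1 = best, ties share the better rank) and
    averages each method's rank over the benchmarks it appears in.
    """
    totals, counts = {}, {}
    for per_method in scores.values():
        for method, err in per_method.items():
            rank = 1 + sum(1 for other in per_method.values() if other < err)
            totals[method] = totals.get(method, 0) + rank
            counts[method] = counts.get(method, 0) + 1
    return {m: totals[m] / counts[m] for m in totals}


# Toy AbsRel values (made up for illustration, NOT results from the paper).
toy = {
    "bench_1": {"A": 0.050, "B": 0.045, "C": 0.060},
    "bench_2": {"A": 0.030, "B": 0.031, "C": 0.032},
    "bench_3": {"A": 0.070, "B": 0.065, "C": 0.080},
}
ranks = average_ranks(toy)
print(ranks)  # {'A': 1.6666666666666667, 'B': 1.3333333333333333, 'C': 3.0}
```

In this toy example, method B has the best average rank without being best on every benchmark, so an average rank summarizes consistency across datasets rather than a win on each one.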

Similar Articles

Surflo: Consistent 3D Surface Flow Model with Global State

Hugging Face Daily Papers

Surflo is a feed-forward 3D reconstruction model that compresses unposed RGB views into latent tokens and decodes consistent 3D surface points via flow matching, enabling variable-resolution output and outperforming existing methods in speed.

World Tracing: Generative Pixel-Aligned Geometry Beyond the Visible

Hugging Face Daily Papers

World Tracing introduces a generative pixel-aligned geometry representation that predicts 3D points aligned with observed pixels while completing occluded surfaces. It uses a diffusion transformer trained with pixel-space flow matching, achieving strong performance on visible-surface reconstruction and complete geometry generation across object, scene, and dynamic benchmarks.

Geometry Matters: 3D Foundation Priors for Learning Semantic Correspondence

Hugging Face Daily Papers

This paper introduces a post-training framework that leverages 3D priors from SAM3D to improve semantic correspondence in 2D foundation features, addressing issues like left-right confusion and repeated parts. The method uses instance-specific 3D reconstruction without pose annotations or spherical geometry shortcuts.