SceneAligner: 3D-Grounded Floorplan Localization in the Wild

Hugging Face Daily Papers 05/21/26, 12:00 AM Papers

Summary

Presents SceneAligner, a deep learning approach for floorplan localization that uses 3D scene reconstruction and cross-modal correspondence learning to work in real-world environments with limited data.

Many public buildings provide floorplans with a "you are here" indicator to help visitors orient themselves. Floorplan localization seeks to computationally replicate this capability by determining where visual observations were captured within a floorplan. However, existing methods typically assume controlled small-scale environments and precise vectorized floorplans, limiting their ability to operate in large-scale buildings and rasterized floorplans. In this work, we present an approach for performing floorplan localization in the wild by grounding the task in a reconstructed 3D representation of the scene. Given an unconstrained image collection, our method reconstructs a gravity-aligned 3D scene and projects it into a 2D density map that serves as a floorplan proxy. Floorplan localization is then formulated as aligning this proxy with the input floorplan via a 2D similarity transform. To bridge the appearance gap between density maps and architectural floorplans, we adapt a 2D foundation model to learn cross-modal correspondences, introducing a fine-tuning scheme that encourages semantically aligned matches while preserving structural consistency. Extensive experiments demonstrate substantial improvements over prior methods, including in extremely sparse settings with as little as a single input image. Our code and data will be publicly available.
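The projection step described above can be illustrated with a minimal sketch. It assumes the reconstruction is already gravity-aligned with z pointing up, so dropping z and counting points per ground-plane cell gives a density map in which vertical structures such as walls show up as high-count cells. The function name `density_map` and the `cell_size` parameter are hypothetical, and the paper's actual projection may differ.

```python
def density_map(points, cell_size=0.1):
    """Project gravity-aligned 3D points (x, y, z), z up, onto a 2D grid.

    Returns a row-major grid of point counts normalized to [0, 1].
    Hypothetical helper for illustration, not the authors' code.
    """
    if not points:
        raise ValueError("need at least one point")
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    # Grid size covers the bounding box of the points on the ground plane.
    width = int((max(xs) - x_min) / cell_size) + 1
    height = int((max(ys) - y_min) / cell_size) + 1
    grid = [[0] * width for _ in range(height)]
    for x, y, _z in points:
        # Height is discarded: every point votes for the cell below it.
        col = int((x - x_min) / cell_size)
        row = int((y - y_min) / cell_size)
        grid[row][col] += 1
    peak = max(max(row) for row in grid)
    return [[count / peak for count in row] for row in grid]


# Three points stacked at one (x, y) location mimic a wall seen at
# several heights; the remaining two are isolated points.
points = [(0, 0, 0.0), (0, 0, 1.0), (0, 0, 2.0), (1, 0, 0.0), (1, 1, 0.5)]
dm = density_map(points, cell_size=1.0)
```

In this toy input, the cell under the stacked points receives the highest density, which is the property that makes the map usable as a floorplan proxy.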

Original Article

Cached at: 05/22/26, 06:34 AM

Source: https://huggingface.co/papers/2605.22581

Abstract

Deep learning approach for floorplan localization that uses 3D scene reconstruction and cross-modal correspondence learning to work in real-world environments with limited data.

Many public buildings provide floorplans with a “you are here” indicator to help visitors orient themselves. Floorplan localization seeks to computationally replicate this capability by determining where visual observations were captured within a floorplan. However, existing methods typically assume controlled small-scale environments and precise vectorized floorplans, limiting their ability to operate in large-scale buildings and rasterized floorplans. In this work, we present an approach for performing floorplan localization in the wild by grounding the task in a reconstructed 3D representation of the scene. Given an unconstrained image collection, our method reconstructs a gravity-aligned 3D scene and projects it into a 2D density map that serves as a floorplan proxy. Floorplan localization is then formulated as aligning this proxy with the input floorplan via a 2D similarity transform. To bridge the appearance gap between density maps and architectural floorplans, we adapt a 2D foundation model to learn cross-modal correspondences, introducing a fine-tuning scheme that encourages semantically aligned matches while preserving structural consistency. Extensive experiments demonstrate substantial improvements over prior methods, including in extremely sparse settings with as little as a single input image. Our code and data will be publicly available.
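The alignment step can be made concrete with a small sketch. Assuming point correspondences between the density map and the floorplan are already available (in the paper these come from the adapted foundation model), a 2D similarity transform has a closed-form least-squares fit when points are treated as complex numbers: q ≈ a·p + b, where |a| is the scale, arg(a) the rotation, and b the translation. The function names below are hypothetical, and the abstract does not say how the authors estimate the transform; a practical system would likely add outlier rejection on top of a fit like this.

```python
import cmath


def fit_similarity_2d(src, dst):
    """Least-squares 2D similarity mapping src[i] -> dst[i].

    Points are (x, y) tuples. Returns (scale, angle_radians, (tx, ty)).
    Hypothetical helper for illustration; no outlier handling.
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two matched point pairs")
    p = [complex(x, y) for x, y in src]
    q = [complex(x, y) for x, y in dst]
    p_mean = sum(p) / len(p)
    q_mean = sum(q) / len(q)
    # Complex least squares on centered points: a = <p_c, q_c> / <p_c, p_c>.
    num = sum((pi - p_mean).conjugate() * (qi - q_mean) for pi, qi in zip(p, q))
    den = sum(abs(pi - p_mean) ** 2 for pi in p)
    if den == 0:
        raise ValueError("source points are all identical")
    a = num / den
    b = q_mean - a * p_mean
    return abs(a), cmath.phase(a), (b.real, b.imag)


def apply_similarity_2d(point, scale, angle, translation):
    """Apply the fitted transform to a single (x, y) point."""
    z = cmath.rect(scale, angle) * complex(*point) + complex(*translation)
    return (z.real, z.imag)


# Toy correspondences generated with scale 2, rotation 90 degrees,
# and translation (3, -1).
src = [(0, 0), (1, 0), (0, 1)]
dst = [(3, -1), (3, 1), (1, -1)]
scale, angle, translation = fit_similarity_2d(src, dst)
mapped = apply_similarity_2d((1, 1), scale, angle, translation)
```

Once the transform is known, any camera position expressed in density-map coordinates can be mapped onto the floorplan with `apply_similarity_2d`, which is what turns the alignment into a localization result.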

Similar Articles

OSMGraphCLIP: Learning Global Location Representations from OpenStreetMap Graphs

Query-based Cross-Modal Projector Bolstering Mamba Multimodal LLM

LDARNet: DNA Adaptive Representation Network with Learnable Tokenization for Genomic Modeling

@AdinaYakup: JD just released JoyAI-Echo An interesting long video generation model 5 minute multi shot video generation Cross modal…

ChatHealthAI: Aligning Electronic Health Record Representations with Large Language Models for Grounded Clinical Reasoning
