Lite3R: A Model-Agnostic Framework for Efficient Feed-Forward 3D Reconstruction

Hugging Face Daily Papers

Summary

Lite3R is a model-agnostic framework that improves the efficiency of transformer-based 3D reconstruction using sparse linear attention and FP8-aware quantization. It reduces latency by up to 2.0x and memory usage by up to 2.4x while maintaining geometric accuracy on backbones like VGGT and DA3-Large.

Transformer-based 3D reconstruction has emerged as a powerful paradigm for recovering geometry and appearance from multi-view observations, offering strong performance across challenging visual conditions. As these models scale to larger backbones and higher-resolution inputs, improving their efficiency becomes increasingly important for practical deployment. However, modern 3D transformer pipelines face two coupled challenges: dense multi-view attention creates substantial token-mixing overhead, and low-precision execution can destabilize geometry-sensitive representations and degrade depth, pose, and 3D consistency. To address the first challenge, we propose Lite3R, a model-agnostic teacher-student framework that replaces dense attention with Sparse Linear Attention to preserve important geometric interactions while reducing attention cost. To address the second challenge, we introduce a parameter-efficient FP8-aware quantization-aware training (FP8-aware QAT) strategy with partial attention distillation, which freezes the vast majority of pretrained backbone parameters and trains only lightweight linear-branch projection layers, enabling stable low-precision deployment while retaining pretrained geometric priors. We further evaluate Lite3R on two representative backbones, VGGT and DA3-Large, over BlendedMVS and DTU64, showing that it substantially reduces latency (1.7-2.0x) and memory usage (1.9-2.4x) while preserving competitive reconstruction quality overall. These results demonstrate that Lite3R provides an effective algorithm-system co-design approach for practical transformer-based 3D reconstruction. Code: https://github.com/AIGeeksGroup/Lite3R. Website: https://aigeeksgroup.github.io/Lite3R.
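The abstract leaves the attention redesign at a high level; the PyTorch sketch below illustrates one plausible reading, pairing a top-k sparse softmax branch with a kernelized linear branch so that only a small set of high-scoring geometric interactions pays full attention cost. The class name, the top-k selection rule, and the placement of the trainable linear-branch projection are assumptions for illustration, not details confirmed by the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseLinearAttention(nn.Module):
    """Hybrid attention: a top-k sparse softmax branch plus an O(N) linear
    branch. In a Lite3R-style setup only `linear_proj` would be trained;
    all names and the selection rule here are illustrative assumptions."""

    def __init__(self, dim: int, num_heads: int = 8, topk: int = 64):
        super().__init__()
        assert dim % num_heads == 0
        self.h, self.d = num_heads, dim // num_heads
        self.topk = topk
        self.qkv = nn.Linear(dim, 3 * dim)       # reused from the pretrained block
        self.linear_proj = nn.Linear(dim, dim)   # lightweight trainable branch projection
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(B, N, self.h, self.d).transpose(1, 2) for t in (q, k, v))

        # Sparse branch: keep only the top-k keys per query. (A deployed
        # kernel would select blocks without forming the dense score matrix;
        # it is computed here only for clarity.)
        scores = q @ k.transpose(-2, -1) / self.d ** 0.5           # (B, h, N, N)
        idx = scores.topk(min(self.topk, N), dim=-1).indices
        mask = torch.full_like(scores, float("-inf")).scatter_(-1, idx, 0.0)
        sparse = F.softmax(scores + mask, dim=-1) @ v

        # Linear branch: kernel feature map (elu+1) gives linear-time mixing.
        qf, kf = F.elu(q) + 1, F.elu(k) + 1
        ctx = kf.transpose(-2, -1) @ v                             # (B, h, d, d)
        z = qf @ kf.sum(dim=-2, keepdim=True).transpose(-2, -1)    # (B, h, N, 1)
        linear = (qf @ ctx) / (z + 1e-6)

        merge = lambda t: t.transpose(1, 2).reshape(B, N, C)
        return self.out(merge(sparse) + self.linear_proj(merge(linear)))
```

The intent of such a split is that the linear branch carries the bulk of token mixing at O(N) cost while the sparse branch preserves the few interactions that matter most for geometry.

The quantization side can be sketched in the same spirit. Below, FP8 (E4M3) execution is simulated by round-tripping tensors through torch.float8_e4m3fn with a straight-through estimator, the pretrained backbone is frozen except for the linear-branch projections, and attention maps from a few layers are matched against the teacher, echoing the paper's partial attention distillation. The return_attn flag, the distilled layer indices, and the loss weights are hypothetical.

```python
import torch
import torch.nn.functional as F

# E4M3 has a max normal value of 448; per-tensor scaling keeps values in range.
FP8_MAX = 448.0

def fp8_fake_quant(x: torch.Tensor) -> torch.Tensor:
    """Simulate FP8 precision with a straight-through estimator:
    the forward pass sees quantized values, the backward pass the identity."""
    scale = x.abs().amax().clamp(min=1e-8) / FP8_MAX
    xq = (x / scale).to(torch.float8_e4m3fn).to(x.dtype) * scale
    return x + (xq - x).detach()

def freeze_for_qat(model: torch.nn.Module, trainable_key: str = "linear_proj"):
    """Freeze the pretrained backbone; train only linear-branch projections."""
    for name, p in model.named_parameters():
        p.requires_grad = trainable_key in name

def distill_step(teacher, student, views, layers=(0, 6, 12), w_attn=0.5):
    """One QAT step: match the teacher's outputs plus attention maps on a
    few layers (partial attention distillation). `return_attn` is assumed."""
    with torch.no_grad():
        t_out, t_attn = teacher(views, return_attn=True)
    s_out, s_attn = student(views, return_attn=True)
    task = F.mse_loss(s_out, t_out)
    attn = sum(F.mse_loss(s_attn[i], t_attn[i]) for i in layers) / len(layers)
    return task + w_attn * attn
```

In a full pipeline, fp8_fake_quant would wrap the student's linear layers during training so the remaining trainable weights adapt to the precision the model will run at after deployment.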

Source: https://huggingface.co/papers/2605.11354



Get this paper in your agent:

hf papers read 2605.11354

Don’t have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash


Similar Articles

robbyant/lingbot-map

Hugging Face Models Trending

LingBot-Map is a feed-forward 3D foundation model for streaming 3D reconstruction that uses a Geometric Context Transformer architecture, achieving state-of-the-art performance with efficient ~20 FPS inference on long sequences exceeding 10,000 frames.