@IlirAliu_: Forget lidar. One single camera. Runs in real time & is open source: A streaming 3D model that reconstructs scenes live…


Summary

LingBot-Map is an open-source, real-time streaming 3D reconstruction model that works from a single camera. Built on a feed-forward Geometric Context Transformer, it runs at ~20 FPS and, according to its authors, outperforms existing streaming methods as well as some offline, optimization-based ones.

Forget lidar. One single camera. Runs in real time & is open source:

A streaming 3D model that reconstructs scenes live, at ~20 FPS, over long sequences.

End-to-end.

Optimization tricks, cleanup steps?

Nope.

And it beats both streaming and even some offline methods.

Perception is becoming software-first.

Closer to machines that see and understand the world as it unfolds.

Thanks for sharing, @YinghaoXu1

Models: https://huggingface.co/robbyant/lingbot-map… Project page: https://technology.robbyant.com/lingbot-map Code: https://github.com/Robbyant/lingbot-map… Paper: https://arxiv.org/abs/2604.14141

——

Weekly robotics and AI insights. Subscribe free: http://22astronauts.com

Cached at: 06/28/26, 06:02 AM



robbyant/lingbot-map · Hugging Face

Source: https://huggingface.co/robbyant/lingbot-map

LingBot-Map: Geometric Context Transformer for Streaming 3D Reconstruction

Robbyant Team




🗺️ Meet LingBot-Map! We’ve built a feed-forward 3D foundation model for streaming 3D reconstruction! 🏗️🌍

LingBot-Map focuses on:

  • Geometric Context Transformer: Architecturally unifies coordinate grounding, dense geometric cues, and long-range drift correction within a single streaming framework through an anchor context, a pose-reference window, and a trajectory memory.
  • High-Efficiency Streaming Inference: A feed-forward architecture with paged KV cache attention, enabling stable ~20 FPS inference on 518×378 inputs over long sequences exceeding 10,000 frames.
  • State-of-the-Art Reconstruction: Superior performance on diverse benchmarks compared to both existing streaming and iterative optimization-based approaches.
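The list names paged KV cache attention without explaining it. As a rough mental model, the toy class below keeps per-token key/value entries in fixed-size pages drawn from a shared pool. A page table records which physical page holds each block of logical positions. It uses hypothetical names in pure Python and is not LingBot-Map's or FlashInfer's implementation. Memory then grows one page at a time as frames stream in, instead of requiring one contiguous buffer sized for the longest possible sequence.

```python
class PagedKVCache:
    """Toy single-sequence paged KV cache (illustrative only).

    Entries live in fixed-size pages taken from a shared pool; the page table
    lists, in logical order, which physical page holds each block of positions.
    """

    def __init__(self, page_size: int, num_pages: int):
        self.page_size = page_size
        self.pages = [[None] * page_size for _ in range(num_pages)]  # physical pool
        self.free_pages = list(range(num_pages - 1, -1, -1))  # pop() yields page 0 first
        self.page_table: list[int] = []  # logical block i -> physical page id
        self.length = 0  # entries stored so far

    def append(self, kv) -> None:
        slot = self.length % self.page_size
        if slot == 0:  # no page yet, or the last page is full: allocate a new one
            if not self.free_pages:
                raise MemoryError("KV page pool exhausted")
            self.page_table.append(self.free_pages.pop())
        self.pages[self.page_table[-1]][slot] = kv
        self.length += 1

    def gather(self) -> list:
        """Read entries back in logical order, as an attention kernel would."""
        out = []
        for block, page_id in enumerate(self.page_table):
            used = min(self.page_size, self.length - block * self.page_size)
            out.extend(self.pages[page_id][:used])
        return out


if __name__ == "__main__":
    cache = PagedKVCache(page_size=4, num_pages=3)
    for t in range(10):
        cache.append(f"kv{t}")
    print(cache.page_table, len(cache.gather()))
```

With `page_size=4` and 10 appended entries, the cache occupies three pages and `gather()` returns the entries in their original order. A real implementation stores tensors per layer and head, and the attention kernel reads through the page table directly rather than copying entries out.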

⚙️ Quick Start

Installation

1. Create conda environment

conda create -n lingbot-map python=3.10 -y
conda activate lingbot-map

2. Install PyTorch (CUDA 12.8)

pip install torch==2.9.1 torchvision==0.24.1 --index-url https://download.pytorch.org/whl/cu128

For other CUDA versions, see the PyTorch Get Started page.

3. Install lingbot-map (from the root of the cloned repository)

pip install -e .

4. Install FlashInfer (recommended)

FlashInfer provides paged KV cache attention for efficient streaming inference:

# CUDA 12.8 + PyTorch 2.9
pip install flashinfer-python -i https://flashinfer.ai/whl/cu128/torch2.9/

For other CUDA/PyTorch combinations, see the FlashInfer installation guide. If FlashInfer is not installed, the model falls back to SDPA (PyTorch native attention) via the --use_sdpa flag.

5. Visualization dependencies (optional)

pip install -e ".[vis]"
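FlashInfer is optional in step 4. The README's fallback is PyTorch SDPA, selected with --use_sdpa. If you script around demo.py, a check like the hypothetical helper below can decide whether to pass that flag. It assumes FlashInfer is used whenever --use_sdpa is absent. It only tests whether the `flashinfer` module (the import name of the flashinfer-python package) can be found. It does not verify that the installed wheel matches your CUDA/PyTorch versions.

```python
import importlib.util
import sys


def demo_attention_args(force_sdpa: bool = False) -> list[str]:
    """Extra demo.py arguments for the attention backend (hypothetical helper).

    Returns ["--use_sdpa"] when SDPA is forced or the flashinfer module cannot
    be found; otherwise returns no extra arguments.
    """
    # find_spec returns None for a missing top-level module without importing it.
    if force_sdpa or importlib.util.find_spec("flashinfer") is None:
        return ["--use_sdpa"]
    return []


if __name__ == "__main__":
    cmd = [sys.executable, "demo.py",
           "--model_path", "/path/to/checkpoint.pt",
           "--image_folder", "/path/to/images/"] + demo_attention_args()
    print(" ".join(cmd))
```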

📦 Model Download

🎬 Demo

Streaming Inference from Images

python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder /path/to/images/
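The README does not state the order in which demo.py reads files from --image_folder. Frames named with zero-padded indices (000001.png, 000002.png, ...) sort correctly under any scheme. If yours are not padded, a plain lexicographic sort puts frame_10 before frame_2. The hypothetical helper below sorts digit runs numerically instead, which is handy for checking or renaming a folder before streaming it. The accepted file extensions are an assumption.

```python
import re
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed frame formats


def natural_key(name: str) -> list:
    """'frame_10.png' -> ['frame_', 10, '.png'] so digit runs compare as numbers."""
    # re.split with a capturing group puts the captured digit runs at odd indices.
    parts = re.split(r"(\d+)", name)
    return [int(tok) if i % 2 else tok.lower() for i, tok in enumerate(parts)]


def ordered_frames(image_folder: str) -> list[Path]:
    """Image files in the folder, in natural (numeric-aware) order."""
    frames = [p for p in Path(image_folder).iterdir() if p.suffix.lower() in IMAGE_SUFFIXES]
    return sorted(frames, key=lambda p: natural_key(p.name))
```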

Streaming Inference from Video

python demo.py --model_path /path/to/checkpoint.pt \
    --video_path video.mp4 --fps 10
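The README does not define --fps. Assuming it sets how many frames per second demo.py extracts from the video, lowering it trades temporal density for fewer frames to stream. The hypothetical helper below shows the index arithmetic for that kind of uniform subsampling. The actual decoder may pick frames differently.

```python
def sample_frame_indices(num_frames: int, source_fps: float, target_fps: float) -> list[int]:
    """Indices of source frames nearest to a uniform target_fps time grid."""
    if target_fps >= source_fps:
        return list(range(num_frames))  # cannot upsample; keep every frame
    duration_s = num_frames / source_fps
    n_out = int(duration_s * target_fps)
    step = source_fps / target_fps  # source frames per sampled frame
    return [min(num_frames - 1, round(k * step)) for k in range(n_out)]


if __name__ == "__main__":
    idx = sample_frame_indices(num_frames=300, source_fps=30, target_fps=10)
    print(len(idx), idx[:5])
```

Under this assumption, a 10-second clip at 30 FPS (300 frames) yields 100 frames at --fps 10, taking every third frame.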

Streaming with Keyframe Interval

Use --keyframe_interval to reduce KV cache memory by keeping only every N-th frame as a keyframe. Non-keyframe frames still produce predictions but are not stored in the cache. This is useful for long sequences that exceed 320 frames.

python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder /path/to/images/ --keyframe_interval 6
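To estimate how much --keyframe_interval saves, the sketch below counts cached frames. It assumes keyframes are frames 0, N, 2N, ..., since the README only says "every N-th frame". It also estimates tokens per frame under a second assumption the README does not state: a ViT-style backbone with 14×14 patches, which tiles the 518×378 input exactly into 37×27 = 999 patches. Any extra per-frame tokens the model may add are ignored.

```python
import math


def is_keyframe(frame_idx: int, keyframe_interval: int) -> bool:
    # Assumed rule: frames 0, N, 2N, ... are keyframes and stay in the KV cache.
    return frame_idx % keyframe_interval == 0


def cached_frame_count(num_frames: int, keyframe_interval: int) -> int:
    # Number of indices in range(num_frames) divisible by keyframe_interval.
    return math.ceil(num_frames / keyframe_interval)


def tokens_per_frame(width: int = 518, height: int = 378, patch: int = 14) -> int:
    # Assumed 14x14 patches; special per-frame tokens are not counted.
    return (width // patch) * (height // patch)


if __name__ == "__main__":
    n, k = 320, 6
    kept = cached_frame_count(n, k)
    print(kept, kept * tokens_per_frame(), n * tokens_per_frame())
```

For 320 frames with --keyframe_interval 6, 54 frames stay in the cache instead of 320. Under the patch assumption, that is 53,946 cached patch tokens instead of 319,680, roughly a 6× reduction.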

Windowed Inference (for long sequences, >3000 frames)

python demo.py --model_path /path/to/checkpoint.pt \
    --video_path video.mp4 --fps 10 \
    --mode windowed --window_size 64
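The README gives the window size (--window_size 64) but not how windows are laid out or stitched back together. The sketch below shows one plausible scheme: it splits frame indices into consecutive windows. The `overlap` parameter is hypothetical and not a documented demo.py flag. It is included because sharing frames between neighbouring windows is one common way to align their coordinate frames. If --fps sets the sampling rate, the 3,000-frame threshold at --fps 10 corresponds to about 5 minutes of video.

```python
def make_windows(num_frames: int, window_size: int = 64, overlap: int = 0) -> list[range]:
    """Split frame indices into windows of at most window_size frames.

    overlap (hypothetical) is the number of frames shared by neighbouring windows.
    """
    if window_size <= 0 or not 0 <= overlap < window_size:
        raise ValueError("need window_size > 0 and 0 <= overlap < window_size")
    step = window_size - overlap
    windows = []
    start = 0
    while start < num_frames:
        end = min(start + window_size, num_frames)
        windows.append(range(start, end))
        if end == num_frames:  # last window reaches the end of the sequence
            break
        start += step
    return windows


if __name__ == "__main__":
    for w in make_windows(150, window_size=64, overlap=8):
        print(w.start, w.stop)
```

Without overlap, 3,200 frames split into exactly 50 windows of 64 frames.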

Sky Masking

Sky masking uses an ONNX sky segmentation model to filter out sky points from the reconstructed point cloud, which improves visualization quality for outdoor scenes.
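Conceptually, sky filtering applies a per-pixel mask to a per-pixel point map. The sketch below assumes one 3D point per pixel and a per-pixel sky probability from the segmenter, with a hypothetical 0.5 threshold. Neither the segmentation model's output format nor demo.py's threshold is documented here.

```python
def filter_sky_points(point_map, sky_prob, threshold=0.5):
    """Keep 3D points whose pixel is not classified as sky.

    point_map: H rows of W (x, y, z) tuples, one point per pixel (assumed layout)
    sky_prob:  H rows of W floats in [0, 1] from a sky segmenter
    threshold: hypothetical cut-off; a pixel counts as sky if prob > threshold
    """
    kept = []
    for pts_row, prob_row in zip(point_map, sky_prob):
        for point, prob in zip(pts_row, prob_row):
            if prob <= threshold:  # not sky: keep the point
                kept.append(point)
    return kept
```

With NumPy arrays of shape (H, W, 3) and (H, W), the same filter is the boolean index `point_map[sky_prob <= threshold]`, which returns a (K, 3) array of kept points.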

Setup:

# Install onnxruntime (required)
pip install onnxruntime        # CPU
# or
pip install onnxruntime-gpu    # GPU (faster for large image sets)

The sky segmentation model (skyseg.onnx) will be automatically downloaded from Hugging Face on first use.

Usage:

python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder /path/to/images/ --mask_sky

Sky masks are cached in <image_folder>_sky_masks/, so subsequent runs skip regeneration.
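The mask cache amounts to a path convention plus an existence check. The sketch below derives the cache folder as the README describes it: the image folder's name with a _sky_masks suffix, as a sibling directory. It then lists images that would still need a mask. The file naming inside that folder is a hypothetical convention (one mask per image, named after the image stem), so inspect the real cache contents before relying on it.

```python
from pathlib import Path


def sky_mask_dir(image_folder: str) -> Path:
    """<image_folder>_sky_masks/, the cache location given in the README."""
    folder = Path(image_folder)  # Path drops a trailing slash: "/a/images/" -> "/a/images"
    # A bare "." has an empty name; pass a named folder or resolve() it first.
    return folder.with_name(folder.name + "_sky_masks")


def images_missing_masks(image_paths, image_folder: str, mask_suffix: str = ".png"):
    """Images with no cached mask yet, under the hypothetical naming
    <image stem><mask_suffix> inside the cache folder."""
    cache = sky_mask_dir(image_folder)
    return [p for p in image_paths if not (cache / (Path(p).stem + mask_suffix)).exists()]
```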

Without FlashInfer (SDPA fallback)

python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder /path/to/images/ --use_sdpa

📜 License

This project is released under the Apache License 2.0. See the LICENSE file for details.

📖 Citation

@article{chen2026geometric,
  title={Geometric Context Transformer for Streaming 3D Reconstruction},
  author={Chen, Lin-Zhuo and Gao, Jian and Chen, Yihang and Cheng, Ka Leong and Sun, Yipengjing and Hu, Liangxiao and Xue, Nan and Zhu, Xing and Shen, Yujun and Yao, Yao and Xu, Yinghao},
  journal={arXiv preprint arXiv:2604.14141},
  year={2026}
}

✨ Acknowledgments

We thank Shangzhan Zhang, Jianyuan Wang, Yudong Jin, Christian Rupprecht, and Xun Cao for their helpful discussions and support.

This work builds upon several excellent open-source projects:


Similar Articles

robbyant/lingbot-map

Hugging Face Models Trending

LingBot-Map is a feed-forward 3D foundation model for streaming 3D reconstruction that uses a Geometric Context Transformer architecture, achieving state-of-the-art performance with efficient ~20 FPS inference on long sequences exceeding 10,000 frames.

We’re proud to open-source LIDARLearn [R] [D] [P]

Reddit r/MachineLearning

LIDARLearn is an open-source PyTorch library for 3D point cloud deep learning that unifies 56 pre-configured models with built-in cross-validation and automatic publication-ready LaTeX report generation. The framework supports supervised, self-supervised, and parameter-efficient fine-tuning methods across datasets like ModelNet40, ShapeNet, and remote sensing benchmarks.

Lite3R: A Model-Agnostic Framework for Efficient Feed-Forward 3D Reconstruction

Hugging Face Daily Papers

Lite3R is a model-agnostic framework that improves the efficiency of transformer-based 3D reconstruction using sparse linear attention and FP8-aware quantization. It reduces latency and memory usage by up to 2.4x while maintaining geometric accuracy on backbones like VGGT and DA3-Large.