@IlirAliu_: Forget lidar. One single camera. Runs in real time & is open source: A streaming 3D model that reconstructs scenes live…

X AI KOLs Timeline 06/26/26, 06:03 PM Models

Summary

LingBot-Map is an open-source, real-time streaming 3D reconstruction model that needs only a single camera. It is a feed-forward Geometric Context Transformer that runs at ~20 FPS, and it is reported to outperform existing streaming methods as well as some offline, optimization-based methods.

Forget lidar. One single camera. Runs in real time & is open source:

A streaming 3D model that reconstructs scenes live, at ~20 FPS, over long sequences.

End-to-end.

Optimization tricks, cleanup steps?

Nope.

And it beats both streaming and even some offline methods.

Perception is becoming software-first.

Closer to machines that see and understand the world as it unfolds.

Thanks for sharing, @YinghaoXu1

Models: https://huggingface.co/robbyant/lingbot-map…
Project page: https://technology.robbyant.com/lingbot-map
Code: https://github.com/Robbyant/lingbot-map…
Paper: https://arxiv.org/abs/2604.14141

Weekly robotics and AI insights. Subscribe free: http://22astronauts.com

Cached at: 06/28/26, 06:02 AM

robbyant/lingbot-map · Hugging Face

Source: https://huggingface.co/robbyant/lingbot-map

LingBot-Map: Geometric Context Transformer for Streaming 3D Reconstruction

Robbyant Team

https://github.com/user-attachments/assets/fe39e095-af2c-4ec9-b68d-a8ba97e505ab

🗺️ Meet LingBot-Map! We've built a feed-forward 3D foundation model for streaming 3D reconstruction! 🏗️🌍

LingBot-Map focuses on:

Geometric Context Transformer: Architecturally unifies coordinate grounding, dense geometric cues, and long-range drift correction within a single streaming framework through anchor context, pose-reference window, and trajectory memory.
High-Efficiency Streaming Inference: A feed-forward architecture with paged KV cache attention, enabling stable inference at ~20 FPS at 518×378 resolution over long sequences exceeding 10,000 frames.
State-of-the-Art Reconstruction: Superior performance on diverse benchmarks compared to both existing streaming and iterative optimization-based approaches.
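
To make the paged KV cache idea mentioned above more concrete, here is a minimal, generic sketch of paging: cached entries live in fixed-size pages drawn from a shared pool, and a page table maps each logical position to a physical page. This is an illustrative toy in pure Python, not LingBot-Map's or FlashInfer's implementation; the class name `PagedKVCache` and all of its methods are hypothetical.

```python
class PagedKVCache:
    """Toy paged cache (hypothetical): entries are stored in fixed-size pages."""

    def __init__(self, page_size, num_pages):
        self.page_size = page_size
        # Physical storage: a pool of pages, each holding page_size slots.
        self.pages = [[None] * page_size for _ in range(num_pages)]
        # Stack of free physical page ids; page 0 is handed out first.
        self.free_pages = list(range(num_pages - 1, -1, -1))
        # Page table: logical page index -> physical page id.
        self.page_table = []
        self.length = 0

    def append(self, kv):
        """Store one key/value entry, allocating a new page when needed."""
        offset = self.length % self.page_size
        if offset == 0:
            if not self.free_pages:
                raise MemoryError("KV cache pool exhausted")
            self.page_table.append(self.free_pages.pop())
        self.pages[self.page_table[-1]][offset] = kv
        self.length += 1

    def get(self, position):
        """Look up an entry by logical position through the page table."""
        if not 0 <= position < self.length:
            raise IndexError(position)
        page_id = self.page_table[position // self.page_size]
        return self.pages[page_id][position % self.page_size]

    def release(self):
        """Return all pages to the pool so another sequence can reuse them."""
        self.free_pages.extend(reversed(self.page_table))
        self.page_table.clear()
        self.length = 0


cache = PagedKVCache(page_size=4, num_pages=2)
for i in range(5):
    cache.append((f"k{i}", f"v{i}"))
print(cache.get(4), cache.page_table)
```

The point of the design is that memory is claimed one page at a time as the sequence grows, rather than reserved up front for the longest possible sequence.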

⚙️ Quick Start

Installation

1. Create conda environment

conda create -n lingbot-map python=3.10 -y
conda activate lingbot-map

2. Install PyTorch (CUDA 12.8)

pip install torch==2.9.1 torchvision==0.24.1 --index-url https://download.pytorch.org/whl/cu128

For other CUDA versions, see PyTorch Get Started.

3. Install lingbot-map

pip install -e .

4. Install FlashInfer (recommended)

FlashInfer provides paged KV cache attention for efficient streaming inference:

# CUDA 12.8 + PyTorch 2.9
pip install flashinfer-python -i https://flashinfer.ai/whl/cu128/torch2.9/

For other CUDA/PyTorch combinations, see FlashInfer installation. If FlashInfer is not installed, the model falls back to SDPA (PyTorch native attention) via --use_sdpa.
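
As a small convenience, a script could decide whether to pass --use_sdpa by checking if FlashInfer is importable. The sketch below assumes the flashinfer-python package is importable under the module name `flashinfer`; the helper `attention_backend_flags` is hypothetical and not part of the repository.

```python
import importlib.util


def attention_backend_flags(module_name="flashinfer"):
    """Return extra demo.py flags based on whether a module can be imported.

    Assumption: FlashInfer installs a top-level module called "flashinfer".
    """
    has_backend = importlib.util.find_spec(module_name) is not None
    # Without FlashInfer, request the PyTorch-native SDPA path explicitly.
    return [] if has_backend else ["--use_sdpa"]


print(attention_backend_flags())
```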

5. Visualization dependencies (optional)

pip install -e ".[vis]"

📦 Model Download

🎬 Demo

Streaming Inference from Images

python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder /path/to/images/

Streaming Inference from Video

python demo.py --model_path /path/to/checkpoint.pt \
    --video_path video.mp4 --fps 10
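
When launching these runs from a script, it can help to assemble the argument list in one place. The following is a minimal sketch that uses only the flags shown above; the helper `build_demo_command` is hypothetical, and the rule that exactly one input source is passed per run is an assumption rather than documented behavior.

```python
import shlex


def build_demo_command(model_path, image_folder=None, video_path=None, fps=None):
    """Assemble the argument list for demo.py (hypothetical helper)."""
    # Assumption: a run takes either an image folder or a video, not both.
    if (image_folder is None) == (video_path is None):
        raise ValueError("pass exactly one of image_folder or video_path")
    cmd = ["python", "demo.py", "--model_path", str(model_path)]
    if image_folder is not None:
        cmd += ["--image_folder", str(image_folder)]
    else:
        cmd += ["--video_path", str(video_path)]
    if fps is not None:
        cmd += ["--fps", str(fps)]
    return cmd


cmd = build_demo_command("/path/to/checkpoint.pt", video_path="video.mp4", fps=10)
print(shlex.join(cmd))
# To actually launch it: import subprocess; subprocess.run(cmd, check=True)
```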

Streaming with Keyframe Interval

Use --keyframe_interval to reduce KV cache memory by keeping only every N-th frame as a keyframe. Non-keyframe frames still produce predictions but are not stored in the cache. This is useful for long sequences that exceed 320 frames.

python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder /path/to/images/ --keyframe_interval 6
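
To get a feel for the memory effect, the sketch below counts how many frames would remain in the cache for a given interval. It assumes that the first frame is a keyframe and that keyframes are the frames whose index is a multiple of N; the actual selection rule in demo.py may differ, and `keyframe_indices` is a hypothetical helper.

```python
def keyframe_indices(num_frames, keyframe_interval):
    """Indices of frames kept in the KV cache (assumed rule: index % N == 0)."""
    if keyframe_interval < 1:
        raise ValueError("keyframe_interval must be >= 1")
    return list(range(0, num_frames, keyframe_interval))


num_frames = 3000
kept = keyframe_indices(num_frames, keyframe_interval=6)
print(len(kept), "of", num_frames, "frames stay in the KV cache")
```

Under these assumptions, an interval of 6 keeps 500 of 3000 frames, so the cache holds roughly one sixth of the entries it would hold otherwise.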

Windowed Inference (for long sequences, >3000 frames)

python demo.py --model_path /path/to/checkpoint.pt \
    --video_path video.mp4 --fps 10 \
    --mode windowed --window_size 64
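
As a rough mental model of windowed processing, the sketch below splits a long sequence into consecutive chunks of at most --window_size frames. It assumes non-overlapping windows purely for illustration; the real windowed mode may overlap windows or carry state between them, and `split_into_windows` is a hypothetical helper.

```python
def split_into_windows(num_frames, window_size):
    """Return (start, end) index pairs, end exclusive, covering all frames."""
    if window_size < 1:
        raise ValueError("window_size must be >= 1")
    return [
        (start, min(start + window_size, num_frames))
        for start in range(0, num_frames, window_size)
    ]


windows = split_into_windows(num_frames=10_000, window_size=64)
print(len(windows), "windows; last one covers frames", windows[-1])
```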

Sky Masking

Sky masking uses an ONNX sky segmentation model to filter out sky points from the reconstructed point cloud, which improves visualization quality for outdoor scenes.

Setup:

# Install onnxruntime (required)
pip install onnxruntime        # CPU
# or
pip install onnxruntime-gpu    # GPU (faster for large image sets)

The sky segmentation model (skyseg.onnx) will be automatically downloaded from HuggingFace on first use.

Usage:

python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder /path/to/images/ --mask_sky

Sky masks are cached in <image_folder>_sky_masks/ so subsequent runs skip regeneration.
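
To check whether a previous run already left masks behind, a script could derive the cache location from the image folder. The sketch below assumes the cache is a sibling directory named after the image folder with a `_sky_masks` suffix, as the path pattern above suggests; both helper functions are hypothetical and not part of the repository.

```python
from pathlib import Path


def sky_mask_cache_dir(image_folder):
    """Map /path/to/images/ to /path/to/images_sky_masks (assumed layout)."""
    folder = Path(image_folder)
    return folder.with_name(folder.name + "_sky_masks")


def has_cached_masks(image_folder):
    """True if the assumed cache directory exists and is not empty."""
    cache_dir = sky_mask_cache_dir(image_folder)
    return cache_dir.is_dir() and any(cache_dir.iterdir())


print(sky_mask_cache_dir("/path/to/images/"))
```

Deleting that directory would be one way to force the masks to be regenerated, assuming the cache works as described.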

Without FlashInfer (SDPA fallback)

python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder /path/to/images/ --use_sdpa

📜 License

This project is released under the Apache License 2.0. See the LICENSE file for details.

📖 Citation

@article{chen2026geometric,
  title={Geometric Context Transformer for Streaming 3D Reconstruction},
  author={Chen, Lin-Zhuo and Gao, Jian and Chen, Yihang and Cheng, Ka Leong and Sun, Yipengjing and Hu, Liangxiao and Xue, Nan and Zhu, Xing and Shen, Yujun and Yao, Yao and Xu, Yinghao},
  journal={arXiv preprint arXiv:2604.14141},
  year={2026}
}

✨ Acknowledgments

We thank Shangzhan Zhang, Jianyuan Wang, Yudong Jin, Christian Rupprecht, and Xun Cao for their helpful discussions and support.

This work builds upon several excellent open-source projects: