Lite Any Stereo V2: Faster and Stronger Efficient Zero-Shot Stereo Matching

Hugging Face Daily Papers

Summary

Lite Any Stereo V2 presents an efficient stereo matching approach achieving state-of-the-art accuracy with significantly reduced latency through optimized architecture and training strategies, including a 2D-only cost aggregation framework and a three-stage training strategy.

Recent advances in stereo matching have achieved remarkable accuracy, but often rely on large models, heavy computation, or additional foundation-model priors, making them difficult to deploy on resource-constrained platforms. In contrast, efficient stereo models offer faster inference but are commonly considered less capable of strong zero-shot generalization. In this paper, we challenge this assumption by introducing Lite Any Stereo V2 (LAS2), an ultra-fast model series designed for efficient zero-shot stereo matching. LAS2 is developed from both architecture and training perspectives. Architecturally, we revisit efficient stereo design under practical deployment settings and propose a 2D-only cost aggregation framework, optimized for real inference latency rather than theoretical MACs alone. For training, we develop a three-stage strategy that combines synthetic supervision, self-distillation, and real-world knowledge distillation. To improve the reliability of real-world pseudo supervision, we further introduce pseudo-label filtering and an error-clamping operation, enabling smoother synthetic-to-real transfer. We instantiate LAS2 as a family of models, including feed-forward variants for different efficiency budgets and an iterative variant for higher accuracy. Extensive experiments show that LAS2 achieves state-of-the-art accuracy among efficient stereo methods while maintaining significantly lower latency. Specifically, LAS2-H achieves stronger overall zero-shot performance than the iterative method Fast-FoundationStereo, with 1.8x and 2.7x faster inference on H200 and Orin, respectively. The project page, demos, and code are available at https://tomtomtommi.github.io/LiteAnyStereoV2/.
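The abstract names a 2D-only cost aggregation framework and a pseudo-label filtering plus error-clamping step, but does not define them. The NumPy sketch below is one plausible reading, not the authors' implementation. It assumes that "2D-only" means disparity candidates are kept on the channel axis of a correlation volume, that filtering can be approximated by a standard left-right consistency check, and that clamping caps the per-pixel error against the teacher's pseudo-label. All function names and the parameters `max_disp`, `tol`, and `max_err` are hypothetical.

```python
import numpy as np


def correlation_volume(feat_left, feat_right, max_disp):
    """Build a (max_disp, H, W) correlation volume from (C, H, W) features.

    Disparity candidates sit on the channel axis, so the volume can be
    aggregated with ordinary 2D convolutions instead of 3D ones.
    """
    _, h, w = feat_left.shape
    assert max_disp <= w, "max_disp must not exceed the image width"
    volume = np.zeros((max_disp, h, w), dtype=np.float64)
    for d in range(max_disp):
        # Left pixel x is compared with right pixel x - d.
        prod = feat_left[:, :, d:] * feat_right[:, :, : w - d]
        volume[d, :, d:] = prod.mean(axis=0)
    return volume


def left_right_consistency_mask(disp_left, disp_right, tol=1.0):
    """Stand-in pseudo-label filter (assumption): keep a left pixel only if
    the right-view disparity at its matched location agrees within tol."""
    h, w = disp_left.shape
    xs = np.arange(w)[None, :]
    xr = np.rint(xs - disp_left).astype(int)      # matched column in right view
    in_bounds = (xr >= 0) & (xr < w)
    xr_clipped = np.clip(xr, 0, w - 1)
    disp_right_at_match = np.take_along_axis(disp_right, xr_clipped, axis=1)
    return in_bounds & (np.abs(disp_left - disp_right_at_match) <= tol)


def clamped_distillation_loss(student_disp, pseudo_disp, valid_mask, max_err=5.0):
    """Mean absolute error against pseudo-labels over valid pixels, with each
    per-pixel error capped at max_err (assumed form of error clamping)."""
    err = np.minimum(np.abs(student_disp - pseudo_disp), max_err)
    n_valid = valid_mask.sum()
    if n_valid == 0:
        return 0.0
    return float((err * valid_mask).sum() / n_valid)
```

Under these assumptions, the clamp bounds how much any single wrong pseudo-label can contribute to the loss, and the mask drops pixels the teacher is unlikely to have matched correctly. Both serve the abstract's stated goal of making real-world pseudo supervision more reliable.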

Cached at: 06/25/26, 05:13 PM


Source: https://huggingface.co/papers/2606.24457


