αDepth: Learning Single-Pass Soft Boundary Decomposition for Stereo Conversion
Summary
αDepth introduces a layered representation with Circular Alpha Representation (CAR) to address soft boundary challenges in stereo conversion, achieving state-of-the-art performance without manual guidance.
Cached at: 06/03/26, 11:37 AM
Source: https://huggingface.co/papers/2606.00386
Abstract
αDepth introduces a layered representation with Circular Alpha Representation (CAR) to address soft boundary challenges in stereo conversion through local boundary decomposition and efficient scene-level inference.
Accurately modeling soft boundaries, e.g., hair and defocus blur, is a fundamental challenge in stereo conversion due to the ambiguous blending of foreground and background. Existing depth models primarily predict single-layer depth, leading to ambiguity in depth correspondence at soft boundaries. While matting techniques can capture opacity for layered modeling, they often struggle in complex scenes with multiple targets and usually require user intervention. This paper introduces αDepth, a layered representation that decomposes soft boundaries for high-fidelity stereo conversion. Specifically, we first resolve mixed color and depth ambiguity by estimating layered color and depth values at soft boundaries. Considering complex multi-target scenes, we design a Circular Alpha Representation (CAR) that shifts the paradigm from global target extraction to local boundary decomposition. Unlike prior matting methods restricted to a single foreground/background, CAR enables efficient scene-level inference without manual guidance. Extensive evaluations demonstrate that αDepth achieves state-of-the-art performance in stereo conversion, eliminating background bleeding and structural distortions at soft boundaries.
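The abstract does not give the formulation, so the following is only an illustrative sketch of why layered color and depth help at a soft boundary, not the αDepth method or CAR itself. It assumes the standard matting equation C = αF + (1 − α)B, a single 1-D scanline, and integer per-layer disparities. The function names (`composite`, `shift_right`, `render_layered_view`) are hypothetical.

```python
def composite(alpha, fg, bg):
    """Blend per pixel with the standard matting equation C = a*F + (1 - a)*B."""
    return [a * f + (1.0 - a) * b for a, f, b in zip(alpha, fg, bg)]


def shift_right(row, disparity):
    """Shift a scanline right by a non-negative integer disparity.

    Pixels vacated on the left are filled by repeating the first value
    (a simplifying assumption; real systems inpaint disoccluded regions).
    """
    if disparity < 0:
        raise ValueError("disparity must be non-negative in this sketch")
    disparity = min(disparity, len(row))
    return [row[0]] * disparity + list(row[:len(row) - disparity])


def render_layered_view(alpha, fg, bg, fg_disparity, bg_disparity):
    """Warp each layer by its own disparity, then re-composite.

    The foreground color and its opacity move together; the background
    moves independently, so a semi-transparent pixel blends with whatever
    background lies behind it in the new view.
    """
    warped_alpha = shift_right(alpha, fg_disparity)
    warped_fg = shift_right(fg, fg_disparity)
    warped_bg = shift_right(bg, bg_disparity)
    return composite(warped_alpha, warped_fg, warped_bg)


if __name__ == "__main__":
    # Toy scanline: opaque foreground, one half-transparent boundary pixel,
    # then a background that changes brightness further to the right.
    alpha = [1.0, 1.0, 0.5, 0.0, 0.0, 0.0]
    fg = [1.0] * 6
    bg = [0.0, 0.0, 0.0, 0.0, 0.8, 0.8]
    print(render_layered_view(alpha, fg, bg, fg_disparity=2, bg_disparity=0))
```

With a single depth value per pixel, the boundary pixel would carry its already-mixed color into the new view. In this sketch it is re-blended with the background it actually lands on, which is the kind of background bleeding the abstract says layered modeling avoids.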
Similar Articles
@RuohanZhang76: Excited to introduce StereoPolicy, led by @EvansXuHan. StereoPolicy is an effective way to add geometric cues to modern…
Introduces StereoPolicy, a framework that leverages synchronized stereo image pairs to improve geometric reasoning for robot manipulation policies, avoiding the fragility of RGB-D and point clouds. It integrates with diffusion-based and vision-language-action policies, showing consistent improvements in simulation and real-world tasks.
Unified Panoramic Geometry Estimation via Multi-View Foundation Models
PaGeR adapts the multi-view perspective foundation model Depth Anything 3 to predict scale-invariant and metric depth, surface normals, and sky segmentation from a single equirectangular image, using a fixed cubemap representation that keeps VRAM and runtime constant. The paper also releases the ZüriPano and PanoInfinigen datasets.
Direct 3D-Aware Object Insertion via Decomposed Visual Proxies
This paper introduces DIRECT, a framework for pose-controllable 3D-aware object insertion that decomposes conditions into appearance, geometry, and context guidance to achieve high-fidelity compositing with explicit 3D pose control.
Geometry-Aware Representation Denoising for Robust Multi-view 3D Reconstruction
Introduces GARD, a diffusion-based framework that operates in the feature space of a feed-forward 3D reconstructor to jointly recover scene geometry and high-quality imagery from degraded inputs.
TrackCraft3R: Repurposing Video Diffusion Transformers for Dense 3D Tracking
TrackCraft3R repurposes video diffusion transformers for dense 3D tracking from monocular video, using dual-latent representation and temporal RoPE alignment to achieve state-of-the-art performance with 1.3x faster speed and 4.6x less peak memory than prior methods.