Pixal3D: Pixel-Aligned 3D Generation from Images

Hugging Face Daily Papers 05/11/26, 12:00 AM Papers

3d-generation image-to-3d pixel-aligned computer-vision fidelity tencent-arc deep-learning

Summary

Pixal3D introduces a pixel-aligned 3D generation approach that improves fidelity by establishing direct pixel-to-3D correspondences through back-projection conditioning, addressing issues in canonical space generation.

Recent advances in 3D generative models have rapidly improved image-to-3D synthesis quality, enabling higher-resolution geometry and more realistic appearance. Yet fidelity, which measures pixel-level faithfulness of the generated 3D asset to the input image, remains a central bottleneck. We argue this stems from an implicit 2D-3D correspondence issue: most 3D-native generators synthesize shape in canonical space and inject image cues via attention, leaving pixel-to-3D associations ambiguous. To tackle this issue, we draw inspiration from 3D reconstruction and propose Pixal3D, a pixel-aligned 3D generation paradigm for high-fidelity 3D asset creation from images. Instead of generating in a canonical pose, Pixal3D generates 3D content directly in a pixel-aligned way, consistent with the input view. To enable this, we introduce a pixel back-projection conditioning scheme that explicitly lifts multi-scale image features into a 3D feature volume, establishing direct pixel-to-3D correspondence without ambiguity. We show that Pixal3D is not only scalable and capable of producing high-quality 3D assets, but also substantially improves fidelity, approaching that of reconstruction. Furthermore, Pixal3D naturally extends to multi-view generation by aggregating back-projected feature volumes across views. Finally, we show that pixel-aligned generation benefits scene synthesis, and present a modular pipeline that produces high-fidelity, object-separated 3D scenes from images. Pixal3D demonstrates, for the first time, 3D-native pixel-aligned generation at scale, and offers a new path toward high-fidelity 3D generation of objects or scenes from single or multi-view images. Project page: https://ldyang694.github.io/projects/pixal3d/
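To make the back-projection idea concrete, here is a minimal sketch of lifting multi-scale image features into a view-aligned 3D feature volume. It is not the paper's implementation: the function names (back_project, sample_nearest), the pinhole camera with focal length in units of image width, the cube-shaped volume placed in front of the camera, nearest-neighbour sampling, and channel concatenation across scales are all assumptions made for illustration. It is written in plain Python for readability; a practical version would use batched tensor operations.

```python
from typing import List, Optional, Sequence

FeatureMap = List[List[List[float]]]  # indexed [row][col][channel]


def sample_nearest(fmap: FeatureMap, u: float, v: float) -> Optional[List[float]]:
    """Nearest-neighbour lookup at normalized image coords (u, v) in [0, 1)."""
    if not (0.0 <= u < 1.0 and 0.0 <= v < 1.0):
        return None  # the point projects outside the image
    rows, cols = len(fmap), len(fmap[0])
    return fmap[int(v * rows)][int(u * cols)]


def back_project(
    feature_maps: Sequence[FeatureMap],
    grid: int,
    half_extent: float,
    cam_dist: float,
    focal: float,
) -> List[List[List[Optional[List[float]]]]]:
    """Lift multi-scale 2D features into a grid^3 volume aligned with the input view.

    Voxel centres are in camera space (x right, y down, z forward). The volume
    is a cube of side 2 * half_extent centred cam_dist in front of the camera.
    focal is in units of image width (an assumed convention).
    Returns volume[k][j][i]: concatenated features, or None if not visible.
    """
    if cam_dist <= half_extent:
        raise ValueError("volume must lie entirely in front of the camera")
    step = 2.0 * half_extent / grid
    volume = []
    for k in range(grid):  # depth index
        z = cam_dist - half_extent + (k + 0.5) * step
        plane = []
        for j in range(grid):  # vertical index
            y = -half_extent + (j + 0.5) * step
            row = []
            for i in range(grid):  # horizontal index
                x = -half_extent + (i + 0.5) * step
                # Pinhole projection to normalized image coordinates.
                u = focal * x / z + 0.5
                v = focal * y / z + 0.5
                # Every scale is sampled at the same normalized location.
                samples = [sample_nearest(f, u, v) for f in feature_maps]
                if any(s is None for s in samples):
                    row.append(None)
                else:
                    row.append([c for s in samples for c in s])
            plane.append(row)
        volume.append(plane)
    return volume
```

Because every voxel reads its features from the pixel it projects to, the pixel-to-3D association is fixed by camera geometry rather than learned through attention, which is the property the abstract emphasizes.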

Cached at: 05/12/26, 07:31 AM

Paper page - Pixal3D: Pixel-Aligned 3D Generation from Images

Source: https://huggingface.co/papers/2605.10922

Abstract

Pixal3D introduces a pixel-aligned 3D generation approach that addresses fidelity issues in 3D asset creation by establishing direct pixel-to-3D correspondences through back-projection conditioning.

Recent advances in 3D generative models have rapidly improved image-to-3D synthesis quality, enabling higher-resolution geometry and more realistic appearance. Yet fidelity, which measures pixel-level faithfulness of the generated 3D asset to the input image, remains a central bottleneck. We argue this stems from an implicit 2D-3D correspondence issue: most 3D-native generators synthesize shape in canonical space and inject image cues via attention, leaving pixel-to-3D associations ambiguous. To tackle this issue, we draw inspiration from 3D reconstruction and propose Pixal3D, a pixel-aligned 3D generation paradigm for high-fidelity 3D asset creation from images. Instead of generating in a canonical pose, Pixal3D generates 3D content directly in a pixel-aligned way, consistent with the input view. To enable this, we introduce a pixel back-projection conditioning scheme that explicitly lifts multi-scale image features into a 3D feature volume, establishing direct pixel-to-3D correspondence without ambiguity. We show that Pixal3D is not only scalable and capable of producing high-quality 3D assets, but also substantially improves fidelity, approaching that of reconstruction. Furthermore, Pixal3D naturally extends to multi-view generation by aggregating back-projected feature volumes across views. Finally, we show that pixel-aligned generation benefits scene synthesis, and present a modular pipeline that produces high-fidelity, object-separated 3D scenes from images. Pixal3D demonstrates, for the first time, 3D-native pixel-aligned generation at scale, and offers a new path toward high-fidelity 3D generation of objects or scenes from single or multi-view images. Project page: https://ldyang694.github.io/projects/pixal3d/
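The abstract states only that the multi-view extension aggregates back-projected feature volumes across views; it does not name the aggregation operator. The sketch below assumes one simple possibility: a per-voxel mean over the views that actually cover each voxel. The function name aggregate_views, the flattened voxel list, and the use of None for uncovered voxels are hypothetical choices for illustration, and the sketch assumes all per-view volumes have already been resampled onto one shared grid.

```python
from typing import List, Optional

Voxel = Optional[List[float]]  # None marks a voxel that a view does not cover


def aggregate_views(volumes: List[List[Voxel]]) -> List[Voxel]:
    """Average per-view features voxel by voxel, skipping views that miss a voxel."""
    if not volumes:
        raise ValueError("need at least one view")
    n_vox = len(volumes[0])
    if any(len(v) != n_vox for v in volumes):
        raise ValueError("all views must share the same voxel grid")
    fused: List[Voxel] = []
    for idx in range(n_vox):
        # Collect the features of every view that observes this voxel.
        seen = [v[idx] for v in volumes if v[idx] is not None]
        if not seen:
            fused.append(None)  # no view covers this voxel
            continue
        channels = len(seen[0])
        fused.append([sum(f[c] for f in seen) / len(seen) for c in range(channels)])
    return fused
```

A masked mean keeps voxels seen by a single view intact while blending voxels seen by several; other operators (max, learned weighting) would fit the same interface.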

Get this paper in your agent:

hf papers read 2605.10922

Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper: 1

TencentARC/Pixal3D • Updated about 4 hours ago • 9

Datasets citing this paper: 0

Spaces citing this paper: 1

Collections including this paper: 0

Similar Articles

PixWorld: Unifying 3D Scene Generation and Reconstruction in Pixel Space

TencentARC/Pixal3D

World Tracing: Generative Pixel-Aligned Geometry Beyond the Visible

Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image

Realiz3D: 3D Generation Made Photorealistic via Domain-Aware Learning