Pixal3D: Pixel-Aligned 3D Generation from Images

Hugging Face Daily Papers

Summary

Pixal3D introduces a pixel-aligned 3D generation approach that improves fidelity by establishing direct pixel-to-3D correspondences through back-projection conditioning, addressing issues in canonical space generation.

Recent advances in 3D generative models have rapidly improved image-to-3D synthesis quality, enabling higher-resolution geometry and more realistic appearance. Yet fidelity, which measures pixel-level faithfulness of the generated 3D asset to the input image, remains a central bottleneck. We argue this stems from an implicit 2D-3D correspondence issue: most 3D-native generators synthesize shape in canonical space and inject image cues via attention, leaving pixel-to-3D associations ambiguous. To tackle this issue, we draw inspiration from 3D reconstruction and propose Pixal3D, a pixel-aligned 3D generation paradigm for high-fidelity 3D asset creation from images. Instead of generating in a canonical pose, Pixal3D directly generates 3D in a pixel-aligned way, consistent with the input view. To enable this, we introduce a pixel back-projection conditioning scheme that explicitly lifts multi-scale image features into a 3D feature volume, establishing direct pixel-to-3D correspondence without ambiguity. We show that Pixal3D is not only scalable and capable of producing high-quality 3D assets, but also substantially improves fidelity, approaching the fidelity level of reconstruction. Furthermore, Pixal3D naturally extends to multi-view generation by aggregating back-projected feature volumes across views. Finally, we show that pixel-aligned generation benefits scene synthesis, and present a modular pipeline that produces high-fidelity, object-separated 3D scenes from images. Pixal3D demonstrates, for the first time, 3D-native pixel-aligned generation at scale, and provides a new and inspiring path toward high-fidelity 3D generation of objects or scenes from single- or multi-view images. Project page: https://ldyang694.github.io/projects/pixal3d/
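To make the idea of back-projection conditioning concrete, here is a minimal sketch of lifting a single 2D feature map into a 3D feature volume. It is an illustration under stated assumptions, not the authors' implementation: the pinhole intrinsics `K`, the camera-space voxel grid, the nearest-neighbour lookup, and the function name `back_project` are all hypothetical choices, and the multi-scale features mentioned in the abstract are reduced to a single scale.

```python
import numpy as np

def back_project(feat, K, grid_min, grid_max, res):
    """Lift a 2D feature map (H, W, C) into a (res, res, res, C) volume.

    Assumed setup (not from the paper): grid points live in camera
    coordinates with +z pointing forward, and K is a 3x3 pinhole matrix.
    Each grid point is projected into the image and copies the feature of
    the nearest pixel. Points that fall outside the image or lie behind
    the camera keep a zero feature.
    """
    H, W, C = feat.shape
    axes = [np.linspace(grid_min[i], grid_max[i], res) for i in range(3)]
    X, Y, Z = np.meshgrid(*axes, indexing="ij")
    pts = np.stack([X, Y, Z], axis=-1).reshape(-1, 3)  # (N, 3)

    z = pts[:, 2]
    in_front = z > 1e-6
    safe_z = np.where(in_front, z, 1.0)  # avoid dividing by zero or negatives
    u = K[0, 0] * pts[:, 0] / safe_z + K[0, 2]  # pixel column
    v = K[1, 1] * pts[:, 1] / safe_z + K[1, 2]  # pixel row

    col = np.round(u).astype(int)
    row = np.round(v).astype(int)
    valid = in_front & (col >= 0) & (col < W) & (row >= 0) & (row < H)

    vol = np.zeros((pts.shape[0], C), dtype=feat.dtype)
    vol[valid] = feat[row[valid], col[valid]]
    return vol.reshape(res, res, res, C)

# Toy usage: a 4x4 single-channel "feature map" and a small grid in front
# of the camera.
feat = np.arange(16, dtype=float).reshape(4, 4, 1)
K = np.array([[2.0, 0.0, 2.0],
              [0.0, 2.0, 2.0],
              [0.0, 0.0, 1.0]])
vol = back_project(feat, K, (-0.5, -0.5, 1.0), (0.5, 0.5, 2.0), 3)
print(vol.shape)  # (3, 3, 3, 1)
```

In this sketch every grid point along a camera ray receives the same pixel feature, which is what gives each voxel an explicit link back to a pixel rather than an attention-mediated one.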

Cached at: 05/12/26, 07:31 AM

Source: https://huggingface.co/papers/2605.10922

Abstract

Pixal3D introduces a pixel-aligned 3D generation approach that addresses fidelity issues in 3D asset creation by establishing direct pixel-to-3D correspondences through back-projection conditioning.

Recent advances in 3D generative models have rapidly improved image-to-3D synthesis quality, enabling higher-resolution geometry and more realistic appearance. Yet fidelity, which measures pixel-level faithfulness of the generated 3D asset to the input image, still remains a central bottleneck. We argue this stems from an implicit 2D-3D correspondence issue: most 3D-native generators synthesize shape in canonical space and inject image cues via attention, leaving pixel-to-3D associations ambiguous. To tackle this issue, we draw inspiration from 3D reconstruction and propose Pixal3D, a pixel-aligned 3D generation paradigm for high-fidelity 3D asset creation from images. Instead of generating in a canonical pose, Pixal3D directly generates 3D in a pixel-aligned way, consistent with the input view. To enable this, we introduce a pixel back-projection conditioning scheme that explicitly lifts multi-scale image features into a 3D feature volume, establishing direct pixel-to-3D correspondence without ambiguity. We show that Pixal3D is not only scalable and capable of producing high-quality 3D assets, but also substantially improves fidelity, approaching the fidelity level of reconstruction. Furthermore, Pixal3D naturally extends to multi-view generation by aggregating back-projected feature volumes across views. Finally, we show pixel-aligned generation benefits scene synthesis, and present a modular pipeline that produces high-fidelity, object-separated 3D scenes from images. Pixal3D for the first time demonstrates 3D-native pixel-aligned generation at scale, and provides a new inspiring way towards high-fidelity 3D generation of object or scene from single or multi-view images. Project page: https://ldyang694.github.io/projects/pixal3d/
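The multi-view extension mentioned in the abstract, aggregating back-projected feature volumes across views, could look roughly like the sketch below. The abstract does not say which aggregation operator is used, so the masked mean here is an assumption, as are the function name `aggregate_volumes` and the array layout.

```python
import numpy as np

def aggregate_volumes(volumes, masks):
    """Fuse per-view feature volumes with a masked mean (assumed operator).

    volumes: (V, X, Y, Z, C) feature volumes, one per view, on a shared grid.
    masks:   (V, X, Y, Z) booleans, True where that view covers the voxel.
    Returns an (X, Y, Z, C) volume; voxels covered by no view stay zero.
    """
    volumes = np.asarray(volumes, dtype=float)
    weights = np.asarray(masks, dtype=float)[..., None]  # (V, X, Y, Z, 1)
    total = (volumes * weights).sum(axis=0)
    count = weights.sum(axis=0)
    # Clamp the count to 1 so uncovered voxels divide 0 by 1 instead of 0 by 0.
    return total / np.maximum(count, 1.0)

# Toy usage: two views over a 1x1x3 grid with one feature channel.
volumes = np.array([
    [[[[2.0], [5.0], [9.0]]]],
    [[[[6.0], [100.0], [9.0]]]],
])
masks = np.array([
    [[[True, True, False]]],
    [[[True, False, False]]],
])
fused = aggregate_volumes(volumes, masks)
print(fused.shape)  # (1, 1, 3, 1)
```

A masked mean keeps voxels that only one view covers from being diluted by empty contributions from the other views; other operators (max, learned weighting) would fit the same interface.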

Get this paper in your agent:

hf papers read 2605.10922

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper: 1

#### TencentARC/Pixal3D Updated about 4 hours ago • 9

Spaces citing this paper: 1

Similar Articles

TencentARC/Pixal3D

Hugging Face Models Trending

Pixal3D is a high-fidelity single-image-to-3D model by TencentARC and Microsoft that explicitly lifts pixel features into 3D via back-projection, yielding near-reconstruction-level geometry and PBR textures. The model has been accepted to SIGGRAPH 2026, and inference code and a demo are available.

Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image

Hugging Face Daily Papers

Sat3DGen introduces a geometry-first approach for generating street-level 3D scenes from a single satellite image, achieving improved geometric accuracy and photorealism through novel constraints and training strategies. The method demonstrates significant improvements over prior work on the VIGOR-OOD benchmark.

L2P: Unlocking Latent Potential for Pixel Generation

Hugging Face Daily Papers

The L2P paper introduces a Latent-to-Pixel transfer paradigm that leverages pre-trained latent diffusion models to create efficient pixel-space models capable of 4K generation with minimal training overhead.