Pixal3D: Pixel-Aligned 3D Generation from Images
Summary
Pixal3D introduces a pixel-aligned 3D generation approach that improves fidelity by establishing direct pixel-to-3D correspondences through back-projection conditioning, addressing issues in canonical space generation.
Cached at: 05/12/26, 07:31 AM
Source: https://huggingface.co/papers/2605.10922
Abstract
Pixal3D introduces a pixel-aligned 3D generation approach that addresses fidelity issues in 3D asset creation by establishing direct pixel-to-3D correspondences through back-projection conditioning.
Recent advances in 3D generative models have rapidly improved image-to-3D synthesis quality, enabling higher-resolution geometry and more realistic appearance. Yet fidelity, which measures pixel-level faithfulness of the generated 3D asset to the input image, still remains a central bottleneck. We argue this stems from an implicit 2D-3D correspondence issue: most 3D-native generators synthesize shape in canonical space and inject image cues via attention, leaving pixel-to-3D associations ambiguous. To tackle this issue, we draw inspiration from 3D reconstruction and propose Pixal3D, a pixel-aligned 3D generation paradigm for high-fidelity 3D asset creation from images. Instead of generating in a canonical pose, Pixal3D directly generates 3D in a pixel-aligned way, consistent with the input view. To enable this, we introduce a pixel back-projection conditioning scheme that explicitly lifts multi-scale image features into a 3D feature volume, establishing direct pixel-to-3D correspondence without ambiguity. We show that Pixal3D is not only scalable and capable of producing high-quality 3D assets, but also substantially improves fidelity, approaching the fidelity level of reconstruction. Furthermore, Pixal3D naturally extends to multi-view generation by aggregating back-projected feature volumes across views. Finally, we show pixel-aligned generation benefits scene synthesis, and present a modular pipeline that produces high-fidelity, object-separated 3D scenes from images. Pixal3D for the first time demonstrates 3D-native pixel-aligned generation at scale, and provides a new inspiring way towards high-fidelity 3D generation of object or scene from single or multi-view images. Project page: https://ldyang694.github.io/projects/pixal3d/
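The back-projection idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`back_project`, `aggregate`), the pinhole camera model, the nearest-pixel sampling, the single feature scale, and the plain averaging across views are all assumptions made for illustration. Each voxel center in a camera-space grid is projected into the image and takes the feature of the pixel it lands on, so every voxel along a viewing ray carries that pixel's feature.

```python
import math
from typing import List, Optional

Feature = List[float]
FeatureMap = List[List[Feature]]               # [row][col] -> feature vector
Volume = List[List[List[Optional[Feature]]]]   # [depth][row][col] -> feature or None


def back_project(feat_map: FeatureMap, res: int, focal: float, cx: float, cy: float,
                 near: float = 1.0, far: float = 2.0,
                 half_extent: float = 0.5) -> Volume:
    """Lift a 2D feature map into a res^3 camera-space volume (hypothetical helper).

    Assumes a pinhole camera looking down +z with intrinsics (focal, cx, cy).
    Voxels that project outside the image get None.
    """
    height, width = len(feat_map), len(feat_map[0])
    step_xy = 2.0 * half_extent / res
    step_z = (far - near) / res
    volume: Volume = []
    for k in range(res):
        z = near + (k + 0.5) * step_z              # voxel-center depth
        plane = []
        for j in range(res):
            y = -half_extent + (j + 0.5) * step_xy
            cells = []
            for i in range(res):
                x = -half_extent + (i + 0.5) * step_xy
                # Pinhole projection of the voxel center to pixel coordinates.
                u = focal * x / z + cx
                v = focal * y / z + cy
                col, row = math.floor(u), math.floor(v)  # nearest-pixel lookup
                inside = 0 <= col < width and 0 <= row < height
                cells.append(list(feat_map[row][col]) if inside else None)
            plane.append(cells)
        volume.append(plane)
    return volume


def aggregate(volumes: List[Volume]) -> Volume:
    """Average per-voxel features over volumes that already share one grid.

    A voxel stays None if no view covers it. Resampling each view into the
    shared grid (camera extrinsics) is deliberately left out of this sketch.
    """
    first = volumes[0]
    out: Volume = []
    for k in range(len(first)):
        plane = []
        for j in range(len(first[k])):
            cells = []
            for i in range(len(first[k][j])):
                feats = [v[k][j][i] for v in volumes if v[k][j][i] is not None]
                if feats:
                    cells.append([sum(c) / len(feats) for c in zip(*feats)])
                else:
                    cells.append(None)
            plane.append(cells)
        out.append(plane)
    return out
```

In this sketch the correspondence between a pixel and the voxels it conditions is fixed by geometry rather than learned through attention, which is the property the abstract refers to as pixel-aligned. A real system would more plausibly use bilinear sampling, several feature scales, and a learned fusion across views.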
Models citing this paper: 1
TencentARC/Pixal3D
Similar Articles
TencentARC/Pixal3D
Pixal3D is a high-fidelity single-image-to-3D model by TencentARC and Microsoft, which explicitly lifts pixel features into 3D via back-projection for near-reconstruction-level geometry and PBR textures. The model is accepted to SIGGRAPH 2026, with inference code and demo available.
Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image
Sat3DGen introduces a geometry-first approach for generating street-level 3D scenes from a single satellite image, achieving improved geometric accuracy and photorealism through novel constraints and training strategies. The method demonstrates significant improvements over prior work on the VIGOR-OOD benchmark.
Realiz3D: 3D Generation Made Photorealistic via Domain-Aware Learning
Realiz3D introduces domain-aware learning to decouple visual domain from control signals in 3D-consistent image generation, using residual adapters and layer-specific denoising to produce photorealistic outputs from synthetic renders.
Pantheon360: Taming Digital Twin Generation via 3D-Aware 360° Video Diffusion
Pantheon360 introduces a 3D-aware 360° video diffusion framework that uses an explicit 3D cache to enforce geometric consistency, enabling high-fidelity digital twin generation from sparse 360° inputs.
L2P: Unlocking Latent Potential for Pixel Generation
The L2P paper introduces a Latent-to-Pixel transfer paradigm that leverages pre-trained latent diffusion models to create efficient pixel-space models capable of 4K generation with minimal training overhead.