Realiz3D: 3D Generation Made Photorealistic via Domain-Aware Learning

Hugging Face Daily Papers 03/25/26, 12:00 AM Papers

3d-generation photorealistic domain-gap diffusion-models residual-adapters text-to-multiview texturing

Summary

Realiz3D introduces domain-aware learning to decouple visual domain from control signals in 3D-consistent image generation, using residual adapters and layer-specific denoising to produce photorealistic outputs from synthetic renders.

We often aim to generate images that are both photorealistic and 3D-consistent, adhering to precise geometry, material, and viewpoint controls. Typically, this is achieved by fine-tuning an image generator, pre-trained on billions of real images, using renders of synthetic 3D assets, where annotations for control signals are available. While this approach can learn the desired controls, it often compromises the realism of the images due to the domain gap between photographs and renders. We observe that this issue largely arises from the model learning an unintended association between the presence of control signals and the synthetic appearance of the images. To address this, we introduce Realiz3D, a lightweight framework for training diffusion models that decouples controls and visual domain. The key idea is to explicitly learn the visual domain, real or synthetic, separately from other control signals by introducing a covariate that, fed into small residual adapters, shifts the domain. Then, the generator can be trained to gain controllability without fitting to a specific visual domain. In this way, the model can be guided to produce realistic images even when controls are applied. We enhance control transferability to the real domain by leveraging insights about the roles of different layers and denoising steps in diffusion-based generators, informing new training and inference strategies that further mitigate the gap. We demonstrate the advantages of Realiz3D in tasks such as text-to-multiview generation and texturing from 3D inputs, producing outputs that are 3D-consistent and photorealistic.
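The idea of a domain covariate fed into small residual adapters can be illustrated with a minimal sketch. This is not the paper's implementation: the class name `DomainResidualAdapter`, the low-rank bottleneck design, the ReLU nonlinearity, and the encoding of the covariate (0.0 for real, 1.0 for synthetic) are all assumptions made for illustration. The sketch only shows how a scalar covariate could gate a learned residual shift on top of a frozen layer's output.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class DomainResidualAdapter:
    """Hypothetical low-rank residual adapter gated by a domain covariate.

    Assumed encoding: domain = 0.0 leaves the hidden state untouched
    (taken here to mean "real"), and domain = 1.0 adds the full learned
    shift (taken here to mean "synthetic").
    """

    def __init__(self, dim, rank, seed=0):
        rng = random.Random(seed)
        # Down-projection (rank x dim) and up-projection (dim x rank).
        self.down = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(rank)]
        self.up = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(dim)]

    def __call__(self, hidden, domain):
        # Project to the bottleneck, apply ReLU, project back up.
        bottleneck = [max(0.0, v) for v in matvec(self.down, hidden)]
        shift = matvec(self.up, bottleneck)
        # The covariate scales the residual shift added to the hidden state.
        return [h + domain * s for h, s in zip(hidden, shift)]


adapter = DomainResidualAdapter(dim=4, rank=2)
hidden = [0.5, -1.0, 2.0, 0.25]
real_out = adapter(hidden, domain=0.0)       # identical to the input
synthetic_out = adapter(hidden, domain=1.0)  # input plus the learned shift
```

Under these assumptions, the backbone can be trained on renders with the covariate set to the synthetic value, so the adapters absorb the appearance shift, and sampled with the covariate set to the real value while controls stay active.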

Original Article

Cached at: 05/15/26, 08:24 AM

Source: https://huggingface.co/papers/2605.13852

Abstract

Realiz3D addresses the domain gap between synthetic renders and real images in 3D-consistent image generation by decoupling visual domain from control signals through residual adapters and layer-specific denoising strategies.

We often aim to generate images that are both photorealistic and 3D-consistent, adhering to precise geometry, material, and viewpoint controls. Typically, this is achieved by fine-tuning an image generator, pre-trained on billions of real images, using renders of synthetic 3D assets, where annotations for control signals are available. While this approach can learn the desired controls, it often compromises the realism of the images due to the domain gap between photographs and renders. We observe that this issue largely arises from the model learning an unintended association between the presence of control signals and the synthetic appearance of the images. To address this, we introduce Realiz3D, a lightweight framework for training diffusion models that decouples controls and visual domain. The key idea is to explicitly learn the visual domain, real or synthetic, separately from other control signals by introducing a covariate that, fed into small residual adapters, shifts the domain. Then, the generator can be trained to gain controllability without fitting to a specific visual domain. In this way, the model can be guided to produce realistic images even when controls are applied. We enhance control transferability to the real domain by leveraging insights about the roles of different layers and denoising steps in diffusion-based generators, informing new training and inference strategies that further mitigate the gap. We demonstrate the advantages of Realiz3D in tasks such as text-to-multiview generation and texturing from 3D inputs, producing outputs that are 3D-consistent and photorealistic.
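The decoupling described in the abstract can be pictured as labeling every sample with its visual domain independently of whether control signals are present. The sketch below is only an illustration under stated assumptions: the names `Conditioning`, `training_conditioning`, and `inference_conditioning` are hypothetical, the numeric encoding of the covariate is arbitrary, and the premise that photographs carry no control annotations is an assumption drawn from the abstract's remark that annotations are available for synthetic renders.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed encoding of the domain covariate; the paper may use another.
REAL, SYNTHETIC = 0.0, 1.0


@dataclass(frozen=True)
class Conditioning:
    prompt: str
    domain: float
    controls: Optional[dict]  # e.g. camera or geometry maps; None if unannotated


def training_conditioning(prompt, is_render, controls=None):
    """Tag a training sample with its domain, separately from its controls."""
    if not is_render and controls is not None:
        raise ValueError("this sketch assumes photographs have no control annotations")
    return Conditioning(prompt, SYNTHETIC if is_render else REAL, controls)


def inference_conditioning(prompt, controls):
    """At sampling time, combine controls with the real-domain covariate."""
    return Conditioning(prompt, REAL, controls)


render = training_conditioning("a ceramic mug", is_render=True, controls={"view": 3})
photo = training_conditioning("a ceramic mug", is_render=False)
sample = inference_conditioning("a ceramic mug", controls={"view": 3})
```

In this sketch the combination requested at inference (real domain with controls) never appears in the assumed training data, which is why the domain and the controls need to be learned as separate factors rather than as one entangled signal.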

Get this paper in your agent:

hf papers read 2605.13852

Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Similar Articles

Pixal3D: Pixel-Aligned 3D Generation from Images

GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction

ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis

RayDer: Scalable Self-Supervised Novel View Synthesis from Real-World Video

JanusMesh: Fast and Zero-Shot 3D Visual Illusion Generation via Cross-Space Denoising
