Decoupled Residual Denoising Diffusion Models for Unified and Data Efficient Image-to-Image Translation
Summary
This paper proposes Decoupled Residual Denoising Diffusion Models (DRDD) for unified and data-efficient image-to-image translation, decoupling noise diffusion for domain harmonization from residual diffusion for semantic mapping.
Cached at: 06/03/26, 07:36 AM
Source: https://huggingface.co/papers/2606.01048
Abstract
Decoupled Residual Denoising Diffusion models (DRDD) improve unified image-to-image translation by separating noise diffusion for domain harmonization from residual diffusion for semantic mapping, enhancing data efficiency and performance.
We propose Decoupled Residual Denoising Diffusion models (DRDD) for unified and data-efficient image-to-image (I2I) translation. While diffusion models have advanced I2I translation in terms of quality and diversity, we uncover a previously under-explored property in diffusion models. Crucially, beyond its conventional role of manifold lifting (i.e., moving data off low-dimensional manifolds), injecting Gaussian noise facilitates domain harmonization by implicitly aligning feature distributions across domains, a property particularly advantageous for unified I2I translation. However, existing diffusion models prematurely erode this harmonization effect, as noise and residuals are simultaneously removed in a single coupled diffusion process. To address this, DRDD decouples the diffusion process into two sequential and independent diffusion stages: (1) a stochastic noise diffusion for domain harmonization and manifold lifting, and (2) a deterministic residual diffusion that learns the core semantic mapping entirely within the fixed-noise domain. This decoupling preserves harmonization and manifold lifting effects throughout the transformation, substantially simplifying the learning of unified mappings across diverse tasks and domains. Notably, the noise diffusion stage is trained exclusively on abundant, unpaired target-domain images, greatly improving data efficiency. Comprehensive theoretical and empirical analysis demonstrates that DRDD is broadly compatible with mainstream diffusion models and consistently delivers robust, unified I2I translation, even under limited paired data. Our code is available at https://github.com/HKU-HealthAI/DRDD.
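The two-stage idea in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the abstract gives no equations, so the function names (noise_stage, residual_stage, kl_equal_variance), the definition of the residual as source minus target, the linear schedule alphas, and the single noise level sigma are all assumptions made for illustration. The sketch shows a stochastic stage that adds Gaussian noise to a target image, followed by a deterministic stage that adds the residual while the sampled noise stays fixed. The last helper uses the standard closed-form KL divergence between two equal-variance 1-D Gaussians to show one simple sense in which added noise brings two domains closer.

```python
import numpy as np


def noise_stage(target, sigma, rng):
    """Stage 1 (stochastic, assumed form): add Gaussian noise to the target image."""
    eps = rng.standard_normal(target.shape)
    return target + sigma * eps, eps


def residual_stage(noisy_target, residual, alphas):
    """Stage 2 (deterministic, assumed form): blend in the residual step by step.

    The noise sampled in stage 1 is never touched here, so every state on the
    path carries the same noise realization.
    """
    path = [noisy_target]
    for a in alphas:
        path.append(noisy_target + a * residual)
    return path


def kl_equal_variance(mu_a, mu_b, var):
    """KL divergence between N(mu_a, var) and N(mu_b, var) in one dimension."""
    return (mu_a - mu_b) ** 2 / (2.0 * var)


# Toy "images": random 4x4 arrays standing in for a paired source and target.
rng = np.random.default_rng(0)
target = rng.random((4, 4))
source = rng.random((4, 4))
sigma = 0.5  # hypothetical noise level

noisy_target, eps = noise_stage(target, sigma, rng)

# Hypothetical linear residual schedule ending at 1.0 (full residual added).
alphas = np.linspace(0.0, 1.0, 5)[1:]
path = residual_stage(noisy_target, source - target, alphas)

# Adding independent N(0, 3) noise to two unit-variance domains whose means
# differ by 2 raises the shared variance from 1 to 4 and shrinks the KL gap.
kl_clean = kl_equal_variance(0.0, 2.0, 1.0)
kl_noisy = kl_equal_variance(0.0, 2.0, 1.0 + 3.0)
print(kl_clean, kl_noisy)
```

Under these assumptions the path starts at the noisy target and ends at the source plus the same noise, so a translation model would run it in reverse: begin from the noised source, remove the residual deterministically, then denoise. The learned networks that would predict the residual and the noise are omitted here.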
Similar Articles
UniDDT: Unifying Multimodal Understanding and Generation with Decoupled Diffusion Transformer
UniDDT proposes a decoupled diffusion transformer framework that unifies multimodal understanding and generation by leveraging a Noisy ViT encoder and LLM for semantic encoding, achieving strong performance on both tasks.
RepFusion: Leveraging Multimodal Priors for Denoising in Representation Space
RepFusion proposes using multimodal large language models as noisy representation encoders for diffusion transformers in text-to-image generation, outperforming traditional denoising approaches.
Uniform Diffusion Models Revisited: Leave-One-Out Denoiser and Absorbing State Reformulation
Revisits uniform diffusion models, identifying a mismatch between the plug-in ELBO and cross-entropy denoising objective, and proposes leave-one-out parameterizations along with an absorbing-state reformulation that improves generation without additional training.
Drifting Objectives for Refining Discrete Diffusion Language Models
This paper introduces TokenDrift, a drifting objective that refines discrete diffusion language models by lifting categorical predictions to a continuous semantic space for anti-symmetric drifting, significantly improving generation quality under a fixed number of denoising steps.
MMDiff: Extending Diffusion Transformers for Multi-Modal Generation
MMDiff extends frozen diffusion transformers into multi-modal generative systems using lightweight decoders, achieving significant improvements in semantic segmentation and other perceptual tasks through multi-timestep feature fusion.