RefGC-SR^2: Reference-guided Generated Content Super-Resolution and Refinement
Summary
This paper introduces a new task, reference-guided generated content super-resolution-refinement (RefGC-SR^2), which simultaneously recovers high-resolution details and refines generative artifacts using a frequency-aware diffusion transformer model. The method leverages a high-resolution reference image to improve the quality of AI-generated images during post-processing.
Cached at: 06/17/26, 11:37 AM
Source: https://huggingface.co/papers/2606.15158
Abstract
A new reference-guided generated content super-resolution-refinement task is introduced that simultaneously recovers high-resolution details and refines generative artifacts using a frequency-aware diffusion transformer model.
Reference-guided generation (e.g., object compositing, customization) has progressed rapidly, yet current pipelines share a fundamental limitation: the object-centric high-resolution reference image (HRRI) provided by users is downsampled to a fixed low resolution (LR) before being fed into the model, so the fine-grained details are discarded before the output is even produced. In addition, the generation step then introduces its own artifacts (e.g., identity distortion) on top of this loss. Existing reference-guided generated content refinement (RefGCR) methods can correct some of these artifacts but still operate in the LR domain; reference-guided super-resolution (RefSR) methods recover resolution but assume natural-image degradations and ignore the artifact distribution of generative pipelines. To address both gaps in a single formulation, we introduce a new task: reference-guided generated content super-resolution-refinement (RefGC-SR^2), where the original HRRI is reused at the post-processing stage to recover lost details, refine generative artifacts, and upscale the output simultaneously. We construct the first real-world triplet data generation pipeline for this RefGC-SR^2 task, training a diptych-conditioned generator to synthesize paired low-quality anchors that public pretrained models cannot provide. We further present a frequency-aware diffusion transformer model for RefGC-SR^2 that selectively injects fine details from the HRRI while removing generative artifacts. Extensive experiments demonstrate that our RefGC-SR^2 model successfully (i) refines the object identity faithfully with respect to the reference, and (ii) recovers high-resolution details, so that the final result is significantly higher quality and practically more usable compared to existing RefGCR and RefSR baselines.
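The intuition behind reusing the HRRI at post-processing time can be illustrated with a toy 1-D example. The sketch below is not the paper's frequency-aware diffusion transformer; it is a hand-written frequency split, and every function name in it (`box_blur`, `downsample`, `upsample`, `inject_high_freq`, `mae`) is a hypothetical helper introduced only for illustration. It shows that downsampling to a fixed low resolution averages away fine texture, and that adding the reference's high-frequency residual back onto the upsampled signal recovers much of it.

```python
# Toy illustration only: a hand-written 1-D frequency split,
# NOT the model or pipeline described in the abstract.

def box_blur(signal, radius=1):
    """Crude low-pass filter: mean over a window truncated at the edges."""
    n = len(signal)
    out = []
    for i in range(n):
        window = signal[max(0, i - radius):min(n, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out

def downsample(signal, factor):
    """Average each block of `factor` samples (stand-in for a fixed LR input)."""
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal), factor)]

def upsample(signal, factor):
    """Nearest-neighbour upsampling back to the original length."""
    return [v for v in signal for _ in range(factor)]

def inject_high_freq(upsampled, reference, radius=1):
    """Keep the low band of `upsampled`, add the high band of `reference`."""
    low_out = box_blur(upsampled, radius)
    low_ref = box_blur(reference, radius)
    return [lo + (r - lr) for lo, r, lr in zip(low_out, reference, low_ref)]

def mae(a, b):
    """Mean absolute error between two equal-length signals."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# Assumed toy "reference": a fine alternating texture.
reference = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]

lr = downsample(reference, 2)      # texture averages out to a flat 0.5
naive = upsample(lr, 2)            # upscaling alone cannot bring it back
refined = inject_high_freq(naive, reference)

print("error without reference:", mae(naive, reference))
print("error with reference   :", mae(refined, reference))
```

In this toy setting the reference-guided result is closer to the original than plain upsampling. A real system additionally has to decide where the reference detail applies and has to suppress generative artifacts, which this sketch does not attempt.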
Similar Articles
SEGA: Spectral-Energy Guided Attention for Resolution Extrapolation in Diffusion Transformers
SEGA is a training-free method that improves high-resolution text-to-image generation by adaptively scaling attention across RoPE components based on spatial-frequency structure during denoising steps.
PRISM: Prior Rectification and Uncertainty-Aware Structure Modeling for Diffusion-Based Text Image Super-Resolution
PRISM is a diffusion-based framework for text image super-resolution that uses flow-matching prior rectification and uncertainty-aware residual encoding to improve accuracy under severe degradation, achieving state-of-the-art performance with millisecond-level inference.
SRT: Super-Resolution for Time Series via Disentangled Rectified Flow
This paper proposes SRT (Super-Resolution for Time Series), a framework that reconstructs high-resolution temporal patterns from low-resolution inputs using a disentangled rectified flow approach. The method decomposes input into trend and seasonal components, applies implicit neural representation for resolution alignment, and introduces cross-resolution attention to generate fine-grained details, achieving state-of-the-art performance on multiple datasets.
GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction
GenRecon introduces a method for 3D scene reconstruction that integrates generative 3D priors with multi-view image conditioning, achieving high-fidelity, editable mesh reconstructions of indoor environments and outperforming existing methods by 16%.
tencentarc/gfpgan
GFPGAN is a practical face restoration model by Tencent ARC, available on Replicate. It restores old or low-quality face images with high fidelity.