Delta-Adapter: Scalable Exemplar-Based Image Editing with Single-Pair Supervision

Hugging Face Daily Papers

Summary

Delta-Adapter enables exemplar-based image editing using single-pair supervision by extracting semantic deltas from pre-trained vision encoders and injecting them via Perceiver-based adapters, improving accuracy and generalization.

Exemplar-based image editing applies a transformation defined by a source-target image pair to a new query image. Existing methods rely on a pair-of-pairs supervision paradigm, requiring two image pairs sharing the same edit semantics to learn the target transformation. This constraint makes training data difficult to curate at scale and limits generalization across diverse edit types. We propose Delta-Adapter, a method that learns transferable editing semantics under single-pair supervision, requiring no textual guidance. Rather than directly exposing the exemplar pair to the model, we leverage a pre-trained vision encoder to extract a semantic delta that encodes the visual transformation between the two images. This semantic delta is injected into a pre-trained image editing model via a Perceiver-based adapter. Since the target image is never directly visible to the model, it can serve as the prediction target, enabling single-pair supervision without requiring additional exemplar pairs. This formulation allows us to leverage existing large-scale editing datasets for training. To further promote faithful transformation transfer, we introduce a semantic delta consistency loss that aligns the semantic change of the generated output with the ground-truth semantic delta extracted from the exemplar pair. Extensive experiments demonstrate that Delta-Adapter consistently improves both editing accuracy and content consistency over four strong baselines on seen editing tasks, while also generalizing more effectively to unseen editing tasks. Code will be available at https://delta-adapter.github.io.
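The abstract does not say how the semantic delta or the consistency loss is computed, so the following is only a minimal sketch under stated assumptions: the delta is taken as the element-wise difference between two encoder embeddings, and the loss as one minus the cosine similarity between the output's delta and the exemplar's delta. The function names `semantic_delta` and `delta_consistency_loss` are hypothetical, and plain lists of floats stand in for the outputs of a pre-trained vision encoder.

```python
import math


def semantic_delta(source_emb, target_emb):
    """Assumed form of the delta: element-wise difference target - source."""
    if len(source_emb) != len(target_emb):
        raise ValueError("embeddings must have the same length")
    return [t - s for s, t in zip(source_emb, target_emb)]


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two vectors, guarded against zero norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, eps)


def delta_consistency_loss(query_emb, output_emb, exemplar_src_emb, exemplar_tgt_emb):
    """Hypothetical loss: 1 - cos(delta of generated edit, delta of exemplar pair).

    The cosine form is an assumption; the abstract only states that the two
    semantic changes are aligned.
    """
    predicted = semantic_delta(query_emb, output_emb)
    reference = semantic_delta(exemplar_src_emb, exemplar_tgt_emb)
    return 1.0 - cosine_similarity(predicted, reference)


# Toy embeddings: the exemplar pair changes only the second feature.
exemplar_source = [1.0, 0.0]
exemplar_target = [1.0, 1.0]
query = [3.0, 2.0]
good_output = [3.0, 4.0]   # same direction of change as the exemplar
bad_output = [3.0, 0.0]    # opposite direction of change

loss_good = delta_consistency_loss(query, good_output, exemplar_source, exemplar_target)
loss_bad = delta_consistency_loss(query, bad_output, exemplar_source, exemplar_target)
```

In this toy setup the loss is lowest when the generated output moves away from the query in the same embedding direction as the exemplar target moves away from the exemplar source. Under single-pair supervision the query would be the exemplar source itself, with the exemplar target as the prediction target.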

Cached at: 05/11/26, 06:55 PM

Source: https://huggingface.co/papers/2605.07940

Abstract

Delta-Adapter enables image editing with single-pair supervision by extracting semantic deltas from pre-trained vision encoders and injecting them into editing models via Perceiver-based adapters, improving accuracy and generalization.

Exemplar-based image editing applies a transformation defined by a source-target image pair to a new query image. Existing methods rely on a pair-of-pairs supervision paradigm, requiring two image pairs sharing the same edit semantics to learn the target transformation. This constraint makes training data difficult to curate at scale and limits generalization across diverse edit types. We propose Delta-Adapter, a method that learns transferable editing semantics under single-pair supervision, requiring no textual guidance. Rather than directly exposing the exemplar pair to the model, we leverage a pre-trained vision encoder to extract a semantic delta that encodes the visual transformation between the two images. This semantic delta is injected into a pre-trained image editing model via a Perceiver-based adapter. Since the target image is never directly visible to the model, it can serve as the prediction target, enabling single-pair supervision without requiring additional exemplar pairs. This formulation allows us to leverage existing large-scale editing datasets for training. To further promote faithful transformation transfer, we introduce a semantic delta consistency loss that aligns the semantic change of the generated output with the ground-truth semantic delta extracted from the exemplar pair. Extensive experiments demonstrate that Delta-Adapter consistently improves both editing accuracy and content consistency over four strong baselines on seen editing tasks, while also generalizing more effectively to unseen editing tasks. Code will be available at https://delta-adapter.github.io.
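The abstract names a Perceiver-based adapter without giving its architecture. As a rough illustration of the general idea behind Perceiver-style resampling, the sketch below has a fixed set of latent queries cross-attend over a variable number of delta tokens, producing a fixed number of conditioning vectors. It is not the authors' implementation: the function `cross_attend` is hypothetical, the query, key, and value projections are omitted (treated as identity), and a single attention head is used for brevity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(latents, tokens):
    """Single-head cross-attention with identity projections (a simplification).

    Each latent query attends over all input tokens, so the output has one
    vector per latent regardless of how many tokens come in.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for query in latents:
        scores = [scale * sum(q * k for q, k in zip(query, tok)) for tok in tokens]
        weights = softmax(scores)
        outputs.append(
            [sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(dim)]
        )
    return outputs


# Three toy "delta tokens" (e.g. per-patch embedding differences) of width 2.
delta_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Two latent queries; in a trained adapter these would be learned parameters.
latent_queries = [[0.0, 0.0], [2.0, 0.0]]

conditioning = cross_attend(latent_queries, delta_tokens)
```

The all-zero latent assigns uniform attention and therefore returns the mean of the delta tokens, while the second latent weights tokens with a large first component more heavily. A real adapter would add learned projections, multiple heads, and a mechanism for feeding the resulting tokens into the editing model, none of which the abstract specifies.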
