SmartPhotoCrafter: Unified Reasoning, Generation and Optimization for Automatic Photographic Image Editing

Hugging Face Daily Papers

Summary

SmartPhotoCrafter introduces an automatic photographic image editing pipeline that unifies quality comprehension and enhancement without explicit human instructions, outperforming existing generative models on photo-realistic enhancement tasks.

Traditional photographic image editing typically requires users to have enough aesthetic understanding to give appropriate instructions for adjusting image quality and camera parameters. This paradigm relies on explicit human instruction of aesthetic intent, which is often ambiguous, incomplete, or inaccessible to non-expert users. In this work, we propose SmartPhotoCrafter, an automatic photographic image editing method that formulates image editing as a tightly coupled reasoning-to-generation process. The Image Critic module first performs image quality comprehension and identifies deficiencies; the Photographic Artist module then realizes targeted edits to enhance image appeal, eliminating the need for explicit human instructions. A multi-stage training pipeline is adopted: (i) foundation pretraining to establish basic aesthetic understanding and editing capabilities, (ii) adaptation with reasoning-guided multi-edit supervision to incorporate rich semantic guidance, and (iii) coordinated reasoning-to-generation reinforcement learning to jointly optimize reasoning and generation. During training, SmartPhotoCrafter emphasizes photo-realistic image generation while supporting both image restoration and retouching tasks with consistent adherence to color- and tone-related semantics. We also construct a stage-specific dataset, which progressively builds reasoning and controllable generation, effective cross-module collaboration, and ultimately high-quality photographic enhancement. Experiments demonstrate that SmartPhotoCrafter outperforms existing generative models on the task of automatic photographic enhancement, achieving photo-realistic results while exhibiting higher tonal sensitivity to retouching instructions. Project page: https://github.com/vivoCameraResearch/SmartPhotoCrafter.

Cached at: 04/22/26, 06:17 AM

Source: https://huggingface.co/papers/2604.19587

Abstract

SmartPhotoCrafter automates photographic image editing by combining image quality comprehension with targeted enhancement, using a reasoning-to-generation approach that eliminates the need for explicit human instructions.

Traditional photographic image editing typically requires users to have enough aesthetic understanding to give appropriate instructions for adjusting image quality and camera parameters. This paradigm relies on explicit human instruction of aesthetic intent, which is often ambiguous, incomplete, or inaccessible to non-expert users. In this work, we propose SmartPhotoCrafter, an automatic photographic image editing method that formulates image editing as a tightly coupled reasoning-to-generation process. The Image Critic module first performs image quality comprehension and identifies deficiencies; the Photographic Artist module then realizes targeted edits to enhance image appeal, eliminating the need for explicit human instructions. A multi-stage training pipeline is adopted: (i) foundation pretraining to establish basic aesthetic understanding and editing capabilities, (ii) adaptation with reasoning-guided multi-edit supervision to incorporate rich semantic guidance, and (iii) coordinated reasoning-to-generation reinforcement learning to jointly optimize reasoning and generation. During training, SmartPhotoCrafter emphasizes photo-realistic image generation while supporting both image restoration and retouching tasks with consistent adherence to color- and tone-related semantics. We also construct a stage-specific dataset, which progressively builds reasoning and controllable generation, effective cross-module collaboration, and ultimately high-quality photographic enhancement. Experiments demonstrate that SmartPhotoCrafter outperforms existing generative models on the task of automatic photographic enhancement, achieving photo-realistic results while exhibiting higher tonal sensitivity to retouching instructions. Project page: https://github.com/vivoCameraResearch/SmartPhotoCrafter.
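
The critic-then-artist control flow described in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the function names (`image_critic`, `photographic_artist`, `auto_edit`), the `Critique` structure, the thresholds, and the hand-written heuristics are hypothetical stand-ins. The paper's modules are learned models, and this is not their implementation. The sketch only shows how a critique can drive targeted edits without a user-supplied instruction.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

# Grayscale pixels in [0, 1]; a stand-in for a real photograph.
Image = list[list[float]]


@dataclass
class Critique:
    """Hypothetical critic output: named deficiencies plus the measurements behind them."""
    deficiencies: list[str]
    brightness: float
    contrast: float


def image_critic(img: Image) -> Critique:
    """Toy heuristic critic (assumption: thresholds are arbitrary, not from the paper)."""
    pixels = [p for row in img for p in row]
    brightness, contrast = mean(pixels), pstdev(pixels)
    deficiencies = []
    if brightness < 0.35:
        deficiencies.append("underexposed")
    elif brightness > 0.65:
        deficiencies.append("overexposed")
    if contrast < 0.15:
        deficiencies.append("low_contrast")
    return Critique(deficiencies, brightness, contrast)


def photographic_artist(img: Image, critique: Critique) -> Image:
    """Toy editor: apply one targeted adjustment per reported deficiency."""
    out = [row[:] for row in img]
    for issue in critique.deficiencies:
        if issue == "underexposed":
            out = [[p ** 0.5 for p in row] for row in out]  # gamma < 1 brightens
        elif issue == "overexposed":
            out = [[p ** 2.0 for p in row] for row in out]  # gamma > 1 darkens
        elif issue == "low_contrast":
            m = mean(p for row in out for p in row)
            # Stretch pixel values away from the mean, clamped to [0, 1].
            out = [[min(1.0, max(0.0, m + 2.0 * (p - m))) for p in row] for row in out]
    return out


def auto_edit(img: Image) -> tuple[Image, Critique]:
    """Reasoning first, generation second; no user instruction is supplied."""
    critique = image_critic(img)
    return photographic_artist(img, critique), critique
```

Calling `auto_edit` on a dark, flat image yields a critique listing both deficiencies and a brightened, contrast-stretched result, while a well-exposed image with adequate contrast passes through unchanged because the critic reports nothing to fix.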

Similar Articles

ETCHR: Editing To Clarify and Harness Reasoning

Hugging Face Daily Papers

ETCHR is a novel image editing approach that decouples visual reasoning from image generation, using a two-stage training process (Reasoning Imitation and Reasoning Enhancement) to improve multimodal language model performance across five visual reasoning tasks. It achieves consistent gains of 4-5% Pass@1 on models like Qwen3-VL-8B, Gemini-3.1-Flash-Lite, and Kimi K2.5.

Uni-Edit: Intelligent Editing Is A General Task For Unified Model Tuning

Hugging Face Daily Papers

Uni-Edit proposes using intelligent image editing as a single general task to simultaneously improve unified multimodal models' understanding, generation, and editing capabilities, with an automated data synthesis pipeline creating complex editing instructions.

Is This Edit Correct? A Multi-Dimensional Benchmark for Reasoning-Aware Image Editing

Hugging Face Daily Papers

This paper introduces RE-Edit, a benchmark for evaluating image editing systems across five reasoning dimensions (physical, environmental, cultural, causal, referential) to assess logical consistency beyond visual plausibility. The benchmark includes 1,000 samples and evaluates ten open-source and two commercial models, showing that even advanced systems struggle with implicit multi-dimensional reasoning.