SCOPE: Structured Decomposition and Conditional Skill Orchestration for Complex Image Generation
Summary
SCOPE is a specification-guided framework for text-to-image generation that tracks semantic commitments to better fulfill complex visual intents. It introduces the Gen-Arena benchmark and demonstrates strong performance on complex generation tasks.
Cached at: 05/11/26, 07:19 AM
Source: https://huggingface.co/papers/2605.08043
Abstract
SCOPE is a specification-guided framework that maintains semantic commitments throughout text-to-image generation to improve complex visual intent fulfillment.
While text-to-image models have made strong progress in visual fidelity, faithfully realizing complex visual intents remains challenging because many requirements must be tracked across grounding, generation, and verification. We refer to these requirements as semantic commitments and formalize their lifecycle discontinuity as the Conceptual Rift, where commitments may be locally resolved or checked but fail to remain identifiable as the same operational units throughout the generation lifecycle. To address this, we propose SCOPE, a specification-guided skill orchestration framework that maintains semantic commitments in an evolving structured specification and conditionally invokes retrieval, reasoning, and repair skills around unresolved or violated commitments. To evaluate commitment-level intent realization, we introduce Gen-Arena, a human-annotated benchmark with entity- and constraint-level specifications, together with Entity-Gated Intent Pass Rate (EGIP), a strict entity-first pass criterion. SCOPE substantially outperforms all evaluated baselines on Gen-Arena, achieving 0.60 EGIP, and further achieves strong results on WISE-V (0.907) and MindBench (0.61), demonstrating the effectiveness of persistent commitment tracking for complex image generation.
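The abstract does not give a formula for EGIP, so the following is only an illustrative sketch of what an entity-first pass criterion could look like. It assumes that a sample passes only when every annotated entity is present and, given that, every annotated constraint is satisfied. The names `SampleAnnotation`, `sample_passes`, and `entity_gated_pass_rate` are hypothetical and do not come from the paper or its code.

```python
from dataclasses import dataclass, field


@dataclass
class SampleAnnotation:
    """Hypothetical per-image judgment: one boolean per entity and per constraint."""
    entities_present: dict[str, bool]
    constraints_satisfied: dict[str, bool] = field(default_factory=dict)


def sample_passes(sample: SampleAnnotation) -> bool:
    # Assumed entity gate: a missing entity fails the sample outright,
    # before any constraint is considered.
    if not all(sample.entities_present.values()):
        return False
    # Assumed strictness: every constraint must hold for the sample to pass.
    return all(sample.constraints_satisfied.values())


def entity_gated_pass_rate(samples: list[SampleAnnotation]) -> float:
    """Fraction of samples that pass the assumed entity-gated criterion."""
    if not samples:
        raise ValueError("need at least one sample")
    return sum(sample_passes(s) for s in samples) / len(samples)


# Toy data, invented for illustration only.
demo = [
    # All entities present, all constraints satisfied: passes.
    SampleAnnotation({"cat": True, "lamp": True}, {"cat left of lamp": True}),
    # One entity missing: fails at the gate even though the constraint is marked true.
    SampleAnnotation({"cat": True, "lamp": False}, {"cat left of lamp": True}),
    # Entities present but a constraint is violated: fails.
    SampleAnnotation({"cat": True, "lamp": True}, {"cat left of lamp": False}),
]
rate = entity_gated_pass_rate(demo)
print(f"{rate:.3f}")
```

Under these assumptions only the first toy sample passes, so the rate is one third. The gate makes the metric stricter than averaging per-constraint accuracy, which is one plausible reading of "strict entity-first"; the paper's actual definition may differ.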
Similar Articles
SCOPE: Self-Play via Co-Evolving Policies for Open-Ended Tasks
SCOPE is a self-play framework for open-ended tasks that co-evolves a Challenger and Solver policy, achieving up to +10.4 points on benchmarks without external supervision.
GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation
GenEvolve is a self-evolving image generation framework that uses tool-orchestrated trajectories and visual experience distillation to iteratively improve generative capabilities, achieving state-of-the-art performance.
SCOPE: Simulating Cross-game Operations in Playable Environments for FPS World Models
SCOPE introduces a method for precise action response in FPS games by conditioning transformer blocks in video diffusion models to separate in-scope effects from out-of-scope visual effects without segmentation labels, and presents CrossFPS, a multi-game dataset enabling zero-shot cross-game transfer.
Images in Sentences: Scaling Interleaved Instructions for Unified Visual Generation
This paper introduces INSET, a unified multimodal model that embeds images as native vocabulary within textual instructions to improve handling of complex interleaved inputs for image generation and editing.
SmartPhotoCrafter: Unified Reasoning, Generation and Optimization for Automatic Photographic Image Editing
SmartPhotoCrafter introduces an automatic photographic image editing pipeline that unifies quality comprehension and enhancement without explicit human instructions, outperforming existing generative models on photo-realistic enhancement tasks.