GenClaw: Code-Driven Agentic Image Generation

Hugging Face Daily Papers Papers

Summary

GenClaw introduces a code-driven agentic image generation framework that breaks the black-box paradigm by mimicking the human creative process: conceptualizing, sketching with code (SVG/HTML/Three.js), and then using generative models for texture and photorealism.

Image generation models have evolved from text-conditioned pixel synthesis toward multimodal agents endowed with visual comprehension and tool invocation capabilities. Yet, existing agents remain at the mercy of underlying black-box image models. Their workflow is trapped in a repetitive cycle of prompt rewriting for generation refinement, leaving them with no mechanism to directly manipulate the canvas. In essence, the potential of LLMs to serve as a genuine "brush" for precise visual construction remains largely untapped. In this paper, we propose GenClaw, a code-driven agentic image generation paradigm that empowers the agent to create like a human artist: first conceptualizing, then sketching, and finally coloring. Specifically, the agent first constructs the conceptual knowledge and context through search and reasoning. It then utilizes code (e.g., SVG, HTML, Three.js) to render executable visual sketches. Finally, it employs an image generation model to supplement textures, materials, and photorealism. In this workflow, code serves as a controllable intermediate canvas bridging linguistic reasoning and pixel synthesis, seamlessly integrating programmatic logic with the visual expressiveness of generative models. By transforming image generation from a black-box paradigm into a staged process akin to authentic human creation, GenClaw offers a step toward for highly controllable and interpretable visual generation systems.
Original Article
View Cached Full Text

Cached at: 05/29/26, 07:01 AM

Paper page - GenClaw: Code-Driven Agentic Image Generation

Source: https://huggingface.co/papers/2605.30248

Abstract

GenClaw presents a code-driven agentic image generation framework that enables precise visual construction through conceptualization, sketching, and coloring stages, integrating programmatic logic with generative models.

Image generation modelshave evolved from text-conditioned pixel synthesis towardmultimodal agentsendowed withvisual comprehensionandtool invocationcapabilities. Yet, existing agents remain at the mercy of underlying black-box image models. Their workflow is trapped in a repetitive cycle of prompt rewriting for generation refinement, leaving them with no mechanism to directly manipulate the canvas. In essence, the potential ofLLMsto serve as a genuine “brush” for precise visual construction remains largely untapped. In this paper, we propose GenClaw, a code-driven agentic image generation paradigm that empowers the agent to create like a human artist: first conceptualizing, then sketching, and finally coloring. Specifically, the agent first constructs theconceptual knowledgeand context through search and reasoning. It then utilizes code (e.g., SVG, HTML, Three.js) to render executablevisual sketches. Finally, it employs an image generation model to supplement textures, materials, and photorealism. In this workflow, code serves as a controllable intermediate canvas bridging linguistic reasoning and pixel synthesis, seamlessly integrating programmatic logic with the visual expressiveness ofgenerative models. By transforming image generation from a black-box paradigm into a staged process akin to authentic human creation, GenClaw offers a step toward for highly controllable and interpretablevisual generation systems.

View arXiv pageView PDFGitHub8Add to collection

Get this paper in your agent:

hf papers read 2605\.30248

Don’t have the latest CLI?curl \-LsSf https://hf\.co/cli/install\.sh \| bash

Models citing this paper0

No model linking this paper

Cite arxiv.org/abs/2605.30248 in a model README.md to link it from this page.

Datasets citing this paper0

No dataset linking this paper

Cite arxiv.org/abs/2605.30248 in a dataset README.md to link it from this page.

Spaces citing this paper0

No Space linking this paper

Cite arxiv.org/abs/2605.30248 in a Space README.md to link it from this page.

Collections including this paper1

Similar Articles

Alien Dreams: An Emerging Art Scene

ML at Berkeley

The article highlights the emerging scene of AI-generated art using OpenAI's CLIP model as a steering mechanism for generative models, showcasing various examples of text-to-image outputs.