Experiment with Gemini 2.0 Flash native image generation

Google DeepMind Blog

Summary

Google expands Gemini 2.0 Flash native image generation capabilities to all developers, enabling multimodal text and image output for storytelling, conversational image editing, and applications requiring world understanding and text rendering.

Native image output is available in Gemini 2.0 Flash for developers to experiment with in Google AI Studio and the Gemini API.

Cached at: 04/20/26, 08:36 AM

Source: https://developers.googleblog.com/en/experiment-with-gemini-20-flash-native-image-generation/

In December (https://developers.googleblog.com/en/the-next-chapter-of-the-gemini-era-for-developers/) we first introduced native image output in Gemini 2.0 Flash to trusted testers. Today, we're making it available for developer experimentation across all regions (https://ai.google.dev/gemini-api/docs/available-regions) currently supported by Google AI Studio. You can test this new capability using an experimental version of Gemini 2.0 Flash (gemini-2.0-flash-exp (https://aistudio.google.com/prompts/new_chat?model=gemini-2.0-flash-exp)) in Google AI Studio and via the Gemini API.

Gemini 2.0 Flash combines multimodal input, enhanced reasoning, and natural language understanding to create images. Here are some examples of where 2.0 Flash's multimodal outputs shine:

### **1. Text and images together**

Use Gemini 2.0 Flash to tell a story and it will illustrate it with pictures, keeping the characters and settings consistent throughout. Give it feedback and the model will retell the story or change the style of its drawings.

*Video: Story and illustration generation in Google AI Studio*

### **2. Conversational image editing**

Gemini 2.0 Flash helps you edit images through many turns of a natural language dialogue, which is great for iterating toward a perfect image or for exploring different ideas together.

*Video: Multi-turn image editing that maintains context throughout the conversation in Google AI Studio*

### **3. World understanding**

Unlike many other image generation models, Gemini 2.0 Flash leverages world knowledge and enhanced reasoning to create the *right* image. This makes it well suited to detailed imagery that's realistic, like illustrating a recipe. While it strives for accuracy, like all language models its knowledge is broad and general, not absolute or complete.

*Video: Interleaved text and image output for a recipe in Google AI Studio*

### **4. Text rendering**

Most image generation models struggle to accurately render long sequences of text, often producing poorly formatted or illegible characters, or misspellings. Internal benchmarks show that 2.0 Flash renders text more reliably than leading competitive models, making it great for creating advertisements, social posts, or even invitations.

*Video: Image outputs with long text rendering in Google AI Studio*

## Start making images with Gemini today

Get started with Gemini 2.0 Flash via the Gemini API. Read more about image generation in our docs (https://ai.google.dev/gemini-api/docs/image-generation).

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="GEMINI_API_KEY")

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents=(
        "Generate a story about a cute baby turtle in a 3d digital art style. "
        "For each scene, generate an image."
    ),
    config=types.GenerateContentConfig(
        response_modalities=["Text", "Image"]
    ),
)
```

Whether you are building AI agents, developing apps with beautiful visuals like illustrated interactive stories, or brainstorming visual ideas in conversation, Gemini 2.0 Flash allows you to add text and image generation with just a single model. We're eager to see what developers create with native image output, and your feedback (https://discuss.ai.google.dev/c/gemini-api/4) will help us finalize a production-ready version soon.
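With `response_modalities` set to both `"Text"` and `"Image"`, the returned content interleaves text and image parts. As a minimal sketch of how you might separate the two, here is a hypothetical helper (`split_parts` is our name, not an SDK function), assuming the SDK's usual `response.candidates[0].content.parts` shape where each part carries either `.text` or `.inline_data.data`:

```python
from types import SimpleNamespace

def split_parts(parts):
    """Split interleaved response parts into story text and raw image bytes.

    Assumes each part exposes either `.text` (str) or `.inline_data.data`
    (bytes), mirroring the google-genai SDK's Part objects.
    """
    texts, images = [], []
    for part in parts:
        if getattr(part, "text", None):
            texts.append(part.text)
        elif getattr(part, "inline_data", None) is not None:
            images.append(part.inline_data.data)
    return texts, images

# Stand-in parts for illustration only; in real use you would pass
# response.candidates[0].content.parts from the call above.
demo_parts = [
    SimpleNamespace(text="Scene 1: a baby turtle hatches.", inline_data=None),
    SimpleNamespace(text=None, inline_data=SimpleNamespace(data=b"<png bytes>")),
]

texts, images = split_parts(demo_parts)
```

Each element of `images` is encoded image data, so you could write one straight to disk with something like `open("scene_0.png", "wb").write(images[0])`.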
