Introducing Gemini 2.5 Flash

Google DeepMind Blog Models

Summary

Google announces Gemini 2.5 Flash, a new hybrid reasoning model available in preview through the Gemini API. The model offers toggleable thinking and fine-grained thinking budgets for trading off quality, cost, and latency, and it maintains fast inference speeds while improving performance over 2.0 Flash.

Gemini 2.5 Flash is our first fully hybrid reasoning model, giving developers the ability to turn thinking on or off.

Cached at: 04/20/26, 08:36 AM

# Start building with Gemini 2.5 Flash

Source: https://developers.googleblog.com/en/start-building-with-gemini-25-flash/?utm_source=deepmind.google&utm_medium=referral&utm_campaign=gdm&utm_content=

**Tulsee Doshi** (https://developers.googleblog.com/en/search/?author=Tulsee+Doshi), Director of Product Management, Gemini

Today we are rolling out an early version of **Gemini 2.5 Flash** in **preview** through the Gemini API via [Google AI Studio](https://aistudio.google.com/prompts/new_chat?model=gemini-2.5-flash-preview-04-17) and [Vertex AI](https://console.cloud.google.com/vertex-ai/studio/multimodal?model=gemini-2.5-flash-preview-04-17). Building upon the popular foundation of 2.0 Flash, this new version delivers a major upgrade in reasoning capabilities, while still prioritizing speed and cost. Gemini 2.5 Flash is our first fully hybrid reasoning model, giving developers the ability to turn thinking on or off. The model also allows developers to set thinking budgets to find the right tradeoff between quality, cost, and latency. Even with **thinking off**, developers can maintain the fast speeds of 2.0 Flash, and improve performance.

Our Gemini 2.5 models are thinking models, capable of reasoning through their thoughts before responding. Instead of immediately generating an output, the model can perform a "thinking" process to better understand the prompt, break down complex tasks, and plan a response. On complex tasks that require multiple steps of reasoning (like solving math problems or analyzing research questions), the thinking process allows the model to arrive at more accurate and comprehensive answers. In fact, Gemini 2.5 Flash performs strongly on [Hard Prompts in LMArena](https://lmarena.ai/?leaderboard), second only to 2.5 Pro.

Figure: Comparison table showing price and performance metrics for LLMs. 2.5 Flash has comparable metrics to other leading models for a fraction of the cost and size.

## Our most cost-efficient thinking model

2.5 Flash continues to lead as the model with the best price-to-performance ratio.

Figure: Gemini 2.5 Flash price-to-performance comparison. Gemini 2.5 Flash adds another model to Google's Pareto frontier of cost to quality.*

## Fine-grained controls to manage thinking

We know that different use cases have different tradeoffs in quality, cost, and latency. To give developers flexibility, we've enabled setting a **thinking budget** that offers fine-grained control over the maximum number of tokens a model can generate while thinking. A higher budget allows the model to reason further to improve quality. Importantly, though, the budget sets a cap on how much 2.5 Flash can think, but the model does not use the full budget if the prompt does not require it.

Figure: Improvements in reasoning quality as thinking budget increases.

The model is trained to know how long to think for a given prompt, and therefore automatically decides how much to think based on the perceived task complexity.

If you want to keep the lowest cost and latency while still improving performance over 2.0 Flash, **set the thinking budget to 0.** You can also choose to **set a specific token budget** for the thinking phase using a parameter in the API or the slider in Google AI Studio and in Vertex AI. The budget can range from 0 to 24576 tokens for 2.5 Flash.

The following prompts demonstrate how much reasoning may be used in 2.5 Flash's default mode.

### Prompts requiring low reasoning:

**Example 1:** "Thank you" in Spanish

**Example 2:** How many provinces does Canada have?

### Prompts requiring medium reasoning:

**Example 1:** You roll two dice. What's the probability they add up to 7?

**Example 2:** My gym has pickup hours for basketball between 9-3pm on MWF and between 2-8pm on Tuesday and Saturday. If I work 9-6pm 5 days a week and want to play 5 hours of basketball on weekdays, create a schedule for me to make it all work.

### Prompts requiring high reasoning:

**Example 1:** A cantilever beam of length L=3m has a rectangular cross-section (width b=0.1m, height h=0.2m) and is made of steel (E=200 GPa). It is subjected to a uniformly distributed load w=5 kN/m along its entire length and a point load P=10 kN at its free end. Calculate the maximum bending stress (σ_max).

**Example 2:** Write a function `evaluate_cells(cells: Dict[str, str]) -> Dict[str, float]` that computes the values of spreadsheet cells.

Each cell contains:

- A number (e.g., `"3"`)
- Or a formula like `"=A1 + B1 * 2"` using `+`, `-`, `*`, `/` and other cells.

Requirements:

- Resolve dependencies between cells.
- Handle operator precedence (`*/` before `+-`).
- Detect cycles and raise `ValueError("Cycle detected at ")`.
- No `eval()`. Use only built-in libraries.

## Start building with Gemini 2.5 Flash today

Gemini 2.5 Flash with thinking capabilities is now available in preview via the [Gemini API](https://ai.google.dev/gemini-api/docs/thinking) in [Google AI Studio](https://aistudio.google.com/prompts/new_chat?model=gemini-2.5-flash-preview-04-17) and in [Vertex AI](https://console.cloud.google.com/vertex-ai/studio/multimodal?model=gemini-2.5-flash-preview-04-17), and in a dedicated dropdown in the [Gemini app](http://gemini.google.com/). We encourage you to experiment with the `thinking_budget` parameter and explore how controllable reasoning can help you solve more complex problems.

```python
from google import genai

client = genai.Client(api_key="GEMINI_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",
    contents="You roll two dice. What's the probability they add up to 7?",
    config=genai.types.GenerateContentConfig(
        thinking_config=genai.types.ThinkingConfig(
            thinking_budget=1024
        )
    )
)

print(response.text)
```

Find detailed API references and thinking guides in our [developer docs](https://ai.google.dev/gemini-api/docs/thinking#set-budget) or get started with [code examples](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Get_started_thinking.ipynb) from the [Gemini Cookbook](https://github.com/google-gemini/cookbook/). We will continue to improve Gemini 2.5 Flash, with more coming soon, before we make it generally available for full production use.

*Model pricing is sourced from Artificial Analysis & Company Documentation
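
As a rough illustration of the thinking-budget controls described in the post, the sketch below wraps the same `generate_content` call so the budget can be varied per request, including `0` to turn thinking off. The helper names (`clamp_thinking_budget`, `build_thinking_kwargs`, `ask`) are hypothetical and not part of the SDK. Clamping on the client side is an assumption made here for convenience, since the post does not say how the API treats out-of-range values. The 0 to 24576 range and the model name are taken from the post.

```python
from typing import Dict

# Range stated in the post for 2.5 Flash: 0 to 24576 thinking tokens.
MIN_THINKING_BUDGET = 0
MAX_THINKING_BUDGET = 24576


def clamp_thinking_budget(requested: int) -> int:
    """Hypothetical helper: clamp a requested budget into the stated range."""
    return max(MIN_THINKING_BUDGET, min(MAX_THINKING_BUDGET, requested))


def build_thinking_kwargs(requested: int) -> Dict[str, int]:
    """Hypothetical helper: keyword arguments for ThinkingConfig."""
    return {"thinking_budget": clamp_thinking_budget(requested)}


def ask(prompt: str, requested_budget: int, api_key: str) -> str:
    """Send one prompt with a given thinking budget (0 turns thinking off)."""
    # Imported lazily so the helpers above work without the SDK installed.
    from google import genai

    client = genai.Client(api_key=api_key)
    response = client.models.generate_content(
        model="gemini-2.5-flash-preview-04-17",
        contents=prompt,
        config=genai.types.GenerateContentConfig(
            thinking_config=genai.types.ThinkingConfig(
                **build_thinking_kwargs(requested_budget)
            )
        ),
    )
    return response.text
```

With a valid key, calling `ask("How many provinces does Canada have?", 0, api_key)` and then `ask(..., 1024, api_key)` on a harder prompt is one way to compare latency and answer quality across budgets. Because the budget is only a cap, a larger value does not necessarily mean more thinking tokens are used.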
