# We're expanding our Gemini 2.5 family of models
Source: https://blog.google/products-and-platforms/products/gemini/gemini-2-5-model-family-expands/
Gemini 2.5 Flash and Pro are now generally available, and we're introducing 2.5 Flash-Lite, our most cost-efficient and fastest 2.5 model yet.
We designed Gemini 2.5 to be a family of hybrid reasoning models that provide amazing performance, while also being at the Pareto frontier (https://storage.googleapis.com/deepmind-media/gemini/gemini_v2_5_report.pdf#page=3) of cost and speed. Today, we're taking the next step with our 2.5 Pro and Flash models by releasing them as stable and generally available. And we're bringing you 2.5 Flash-Lite in preview, our most cost-efficient and fastest 2.5 model yet.
## Making 2.5 Flash and 2.5 Pro generally available
Thanks to all of your feedback, today we're releasing stable versions of 2.5 Flash and Pro, so you can build production applications with confidence. Developers (https://developers.googleblog.com/en/gemini-2-5-thinking-model-updates/) like Spline and Rooms and organizations (https://cloud.google.com/blog/products/ai-machine-learning/gemini-2-5-flash-lite-flash-pro-ga-vertex-ai) like Snap and SmartBear have already been using the latest versions in production for the last few weeks.
## Introducing Gemini 2.5 Flash-Lite
We're also introducing a preview of the new Gemini 2.5 Flash-Lite, our most cost-efficient and fastest 2.5 model yet. You can start building with the preview version now, and we're looking forward to your feedback.
2.5 Flash-Lite delivers all-around higher quality than 2.0 Flash-Lite on coding, math, science, reasoning and multimodal benchmarks. It excels at high-volume, latency-sensitive tasks like translation and classification, with lower latency than both 2.0 Flash-Lite and 2.0 Flash on a broad sample of prompts. It comes with the same capabilities that make Gemini 2.5 helpful, including the ability to turn on thinking at different budgets, connect to tools like Google Search and code execution, accept multimodal input, and use a 1 million-token context length.
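For a latency-sensitive task like classification, one way this might look in practice is a request with the thinking budget set to zero. The sketch below, which is an illustration rather than official sample code, builds a `generateContent` payload for the Gemini REST API using only the Python standard library; the exact model identifier and field names are assumptions based on the naming in this post and may differ from the live API.

```python
# Hypothetical sketch: calling the Gemini 2.5 Flash-Lite preview over REST
# with thinking disabled (thinkingBudget=0) to minimize latency on a
# classification-style prompt. Endpoint and model name are assumptions.
import json
import os
import urllib.request

MODEL = "gemini-2.5-flash-lite"  # assumed preview model name
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{MODEL}:generateContent"
)


def build_request(prompt: str, thinking_budget: int = 0) -> dict:
    """Assemble a generateContent payload.

    thinking_budget trades response quality against latency: 0 turns
    thinking off entirely, larger values allow more reasoning tokens.
    """
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget}
        },
    }


def classify(prompt: str, api_key: str) -> str:
    """Send the request and return the model's text response."""
    body = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["candidates"][0]["content"]["parts"][0]["text"]


if __name__ == "__main__":
    # Only issue a real network call if a key is configured.
    key = os.environ.get("GEMINI_API_KEY")
    if key:
        print(classify("Classify the sentiment: 'great launch!'", key))
```

Raising `thinking_budget` on the same request is how you would opt back into deeper reasoning when quality matters more than speed.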
See more details about our 2.5 family of models in the latest Gemini technical report (https://storage.googleapis.com/deepmind-media/gemini/gemini_v2_5_report.pdf).
Gemini 2.5 Flash-Lite benchmarks table
The preview of Gemini 2.5 Flash-Lite is now available in Google AI Studio and Vertex AI, alongside the stable versions of 2.5 Flash and Pro. Both 2.5 Flash and Pro are also accessible in the Gemini app. We've also brought custom versions of 2.5 Flash-Lite and Flash to Search.
We can't wait to see what you continue to build with Gemini 2.5.