Gemini 2.5 Flash-Lite is now ready for scaled production use

Google DeepMind Blog · Models

Summary

Google releases Gemini 2.5 Flash-Lite as stable and generally available: the fastest and lowest-cost model in the Gemini 2.5 family at $0.10 input / $0.40 output per 1M tokens, featuring optional native reasoning and support for native tools such as Grounding with Google Search, Code Execution, and URL Context.

Gemini 2.5 Flash-Lite, previously in preview, is now stable and generally available. This cost-efficient model provides high quality in a small size, and includes 2.5 family features like a 1 million-token context window and multimodality.

# Gemini 2.5 Flash-Lite is now stable and generally available

Source: https://developers.googleblog.com/en/gemini-25-flash-lite-is-now-stable-and-generally-available/

Today, we're releasing the stable version of Gemini 2.5 Flash-Lite, our fastest and lowest-cost ($0.10 per 1M input tokens, $0.40 per 1M output tokens) model in the Gemini 2.5 model family. We built 2.5 Flash-Lite to push the frontier of intelligence per dollar, with native reasoning capabilities that can be optionally toggled on for more demanding use cases. Building on the momentum of 2.5 Pro and 2.5 Flash, this model rounds out our set of 2.5 models that are ready for scaled production use.

## Our most cost-efficient and fastest 2.5 model yet

*[Image: comparative table showing capabilities of Gemini 2.5 Flash-Lite, 2.5 Flash, and 2.5 Pro]*

Gemini 2.5 Flash-Lite strikes a balance between performance and cost, without compromising on quality, particularly for latency-sensitive tasks like translation and classification. Here's what makes it stand out:

- **Best-in-class speed:** Gemini 2.5 Flash-Lite has lower latency than both 2.0 Flash-Lite and 2.0 Flash on a broad sample of prompts.
- **Cost-efficiency:** It's our lowest-cost 2.5 model yet, priced at $0.10 / 1M input tokens and $0.40 / 1M output tokens, allowing you to handle large volumes of requests affordably. We have also reduced audio input pricing by 40% from the preview launch.
- **Smart and small:** It demonstrates all-around higher quality than 2.0 Flash-Lite across a wide range of benchmarks, including coding, math, science, reasoning, and multimodal understanding.
- **Fully featured:** When you build with 2.5 Flash-Lite, you get access to a 1 million-token context window, controllable thinking budgets, and support for native tools like Grounding with Google Search, Code Execution, and URL Context (a short sketch follows this list).
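To make the feature list concrete, here is a minimal sketch using the google-genai Python SDK (`pip install google-genai`; assumes an API key in the `GEMINI_API_KEY` or `GOOGLE_API_KEY` environment variable, and the prompt and budget values are illustrative, not from this post) that enables a thinking budget and Grounding with Google Search:

```python
from google import genai
from google.genai import types

# The client picks up the API key from the environment
# (GEMINI_API_KEY or GOOGLE_API_KEY).
client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-lite",
    contents="Who won the most recent FIFA World Cup, and by what score?",
    config=types.GenerateContentConfig(
        # Thinking is optional on Flash-Lite: allot a token budget to
        # toggle reasoning on for harder prompts (0 keeps it off).
        thinking_config=types.ThinkingConfig(thinking_budget=512),
        # Native tool support: ground the answer in Google Search results.
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(response.text)
```

Setting `thinking_budget=0` keeps reasoning off for maximum speed and lowest cost, the sweet spot this model targets for high-volume tasks.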
## Gemini 2.5 Flash-Lite in action

Since the launch of 2.5 Flash-Lite, we have already seen some incredibly successful deployments. Here are some of our favorites:

- **Satlyt** (https://satlyt.ai/) is building a decentralized space computing platform that will transform how satellite data is processed and utilized for real-time summarization of in-orbit telemetry, autonomous task management, and satellite-to-satellite communication parsing. **2.5 Flash-Lite's speed has enabled a 45% reduction in latency** for critical onboard diagnostics and a **30% decrease in power consumption** compared to their baseline models.
- **HeyGen** (https://www.heygen.com/) uses AI to create avatars for video content and leverages Gemini 2.5 Flash-Lite to automate video planning, analyze and optimize content, and **translate videos into over 180 languages**. This allows them to provide global, personalized experiences for their users.
- **DocsHound** (https://docshound.com/) turns product demos into documentation by using Gemini 2.5 Flash-Lite to **process long videos and extract thousands of screenshots** with low latency. This transforms footage into comprehensive documentation and training data for AI agents much faster than traditional methods.
- **Evertune** (https://www.evertune.ai/) helps brands understand how they are represented across AI models. Gemini 2.5 Flash-Lite is a game-changer for them, dramatically speeding up analysis and report generation. Its fast performance allows them to quickly scan and synthesize large volumes of model output to provide clients with **dynamic, timely insights**.

You can start using 2.5 Flash-Lite by specifying "gemini-2.5-flash-lite" in your code, as shown in the sketch below. If you are using the preview version, you can switch to "gemini-2.5-flash-lite", which points to the same underlying model; we plan to remove the preview alias of Flash-Lite on August 25th.

Ready to start building? Try the stable version of Gemini 2.5 Flash-Lite now in Google AI Studio (https://aistudio.google.com/prompts/new_chat?model=gemini-2.5-flash-lite) and Vertex AI (https://console.cloud.google.com/vertex-ai/studio/multimodal?model=gemini-2.5-flash-lite).
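For migration and quick experimentation, here is a minimal getting-started sketch with the same SDK (the model id and prices are from this post; the prompt and the cost arithmetic are illustrative):

```python
from google import genai

client = genai.Client()  # API key via GEMINI_API_KEY / GOOGLE_API_KEY

response = client.models.generate_content(
    # Stable model id; the preview alias points to the same model
    # until it is removed on August 25th.
    model="gemini-2.5-flash-lite",
    contents="Classify the sentiment of: 'The update made everything faster.'",
)
print(response.text)

# Back-of-the-envelope cost at the listed rates:
# $0.10 per 1M input tokens, $0.40 per 1M output tokens.
usage = response.usage_metadata
cost = (usage.prompt_token_count * 0.10
        + usage.candidates_token_count * 0.40) / 1_000_000
print(f"approx. ${cost:.6f} for this call")
```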

Similar Articles

Gemini 2.5: Updates to our family of thinking models

Google DeepMind Blog

Google announces stable general availability of Gemini 2.5 Pro and Flash models, introduces new Gemini 2.5 Flash-Lite in preview with lower latency and cost, and updates pricing for the Flash family with adjusted input/output token rates.

We're expanding our Gemini 2.5 family of models

Google DeepMind Blog

Google announces general availability of Gemini 2.5 Flash and Pro models, and introduces Gemini 2.5 Flash-Lite in preview, the family's most cost-efficient and fastest variant, optimized for high-volume, latency-sensitive tasks.

Start building with Gemini 2.0 Flash and Flash-Lite

Google DeepMind Blog

Google announces general availability of Gemini 2.0 Flash-Lite with improved performance over 1.5 Flash, simplified pricing, and a 1 million token context window. The model is now available in Google AI Studio and Vertex AI for production use, with developers already building voice AI, data analytics, and video editing applications.

Introducing Gemini 2.5 Flash

Google DeepMind Blog

Google announces Gemini 2.5 Flash, a new hybrid reasoning model available in preview through the Gemini API. The model features toggleable thinking capabilities, fine-grained thinking budgets for quality-cost-latency tradeoffs, and maintains fast inference speeds while improving performance over 2.0 Flash.

Gemini 2.0 is now available to everyone

Google DeepMind Blog

Google announces general availability of Gemini 2.0 Flash via API, introduces experimental Gemini 2.0 Pro for advanced coding and reasoning tasks, and releases Gemini 2.0 Flash-Lite as a cost-efficient option. All models support multimodal input with text output and are available through Google AI Studio, Vertex AI, and the Gemini app.