Gemini 3.1 Flash TTS
Summary
Google released Gemini 3.1 Flash TTS, a new text-to-speech model available through the Gemini API. It supports prompt-based control over voice direction, accents, and speaking styles, and it can generate multi-speaker conversations and character-specific vocal performances.
Similar Articles
@GoogleDeepMind: Gemini 3.1 Flash TTS is our most controllable text-to-speech model yet. With new Audio Tags, you can easily direct voca…
Google DeepMind releases Gemini 3.1 Flash TTS, an advanced text-to-speech model featuring new Audio Tags that enable fine-grained control over vocal style, delivery, and pace through text commands.
Gemini 3.1 Flash Live: Making audio AI more natural and reliable
Google has released Gemini 3.1 Flash Live, a new high-quality audio model designed for more natural and reliable real-time voice interactions with improved latency and reasoning capabilities.
Improved Gemini audio models for powerful voice experiences
Google has updated Gemini 2.5 Flash Native Audio to improve live voice agent capabilities, including sharper function calling, better instruction following, and smoother conversation context retrieval. The update also introduces live speech translation in the Google Translate app beta, preserving intonation across 70+ languages.
Advanced audio dialog and generation with Gemini 2.5
Google announces Gemini 2.5's advanced native audio capabilities, enabling real-time conversational AI with natural speech generation, style control, and multimodal understanding across 24+ languages.
Gemini 3 Flash: frontier intelligence built for speed
Google has released Gemini 3 Flash, a fast, cost-effective AI model that combines Pro-grade reasoning with Flash-level speed for tasks like coding, complex analysis, and agentic workflows.