@_philschmid: Gemini Embedding 2 now GA! One embedding model that understand text, images, video, audio, and PDFs! 5 modalities in a …
Summary
Google releases Gemini Embedding 2 for general availability, offering a single model that embeds text, images, video, audio, and PDFs into one unified space across 100+ languages without needing audio transcription.
View Cached Full Text
Cached at: 04/23/26, 05:41 AM
Gemini Embedding 2 now GA! One embedding model that understand text, images, video, audio, and PDFs! 5 modalities in a single unified embedding space Supports up to 8,192 input tokens, 100+ languages Embeds audio natively, no transcription step needed Flexible output
Similar Articles
Introducing Gemini 2.0: our new AI model for the agentic era
Google DeepMind introduces Gemini 2.0, a new agentic AI model with native image and audio output, enhanced tool use, and multimodal capabilities designed for the next era of AI agents. Gemini 2.0 Flash is now available to developers with wider availability planned for early 2025.
Gemini 2.0 is now available to everyone
Google announces general availability of Gemini 2.0 Flash via API, introduces experimental Gemini 2.0 Pro for advanced coding and reasoning tasks, and releases Gemini 2.0 Flash-Lite as a cost-efficient option. All models support multimodal input with text output and are available through Google AI Studio, Vertex AI, and the Gemini app.
Experiment with Gemini 2.0 Flash native image generation
Google expands Gemini 2.0 Flash native image generation capabilities to all developers, enabling multimodal text and image output for storytelling, conversational image editing, and applications requiring world understanding and text rendering.
Gemini API File Search is now multimodal
Google has expanded the Gemini API File Search tool to support multimodal data, enabling developers to build more efficient and verifiable retrieval-augmented generation (RAG) systems with features like custom metadata filtering and page citations.
Advanced audio dialog and generation with Gemini 2.5
Google announces Gemini 2.5's advanced native audio capabilities, enabling real-time conversational AI with natural speech generation, style control, and multimodal understanding across 24+ languages.