@_philschmid: Gemini Embedding 2 now GA! One embedding model that understands text, images, video, audio, and PDFs! 5 modalities in a …
Summary
Google has made Gemini Embedding 2 generally available: a single model that embeds text, images, video, audio, and PDFs into one unified embedding space. It supports 100+ languages and embeds audio natively, with no transcription step.
Cached at: 04/23/26, 05:41 AM
Gemini Embedding 2 now GA! One embedding model that understands text, images, video, audio, and PDFs!
5 modalities in a single unified embedding space
Supports up to 8,192 input tokens, 100+ languages
Embeds audio natively, no transcription step needed
Flexible output
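A unified embedding space means a text query can be compared directly against vectors for images, audio, or PDFs. The sketch below illustrates only that retrieval step, using cosine similarity. It does not call the Gemini API: the vectors, file names, and the `search` helper are all hypothetical stand-ins for real model output.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical, hand-written 3-dimensional vectors standing in for
# embeddings of items from different modalities. Real embeddings would
# come from the model and have far more dimensions.
index = {
    "photo_of_dog.jpg": [0.9, 0.1, 0.0],
    "podcast_clip.mp3": [0.1, 0.9, 0.1],
    "manual.pdf": [0.0, 0.2, 0.9],
}


def search(query_vector, index, top_k=1):
    """Rank indexed items by cosine similarity to the query vector."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]


# Stand-in for the embedding of a text query.
query = [0.8, 0.2, 0.1]
best = search(query, index)
```

Because every item lives in the same vector space, one index and one similarity function can serve all five modalities; no per-modality pipeline is needed at query time.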