@_philschmid: Gemini Embedding 2 now GA! One embedding model that understands text, images, video, audio, and PDFs! 5 modalities in a …

X AI KOLs | 04/22/26, 06:11 PM | Models

Summary

Google has made Gemini Embedding 2 generally available. It is a single model that embeds text, images, video, audio, and PDFs into one unified space, supports 100+ languages, and embeds audio natively without a transcription step.

Gemini Embedding 2 now GA! One embedding model that understands text, images, video, audio, and PDFs!

5 modalities in a single unified embedding space

Supports up to 8,192 input tokens, 100+ languages

Embeds audio natively, no transcription step needed

Flexible output
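
To make the "single unified embedding space" claim concrete, here is a minimal sketch of what a shared space allows: vectors for items of different modalities can be compared directly, for example with cosine similarity. The file names and the three-dimensional vectors below are made-up illustrations, not output from Gemini Embedding 2, and no real API call is shown; how the vectors are actually obtained is left out on purpose.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vector, index):
    """Return the keys of `index`, most similar to `query_vector` first."""
    return sorted(
        index,
        key=lambda name: cosine_similarity(query_vector, index[name]),
        reverse=True,
    )


# Hypothetical toy vectors standing in for embeddings of different modalities.
# Real embeddings would have far more dimensions.
toy_index = {
    "photo_of_dog.jpg": [0.9, 0.1, 0.0],
    "dog_barking.wav": [0.8, 0.2, 0.1],
    "tax_form.pdf": [0.0, 0.1, 0.95],
}

# Hypothetical embedding of the text query "a dog".
toy_query = [0.85, 0.15, 0.05]

ranked = rank_by_similarity(toy_query, toy_index)
print(ranked)
```

With these toy values, the image and the audio clip rank ahead of the unrelated PDF, which is the behaviour a shared space is meant to enable: one text query retrieving items of any modality without a separate transcription or captioning step.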

Cached at: 04/23/26, 05:41 AM

Similar Articles

Google’s Gemini Omni turns images, audio, and text into video — and that’s just the start

TechCrunch AI

Google announces Gemini Omni, a family of multimodal models that can generate video from images, audio, and text, reasoning across inputs to produce consistent, high-quality outputs. The first model, Gemini Omni Flash, rolls out at Google I/O to the Gemini app, YouTube Shorts, and Flow.

Introducing Gemini Omni: Create Anything from Anything

YouTube AI Channels

Google introduces Gemini Omni, a new multimodal AI model capable of processing and generating content across text, images, audio, and video from any input type.

Introducing Gemini 2.0: our new AI model for the agentic era

Google DeepMind Blog

Google DeepMind introduces Gemini 2.0, a new agentic AI model with native image and audio output, enhanced tool use, and multimodal capabilities designed for the next era of AI agents. Gemini 2.0 Flash is now available to developers with wider availability planned for early 2025.

Gemini 2.0 is now available to everyone

Google DeepMind Blog

Google announces general availability of Gemini 2.0 Flash via API, introduces experimental Gemini 2.0 Pro for advanced coding and reasoning tasks, and releases Gemini 2.0 Flash-Lite as a cost-efficient option. All models support multimodal input with text output and are available through Google AI Studio, Vertex AI, and the Gemini app.

@GoogleDeepMind: We’re dropping Gemini Omni: our first step towards a model that can create anything from anything - starting with video…

X AI KOLs

Google DeepMind announces Gemini Omni, a new model that combines Gemini's intelligence with generative media systems to create video from any input, marking a significant step in multimodal AI.