@osanseviero: Super excited to introduce Gemma 4 12B! - Multimodal: audio, image, video, and text input - Novel architecture: we remo…

X AI KOLs Timeline 06/03/26, 04:10 PM Models

Summary

Introducing Gemma 4 12B, a multimodal model that accepts audio, image, video, and text input. It uses a novel unified architecture with the separate multimodal encoders removed, and it ships with a new macOS desktop app powered by LiteRT.


Original Article


Cached at: 06/03/26, 05:52 PM

Super excited to introduce Gemma 4 12B! 💎

- Multimodal: audio, image, video, and text input
- Novel architecture: we removed the multimodal encoders for a unified, streamlined arch
- New MacOS desktop app powered by LiteRT
- MTP support

Excited to see what you build with it! https://t.co/De5id2XQfz

Similar Articles

@_philschmid: We just launched a Gemma 4 12B! Our first mid-sized model with native audio inputs. Gemma 4 12B is a unified, encoder-…

X AI KOLs Following

We just launched Gemma 4 12B, a mid-sized multimodal model with native audio inputs. It requires only 16 GB of memory and is released under Apache 2.0.

@googleaidevs: We’re launching Gemma 4 12B: Our unified, encoder-free model that brings powerful multimodal intelligence straight to y…

X AI KOLs Timeline

Google launches Gemma 4 12B, an encoder-free multimodal model with native audio support. It is optimized for local execution on laptops and released under Apache 2.0.

Google’s Gemma 4 12B just dropped - here’s how to run it locally on your Mac

Reddit r/artificial

Google released Gemma 4 12B, an open-source multimodal model under Apache 2.0 that supports text, vision, and audio input with a 256K context window. The article provides a guide to running it locally on a Mac with Ollama, LM Studio, or llama.cpp.

@RedHat_AI: Gemma 4 12B dropped today. Apache 2.0, multimodal: text, image, audio, and video. 256K context, built-in thinking, nati…

X AI KOLs Following

Gemma 4 12B has been released under Apache 2.0. It supports multimodal input (text, image, audio, and video), a 256K context window, built-in thinking, and native tool calling, and it can be run on Red Hat OpenShift AI.

@mtschannen: For the past years my research focus was on unifying models and training paradigms across modalities. Today I'm excited…