@mtschannen: For the past years my research focus was on unifying models and training paradigms across modalities. Today I'm excited…

X AI KOLs Timeline 06/03/26, 06:13 PM Models

gemma-4 multimodal encoder-free dense-model model-release google-deepmind

Summary

Google DeepMind researcher announces the release of Gemma 4 12B, a dense encoder-free model that processes text, image, and audio inputs, continuing work on unifying models across modalities.

For the past years my research focus was on unifying models and training paradigms across modalities. Today I'm excited that we're releasing our latest model aligned with this theme: Gemma 4 12B, a dense encoder-free model which processes raw text, image, and audio inputs! 1/ https://t.co/4J2JKCtzU5

Original Article

Cached at: 06/03/26, 09:55 PM

For the past few years, my research focus has been on unifying models and training paradigms across modalities. Today I’m excited that we’re releasing our latest model aligned with this theme:

Gemma 4 12B, a dense encoder-free model which processes raw text, image, and audio inputs!

Despite being encoder-free, Gemma 4 12B nicely sits on the Pareto frontier of the Gemma 4 family. And in contrast to many other open-weight encoder-free models which focus on vision-language tasks, Gemma 4 12B also shows strong performance on text-focused and agentic tasks.

Even more importantly, Gemma 4 12B fits on GPU laptops with 16GB of VRAM, so it’s ideal for building local multimodal applications.
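
To put the 16GB figure in context, here is a rough back-of-the-envelope sketch of how much memory the weights alone might need at different precisions. The post does not say which precision or quantization the claim assumes, so the formats below are illustrative assumptions, as is treating "12B" as exactly 12 billion parameters. The helper name `weight_memory_gib` is hypothetical, and the estimate ignores the KV cache, activations, and runtime overhead.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GiB.

    Ignores KV cache, activations, and framework overhead.
    """
    return num_params * bits_per_param / 8 / 2**30


# Assumption: "12B" is taken as exactly 12 billion parameters.
NOMINAL_PARAMS = 12e9
# Assumption: the full 16 GB of VRAM is treated as 16 GiB of usable memory.
VRAM_GIB = 16

# Assumption: these precisions are illustrative; the post names none of them.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    gib = weight_memory_gib(NOMINAL_PARAMS, bits)
    verdict = "fits" if gib < VRAM_GIB else "does not fit"
    print(f"{label}: ~{gib:.1f} GiB of weights, {verdict} in {VRAM_GIB} GiB")
```

Under these assumptions, 16-bit weights come to roughly 22 GiB and would not fit, while 8-bit (about 11 GiB) and 4-bit (about 6 GiB) weights would leave headroom, which suggests the 16GB claim relies on some form of reduced precision.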

Find out more:

The Keyword: https://blog.google/innovation-and-ai/technology/developers-tools/introducing-gemma-4-12B/…
Dev Blog: https://developers.googleblog.com/gemma-4-12b-the-developer-guide/…
Visual Guide: https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-gemma-4-12b…

3/3

Similar Articles

@googleaidevs: We’re launching Gemma 4 12B: Our unified, encoder-free model that brings powerful multimodal intelligence straight to y…

google/gemma-4-31B-it-assistant

google/gemma-4-26B-A4B-it

Welcome Gemma 4: Frontier multimodal intelligence on device

Gemma 4: Byte for byte, the most capable open models
