New Google Gemma 4 12B Claims Near-26B Performance - We Tested Both!

Reddit r/LocalLLaMA 06/03/26, 10:25 PM Models

Summary

Google's new Gemma 4 12B model claims near-26B performance. In a local test on an RTX 4090, the 26B-A4B model was faster and produced better output, but the 12B used less VRAM, making it suitable for laptops.

We ran both models locally on a single RTX 4090 and gave each the same task: write a self-contained HTML5 canvas animation with real physics, in one file, without libraries. There were three scenes: a Galton board, two blocks colliding off a wall, and a chaotic triple pendulum.

Outputs:

Gemma 4 26B-A4B: 15 GB VRAM usage, 6.9k tokens, 138 tok/s
Gemma 4 12B: 9 GB VRAM usage, 8.9k tokens, 80 tok/s

Same Gemma 4 family, but the 26B-A4B won every scene and ran ~1.7x faster, on just 4B active params. The 12B stayed very close though, on 9 GB of VRAM instead of 15 GB, which makes it the ideal model for a 16 GB laptop.
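As a quick sanity check on the reported figures, the ratios can be recomputed directly. This is a minimal sketch, not part of the original test: the dictionary layout and the `generation_seconds` helper are hypothetical names, the token counts are the rounded values from the post (6.9k and 8.9k), and the time estimate assumes the reported tok/s held for the whole output.

```python
# Figures as reported in the post. Token counts are rounded ("6.9k", "8.9k"),
# so anything derived from them is approximate.
runs = {
    "Gemma 4 26B-A4B": {"vram_gb": 15, "tokens": 6900, "tok_per_s": 138},
    "Gemma 4 12B": {"vram_gb": 9, "tokens": 8900, "tok_per_s": 80},
}


def generation_seconds(run):
    """Estimated generation time, assuming constant throughput (an assumption)."""
    return run["tokens"] / run["tok_per_s"]


big = runs["Gemma 4 26B-A4B"]
small = runs["Gemma 4 12B"]

# Throughput ratio: how much faster the 26B-A4B generated tokens.
speedup = big["tok_per_s"] / small["tok_per_s"]

# VRAM ratio: the 12B's footprint as a fraction of the 26B-A4B's.
vram_ratio = small["vram_gb"] / big["vram_gb"]

print(f"speedup: {speedup:.2f}x")
print(f"12B VRAM as a share of 26B-A4B VRAM: {vram_ratio:.0%}")
for name, run in runs.items():
    print(f"{name}: about {generation_seconds(run):.0f} s to generate")
```

Under these assumptions the throughput ratio comes out a little above 1.7x, and the 12B's 9 GB is 60% of the 26B-A4B's 15 GB. Because the 12B also emitted more tokens at a lower rate, its estimated generation time is more than double that of the 26B-A4B.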

Similar Articles

Google's new Gemma 4 12B model is designed to run on any laptop with 16GB of RAM

Ars Technica

Google releases Gemma 4 12B, a compact AI model optimized for local laptop use with only 16GB of RAM, featuring multi-token prediction and streamlined multimodal capabilities for text, audio, and images.

Ran gemma 4 12b on my 3090 yesterday and I think the local model game just changed

Reddit r/artificial

A user reports running Google's Gemma 4 12B model locally on a single RTX 3090 via GGUF quantization, finding strong performance including real 256k context, multimodal capabilities, and function calling that outperforms larger 70B models for coding tasks.

@analogalok: i just ran Google's brand new Unsloth Gemma4 12B dense GGUF on my RTX 4060 using llama.cpp + CUDA 13.2 21 tokens per se…

X AI KOLs Timeline

Google's new Gemma 4 12B is a single decoder-only transformer with encoder-free multimodal input, achieving strong benchmarks while being small enough to run locally on a budget GPU. It is released under the Apache 2.0 license.

@KanikaBK: Google just dropped an AI bomb! A BILLION DOLLARS Game is on. Gemma 4 12 B runs on your laptop. 16 GB of RAM, that is a…

X AI KOLs Timeline

Google released Gemma 4 12B, an open-source multimodal AI model under Apache 2.0 that runs locally on laptops with 16GB RAM, targeting enterprise edge deployment.

Introducing Gemma 3

Google DeepMind Blog

Google introduces Gemma 3, a collection of lightweight open models (1B, 4B, 12B, 27B) designed to run on single GPUs or TPUs, featuring support for 140+ languages, 128k context window, and multimodal capabilities. The models outperform larger competitors like Llama 3 and DeepSeek-V3 while maintaining efficiency for on-device deployment.