New Google Gemma 4 12B Claims Near-26B Performance - We Tested Both!

Reddit r/LocalLLaMA Models

Summary

Google's new Gemma 4 12B model claims near-26B performance. In a local test on an RTX 4090, the 26B-A4B model was faster and produced better results, but the 12B used less VRAM, making it suitable for laptops.

We ran both models locally on a single RTX 4090 and gave each the same task: write a self-contained HTML5 canvas animation with real physics, in one file, without libraries. There were three scenes: a Galton board, two blocks colliding off a wall, and a chaotic triple pendulum.

Outputs:

- Gemma 4 26B-A4B: 15 GB VRAM usage, 6.9k tokens, 138 tok/s
- Gemma 4 12B: 9 GB VRAM usage, 8.9k tokens, 80 tok/s

Same Gemma 4 family, but the 26B-A4B won every scene and ran ~1.7x faster - on just 4B active params. The 12B stayed very close though, on 60% of the VRAM (9 GB vs 15 GB) - which makes it the ideal model for a 16 GB laptop.
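The ratios quoted above can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope calculation from the reported figures: it assumes the rounded token counts (6.9k and 8.9k) are close enough to the real ones, and that generation time is roughly tokens divided by throughput, ignoring prompt processing and model load time. The variable and function names are made up for illustration.

```python
# Figures as reported in the post; token counts are rounded ("6.9k", "8.9k").
runs = {
    "Gemma 4 26B-A4B": {"vram_gb": 15, "tokens": 6900, "tok_per_s": 138},
    "Gemma 4 12B": {"vram_gb": 9, "tokens": 8900, "tok_per_s": 80},
}


def generation_seconds(run):
    """Approximate generation time (assumption): tokens / throughput."""
    return run["tokens"] / run["tok_per_s"]


big = runs["Gemma 4 26B-A4B"]
small = runs["Gemma 4 12B"]

# How much faster the 26B-A4B generates tokens than the 12B.
speed_ratio = big["tok_per_s"] / small["tok_per_s"]

# Fraction of the 26B-A4B's VRAM that the 12B needs.
vram_ratio = small["vram_gb"] / big["vram_gb"]

# How much longer the 12B took overall, since it also wrote more tokens.
time_ratio = generation_seconds(small) / generation_seconds(big)

print(f"throughput ratio: {speed_ratio:.2f}x")
print(f"12B VRAM as a share of 26B-A4B: {vram_ratio:.0%}")
for name, run in runs.items():
    print(f"{name}: about {generation_seconds(run):.0f} s to generate")
print(f"wall-clock ratio: {time_ratio:.2f}x")
```

Under these assumptions the throughput gap alone understates the difference: the 12B also produced more tokens for the same task, so its estimated generation time is more than twice that of the 26B-A4B.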
