Google's new Gemma 4 12B model is designed to run on any laptop with 16GB of RAM

Ars Technica Models

Summary

Google releases Gemma 4 12B, a compact AI model optimized for local laptop use with only 16GB of RAM, featuring multi-token prediction and streamlined multimodal capabilities for text, audio, and images.

<p>The generative AI boom has driven the cost of memory into the stratosphere, and Google is a key part of that trend. So it's only fitting that Google should offer some less RAM-hungry local AI models. The company has announced the release of a <a href="https://blog.google/innovation-and-ai/technology/developers-tools/introducing-gemma-4-12B/">new Gemma 4 model</a> that fills a gap in the lineup that launched earlier this year. The new model is efficient enough that you may be able to run it on a pretty average consumer laptop.</p> <p>In April, Google <a href="https://arstechnica.com/ai/2026/04/google-announces-gemma-4-open-ai-models-switches-to-apache-2-0-license/">released four models in the Gemma 4 family</a>, which also marked the shift to a more open Apache 2.0 license. The initial models included two mobile-optimized options (E2B and E4B) along with a pair of models for more serious work (26B Mixture of Experts and 31B Dense). That left a rather large unserved space in the middle, which is right where the new model falls.</p> <p>Gemma 4 12B is considerably more capable than the mobile versions, but it won't require a $20,000 AI accelerator to run locally. Google says Gemma 4 12B is unique in that it can run on many consumer laptops without sacrificing quality. As long as you've got a computer with 16GB of system RAM or VRAM, the 12-billion-parameter model will work. That's about half the total memory footprint of Gemma 4 26B MoE, and Google claims the new model is almost as capable, at least as far as benchmarks go.</p>
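
As a rough back-of-the-envelope check on the memory claim, the sketch below estimates storage for the weights alone of a 12-billion-parameter model at a few common precisions. The precisions are illustrative assumptions, since the article does not say which weight format the 16GB figure assumes. The helper name `weight_memory_gb` is hypothetical, and the estimate ignores the KV cache, activations, and runtime overhead.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Return the storage needed for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS_12B = 12e9  # parameter count taken from the article

# Assumed precisions for illustration only; the article does not specify one.
estimates = {bits: weight_memory_gb(PARAMS_12B, bits) for bits in (16, 8, 4)}

for bits, gigabytes in estimates.items():
    print(f"{bits}-bit weights: ~{gigabytes:.0f} GB")
```

Under these assumptions, 16-bit weights alone (about 24 GB) would not fit in 16GB, while 8-bit (about 12 GB) and 4-bit (about 6 GB) weights would leave some headroom for everything else.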

Cached at: 06/03/26, 09:41 PM

# Google's new Gemma 4 12B model is designed to run on any laptop with 16GB of RAM

Source: [https://arstechnica.com/google/2026/06/googles-new-gemma-4-open-ai-model-is-sized-for-your-laptop/](https://arstechnica.com/google/2026/06/googles-new-gemma-4-open-ai-model-is-sized-for-your-laptop/)

[![Gemma 4 benchmark graph](https://cdn.arstechnica.net/wp-content/uploads/2026/06/1920x1080_xMVEyWv.width-1000.format-webp.png)](https://cdn.arstechnica.net/wp-content/uploads/2026/06/1920x1080_xMVEyWv.width-1000.format-webp.png)

Gemma 4 12B is almost as capable as the version with 26 billion parameters. Credit: Google

Google says the new model is capable of complex multistep reasoning and agentic workflows that previously required the larger Gemma variants. Despite the smaller parameter count, Gemma 4 12B comes with the newly devised [Multi-Token Prediction (MTP) drafters](https://arstechnica.com/ai/2026/05/googles-gemma-4-open-ai-models-use-speculative-decoding-to-get-up-to-3x-faster/), which take advantage of unused processing cycles to calculate possible future tokens. The result is greater speed and efficiency. Google has released optional MTP versions of the other Gemma 4 models, but this is the first one to have MTP out of the box.

Gemma 4 12B is also more efficient thanks to a new approach to multimodality. The Gemma 4 family is natively multimodal, accepting text, audio, or images as inputs. Most gen AI models, including the other Gemma 4 variants, use dedicated encoders to process non-text inputs and pass that data to the LLM. This works well enough, but it increases latency and memory usage. With the new mid-weight model, Google has implemented a streamlined embedding module for vision, featuring single-matrix multiplication and positional embedding, which allows the data to pass to the LLM with proper spatial awareness. This eliminates the need for a bulky middleman encoder. For audio, there’s no encoding at all. The developers worked out a method of projecting the raw audio signal into the same vectors used for text tokens.

Gemma 4 12B Demo

If you want to check out the new Gemma 4 model, it’s accessible without a download via tools like [LM Studio](https://lmstudio.ai/models/gemma-4), [Google AI Edge Gallery](https://developers.google.com/edge/gallery), and more. But the whole idea with Gemma 4 12B is that you can run it locally and on your own terms. If you’ve got the RAM, the model weights are available for download immediately on [Kaggle](https://huggingface.co/collections/google/gemma-4) and [Hugging Face](https://huggingface.co/collections/google/gemma-4). It’s just shy of 18GB.
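
To make the drafter idea more concrete, here is a minimal sketch of generic greedy speculative decoding, the technique named in the linked MTP article's URL. It is not Google's MTP implementation. `target_next` and `draft_next` are hypothetical toy stand-ins for a large model and a cheap drafter, and the toy drafter is derived from the target purely for simplicity.

```python
from typing import List

Token = int


def target_next(context: List[Token]) -> Token:
    """Toy stand-in for the large model: a deterministic next-token rule."""
    return (sum(context) * 31 + len(context)) % 50


def draft_next(context: List[Token]) -> Token:
    """Toy stand-in for a cheap drafter: deliberately wrong at every 4th length."""
    guess = target_next(context)
    return guess if len(context) % 4 else (guess + 1) % 50


def greedy_decode(prompt: List[Token], steps: int) -> List[Token]:
    """Baseline: one target-model call per generated token."""
    out = list(prompt)
    for _ in range(steps):
        out.append(target_next(out))
    return out


def speculative_decode(prompt: List[Token], steps: int, k: int = 4) -> List[Token]:
    """Draft k tokens cheaply, then keep the prefix the target agrees with."""
    out = list(prompt)
    goal = len(prompt) + steps
    while len(out) < goal:
        # 1. The drafter proposes k possible future tokens.
        draft: List[Token] = []
        for _ in range(k):
            draft.append(draft_next(out + draft))
        # 2. The target checks each proposal in order. A real system would
        #    score all k positions in a single batched forward pass.
        for i, proposed in enumerate(draft):
            expected = target_next(out + draft[:i])
            if proposed != expected:
                out.extend(draft[:i])  # keep the agreed prefix
                out.append(expected)   # replace the first wrong token
                break
        else:
            out.extend(draft)              # every draft token was accepted
            out.append(target_next(out))   # plus one extra token from the target
    return out[:goal]


prompt = [1, 2, 3]
assert speculative_decode(prompt, 20) == greedy_decode(prompt, 20)
```

Because every kept token is one the target would have chosen anyway, the output matches plain greedy decoding. Any speedup comes from verifying several drafted tokens per target pass instead of generating them one at a time.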

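The vision path described above, a single matrix multiplication plus positional embedding, can be sketched in the style of a standard patch embedding. This is an assumption-laden illustration, not Google's module: the image size, patch size, embedding width, random weights, and the helper names `patchify` and `embed_image` are all hypothetical.

```python
import random
from typing import List

Matrix = List[List[float]]


def patchify(image: Matrix, patch: int) -> Matrix:
    """Split a square grayscale image into flattened patch-by-patch tiles, row-major."""
    size = len(image)
    tiles: Matrix = []
    for top in range(0, size, patch):
        for left in range(0, size, patch):
            tiles.append([image[top + r][left + c]
                          for r in range(patch) for c in range(patch)])
    return tiles


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x m) and b (m x d)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def embed_image(image: Matrix, patch: int, projection: Matrix, positions: Matrix) -> Matrix:
    """Project patches with one matrix multiply, then add a per-patch position vector."""
    patches = patchify(image, patch)         # (num_patches, patch * patch)
    projected = matmul(patches, projection)  # the single matrix multiplication
    return [[p + q for p, q in zip(vec, pos)]
            for vec, pos in zip(projected, positions)]


rng = random.Random(0)
IMAGE_SIZE, PATCH, DIM = 4, 2, 3  # toy sizes chosen for readability
image = [[float(r * IMAGE_SIZE + c) for c in range(IMAGE_SIZE)] for r in range(IMAGE_SIZE)]
num_patches = (IMAGE_SIZE // PATCH) ** 2

# Random stand-ins for learned parameters.
projection = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(PATCH * PATCH)]
positions = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(num_patches)]

tokens = embed_image(image, PATCH, projection, positions)
print(len(tokens), len(tokens[0]))  # one DIM-wide vector per patch
```

The position vectors are what carry spatial information here: without them, two identical patches from different corners of the image would produce identical embeddings.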