Google releases Gemma 4 12B, a compact AI model optimized for local laptop use with only 16GB of RAM, featuring multi-token prediction and streamlined multimodal capabilities for text, audio, and images.
<p>The generative AI boom has driven the cost of memory into the stratosphere, and Google is a key part of that trend. So it's only fitting that Google should offer some less RAM-hungry local AI models. The company has announced the release of a <a href="https://blog.google/innovation-and-ai/technology/developers-tools/introducing-gemma-4-12B/">new Gemma 4 model</a> that fills a gap in the lineup that launched earlier this year. The new model is efficient enough that you may be able to run it on a pretty average consumer laptop.</p>
<p>In April, Google <a href="https://arstechnica.com/ai/2026/04/google-announces-gemma-4-open-ai-models-switches-to-apache-2-0-license/">released four models in the Gemma 4 family</a>, which also marked the shift to a more open Apache 2.0 license. The initial models included two mobile-optimized options (E2B and E4B) along with a pair of models for more serious work (26B Mixture of Experts and 31B Dense). That left a rather large unserved space in the middle, which is right where the new model falls.</p>
<p>Gemma 4 12B is considerably more capable than the mobile versions, but it won't require a $20,000 AI accelerator to run locally. Google says Gemma 4 12B is unique in that it can run on many consumer laptops without sacrificing quality. As long as you've got a computer with 16GB of system RAM or VRAM, the 12-billion-parameter model will work. That's about half the total memory footprint of Gemma 4 26B MoE, and Google claims the new model is almost as capable, at least as far as benchmarks go.</p>
# Google's new Gemma 4 12B model is designed to run on any laptop with 16GB of RAM
Source: [https://arstechnica.com/google/2026/06/googles-new-gemma-4-open-ai-model-is-sized-for-your-laptop/](https://arstechnica.com/google/2026/06/googles-new-gemma-4-open-ai-model-is-sized-for-your-laptop/)
[](https://cdn.arstechnica.net/wp-content/uploads/2026/06/1920x1080_xMVEyWv.width-1000.format-webp.png)
Gemma 4 12B is almost as capable as the version with 26 billion parameters.
Credit: Google
Google says the new model is capable of complex multistep reasoning and agentic workflows that previously required the larger Gemma variants. Despite the smaller parameter count, Gemma 4 12B comes with the newly devised [Multi-Token Prediction (MTP) drafters](https://arstechnica.com/ai/2026/05/googles-gemma-4-open-ai-models-use-speculative-decoding-to-get-up-to-3x-faster/), which take advantage of unused processing cycles to calculate possible future tokens. The result is greater speed and efficiency. Google has released optional MTP versions of the other Gemma 4 models, but this is the first one to have MTP out of the box.
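For a rough sense of how drafting future tokens can save work, here is a minimal sketch of greedy speculative decoding, the general technique the MTP drafters build on. It is not Google's implementation: `toy_target` and `toy_draft` are hypothetical stand-ins for a large model and a cheap drafter, and the sketch leaves out refinements such as the extra token a real verification pass yields when every guess is accepted.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]


def speculative_decode(
    target: NextToken,
    draft: NextToken,
    prompt: List[Token],
    max_new_tokens: int,
    k: int = 4,
) -> List[Token]:
    """Greedy speculative decoding with toy 'models' (plain functions)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    out = list(prompt)
    goal = len(prompt) + max_new_tokens
    while len(out) < goal:
        # 1. The cheap drafter guesses up to k future tokens in a row.
        proposal: List[Token] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))

        # 2. The target checks each guess. A real system would score all
        #    drafted positions in one batched pass; this loop is the
        #    sequential equivalent and produces the same tokens.
        for guess in proposal:
            expected = target(out)
            out.append(expected)  # always keep the target's own token
            if guess != expected:
                break  # first mismatch: throw away the remaining guesses
    return out


# Hypothetical toy models: the target counts upward modulo 10, and the
# drafter agrees with it except right after a 4.
def toy_target(context: List[Token]) -> Token:
    return (context[-1] + 1) % 10


def toy_draft(context: List[Token]) -> Token:
    return 0 if context[-1] == 4 else (context[-1] + 1) % 10


print(speculative_decode(toy_target, toy_draft, [0], max_new_tokens=8))
# prints [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

The output matches what the target would produce on its own, because a wrong guess is simply replaced by the target's token. The speedup in a real system comes from verifying several drafted tokens per pass of the large model instead of generating them one at a time.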
Gemma 4 12B is also more efficient thanks to a new approach to multimodality. The Gemma 4 family is natively multimodal, accepting text, audio, or images as inputs. Most gen AI models, including the other Gemma 4 variants, use dedicated encoders to process non-text inputs and pass that data to the LLM. This works well enough, but it increases latency and memory usage.
With the new mid-weight model, Google has implemented a streamlined embedding module for vision, featuring single-matrix multiplication and positional embedding, which allows the data to pass to the LLM with proper spatial awareness. This eliminates the need for a bulky middleman encoder. For audio, there’s no encoding at all. The developers worked out a method of projecting the raw audio signal into the same vectors used for text tokens.
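To make the vision idea concrete, the sketch below shows one way a "single matrix multiplication plus positional embedding" step could look. Google has not published these details here, so the patch size, embedding width, and the weights themselves are all assumptions chosen for readability; a real model would use learned values and far larger dimensions.

```python
from typing import List

Matrix = List[List[float]]


def patchify(image: Matrix, patch: int) -> Matrix:
    """Split an H x W grayscale image into flattened patch x patch tiles."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    tiles = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tiles.append([image[top + r][left + c]
                          for r in range(patch) for c in range(patch)])
    return tiles


def embed_patches(tiles: Matrix, projection: Matrix, positions: Matrix) -> Matrix:
    """One matrix multiply (tiles x projection) plus a per-position offset."""
    dim = len(projection[0])
    embedded = []
    for index, tile in enumerate(tiles):
        vector = [
            sum(pixel * projection[i][d] for i, pixel in enumerate(tile))
            + positions[index][d]  # tells the LLM where this patch sat
            for d in range(dim)
        ]
        embedded.append(vector)
    return embedded


# Hypothetical 4 x 4 image, 2 x 2 patches, 3-dimensional embeddings.
image = [[float(4 * row + col) for col in range(4)] for row in range(4)]
tiles = patchify(image, patch=2)
projection = [[1.0, 0.5, 0.0] for _ in range(4)]     # stand-in for learned weights
positions = [[float(i)] * 3 for i in range(len(tiles))]  # stand-in position table

for vector in embed_patches(tiles, projection, positions):
    print(vector)
```

Each output row is a vector the language model could consume like a text-token embedding. There is no separate encoder network between the pixels and the LLM, which is the saving the article describes.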
If you want to check out the new Gemma 4 model, it’s accessible without a download via tools like [LM Studio](https://lmstudio.ai/models/gemma-4), [Google AI Edge Gallery](https://developers.google.com/edge/gallery), and more. But the whole idea with Gemma 4 12B is that you can run it locally and on your own terms. If you’ve got the RAM, the model weights are available for download immediately on [Kaggle](https://huggingface.co/collections/google/gemma-4) and [Hugging Face](https://huggingface.co/collections/google/gemma-4). It’s just shy of 18GB.
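Whether a 12-billion-parameter model fits in 16GB depends heavily on how many bits each weight occupies. The article does not say which precision Google's figures assume, so the back-of-the-envelope sketch below simply tries a few common widths. It counts weights only, in decimal gigabytes; the context cache and activations need additional memory on top.

```python
def weight_memory_gb(parameters: float, bits_per_weight: int) -> float:
    """Memory needed just to hold the weights, in decimal gigabytes."""
    return parameters * bits_per_weight / 8 / 1e9


PARAMETERS = 12e9  # the "12-billion-parameter model" from the article

# The bit widths below are illustrative assumptions, not Google's settings.
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gb(PARAMETERS, bits):.0f} GB")
```

At 16 bits per weight, the weights alone would come to 24 GB, which is more than a 16GB machine can hold. At 8 or 4 bits they drop to 12 GB or 6 GB.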
Google released Gemma 4 12B, an open-source multimodal AI model under Apache 2.0 that runs locally on laptops with 16GB RAM, targeting enterprise edge deployment.
Gemma 4 12B, Google's multimodal open model supporting image, audio, and 256K context, can now run locally on just 8GB RAM via Unsloth's Dynamic GGUFs, enabling local training and inference through Unsloth Studio.
Google's new Gemma 4 12B model claims near-26B performance. In a local test on RTX 4090, the 26B-A4B model was faster and better but the 12B used less VRAM, making it suitable for laptops.
Google introduces Gemma 3, a collection of lightweight open models (1B, 4B, 12B, 27B) designed to run on single GPUs or TPUs, featuring support for 140+ languages, 128k context window, and multimodal capabilities. The models outperform larger competitors like Llama 3 and DeepSeek-V3 while maintaining efficiency for on-device deployment.