Gemma 4 QAT confirmed to release soon!

Reddit r/LocalLLaMA 06/04/26, 09:18 AM Models

gemma quantization google local-llm qat open-source

Summary

A Google Gemma team member has confirmed that Gemma 4 QAT (Quantization-Aware Training) models will be released soon. The poster suggests holding off on quantization testing until they arrive.

It seems this comment has gone largely unnoticed: [https://old.reddit.com/r/LocalLLaMA/comments/1tvtn6m/googlegemma412b\_hugging\_face/opjj681/](https://old.reddit.com/r/LocalLLaMA/comments/1tvtn6m/googlegemma412b_hugging_face/opjj681/). Maybe hold off on testing quantization and wait for its refinements. The account belongs to Omar from the Gemma team.

Similar Articles

Gemma 4 QAT models: Optimizing compression for mobile and laptop efficiency

Hacker News Top

Google releases Gemma 4 models optimized with Quantization-Aware Training (QAT) to improve efficiency for mobile and laptop deployment, reducing memory footprint to 1GB for the E2B model while preserving quality.

google/gemma-4-12B-it-qat-q4_0-gguf

Hugging Face Models Trending

Google DeepMind releases Gemma 4 models optimized with Quantization-Aware Training (QAT) in multiple formats including GGUF, enabling high quality with reduced memory requirements.

@_philschmid: Weights: https://huggingface.co/collections/google/gemma-4-qat-q4-0… Blog: https://blog.google/innovation-and-ai/techno…

X AI KOLs Following

Google released Gemma 4 models with quantization-aware training (QAT) at Q4_0 precision on Hugging Face, offering efficient variants from 5B to 33B parameters.

@TheAhmadOsman: Great news Google just released the QAT (4bit) of their Gemma 4 model series including the 31B Dense and the 26B MoE An…

X AI KOLs Following

Google released QAT (4-bit) versions of their Gemma 4 model series, including the 31B Dense and 26B MoE models, furthering open-source AI.

Gemma 4 12b QAT is a regression for my use case, despite all the hype.. Not my main Squeeze

Reddit r/LocalLLaMA

The author reports that the Gemma 4 12b QAT model suffers from a regression in tool calling and coding tasks compared to the standard Q5_K_L version, due to a bug involving control token misconfiguration. Despite high token speed, the model's inconsistent outputs make it unsuitable for agent workflows.