Gemma 4 QAT confirmed to release soon!
Summary
A Google Gemma team member has confirmed that Gemma 4 QAT (Quantization-Aware Training) models will be released soon and suggested that users wait before testing their own quantizations.
Similar Articles
Gemma 4 QAT models: Optimizing compression for mobile and laptop efficiency
Google releases Gemma 4 models optimized with Quantization-Aware Training (QAT) to improve efficiency on mobile devices and laptops, reducing the memory footprint of the E2B model to 1 GB while preserving quality.
google/gemma-4-12B-it-qat-q4_0-gguf
Google DeepMind releases Gemma 4 models optimized with Quantization-Aware Training (QAT) in multiple formats, including GGUF, preserving quality while reducing memory requirements.
@_philschmid: Weights: https://huggingface.co/collections/google/gemma-4-qat-q4-0… Blog: https://blog.google/innovation-and-ai/techno…
Google released Gemma 4 models with quantization-aware training (QAT) at Q4_0 precision on Hugging Face, offering efficient variants from 5B to 33B parameters.
@TheAhmadOsman: Great news Google just released the QAT (4bit) of their Gemma 4 model series including the 31B Dense and the 26B MoE An…
Google released QAT (4-bit) versions of their Gemma 4 model series, including the 31B Dense and 26B MoE models, furthering open-source AI.
Gemma 4 12b QAT is a regression for my use case, despite all the hype.. Not my main Squeeze
The author reports that the Gemma 4 12B QAT model regresses in tool calling and coding tasks compared to the standard Q5_K_L version, which the author attributes to a bug involving misconfigured control tokens. Despite its high token speed, the model's inconsistent outputs make it unsuitable for agent workflows.