@_philschmid: More Gemma 4! New QAT Gemma 4 checkpoints with similar performance while using ~4x less memory! It comes with a new mob…
Summary
New QAT Gemma 4 checkpoints offer similar performance with ~4x less memory, enabling a 1GB memory footprint for Gemma 4 E2B via a new mobile quantization format.
Cached at: 06/08/26, 03:22 PM
More Gemma 4! New QAT Gemma 4 checkpoints with similar performance while using ~4x less memory!
It comes with a new mobile quantization format that reduces memory footprint of Gemma 4 E2B to just 1GB.
Quantization-Aware Training (QAT) simulates low-precision operations during training, so the model can be quantized afterwards with little to no loss in accuracy. The result is a smaller, faster model.
Available on @huggingface and directly runnable.
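To make "simulates low-precision operations" concrete, here is a minimal sketch of the quantize-then-dequantize step ("fake quantization") that QAT typically applies to weights in the forward pass. The symmetric per-tensor scheme, the function names, and the 16-bit baseline are assumptions for illustration; the post does not describe the actual Gemma 4 recipe or the mobile format.

```python
def fake_quantize(weights, bits=4):
    """Snap floats to a symmetric signed integer grid, then map them back.

    Illustrative only: a per-tensor symmetric scheme is assumed here,
    not taken from the Gemma 4 release.
    """
    qmax = 2 ** (bits - 1) - 1                 # 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Integer codes are what a quantized checkpoint would store.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    # Dequantized values are what the forward pass sees during QAT.
    dequantized = [c * scale for c in codes]
    return codes, dequantized, scale


def storage_ratio(source_bits=16, target_bits=4):
    """Weight-storage reduction, ignoring scales and other metadata."""
    return source_bits / target_bits


weights = [0.7, -0.32, 0.12, 0.0]
codes, dequantized, scale = fake_quantize(weights, bits=4)
print(codes)            # integer codes on the 4-bit grid
print(dequantized)      # values the network trains against
print(storage_ratio())  # 16-bit -> 4-bit weights
```

Because the network trains against the rounded values, it can adapt to the rounding error before the final quantization step. Under the assumed 16-bit baseline, 4-bit weights need a quarter of the storage, which is consistent with the "~4x less memory" figure.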
Similar Articles
Gemma 4 QAT models: Optimizing compression for mobile and laptop efficiency
Google releases Gemma 4 models optimized with Quantization-Aware Training (QAT) to improve efficiency for mobile and laptop deployment, reducing memory footprint to 1GB for the E2B model while preserving quality.
Gemma 4 QAT benchmark results (AMD 7900 XTX): faster, less VRAM, no quality loss
A user benchmarks Google's Gemma 4 QAT models on an AMD 7900 XTX, reporting up to 45% faster generation, 83% higher throughput, and significant VRAM savings (e.g., 5.7GB for the 12B QAT model) with no quality loss compared to standard weights.
Google's quantization aware trained Gemma checkpoints enabling mobile device inference just dropped on HF
Google released quantization-aware trained Gemma 4 checkpoints on Hugging Face, optimized for mobile device inference and available in QAT Mobile and Q4_0 variants.
Gemma 4 26B A4B IT QAT Comparison
A user benchmarks three quantized versions of Gemma 4 26B IT (4-bit, 6-bit, and 8-bit QAT) on MMLU_PRO and HumanEval, finding that the QAT 8-bit model performs worse than the 6-bit quant on HumanEval and is not clearly better than 4-bit, questioning the superiority of QAT for this model.
Personal Eval follow-up: Gemma4 26B MoE (Q8) vs Qwen3.5 27B Dense vs Gemma4 31B Dense Compared
Personal benchmark shows Qwen3.5-27B Dense and Gemma4-31B Dense fix 100% of 37 test failures, outperforming Gemma4-26B MoE even at 8-bit quantization, while using fewer tokens and less wall-clock time.