@_philschmid: More Gemma 4! New QAT Gemma 4 checkpoints with similar performance while using ~4x less memory! It comes with a new mob…


Summary

New QAT Gemma 4 checkpoints offer similar performance while using ~4x less memory. A new mobile quantization format reduces the memory footprint of Gemma 4 E2B to just 1GB.


Cached at: 06/08/26, 03:22 PM

More Gemma 4! New QAT Gemma 4 checkpoints with similar performance while using ~4x less memory!

It comes with a new mobile quantization format that reduces the memory footprint of Gemma 4 E2B to just 1GB.
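To make the "~4x less memory" and "1GB" figures concrete, here is a back-of-the-envelope sketch of weight storage as a function of bit width. The parameter count and both bit widths are assumptions chosen for illustration (the post does not state them), and the calculation ignores activations, the KV cache, and any per-block scale overhead.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Hypothetical values for illustration only; none of these come from the post.
NUM_PARAMS = 2e9      # assumed parameter count
BASELINE_BITS = 16    # assumed 16-bit (e.g. bf16) baseline checkpoint
QUANT_BITS = 4        # assumed 4-bit quantized weights

baseline_gb = weight_memory_gb(NUM_PARAMS, BASELINE_BITS)
quant_gb = weight_memory_gb(NUM_PARAMS, QUANT_BITS)
reduction = baseline_gb / quant_gb

print(f"baseline:  {baseline_gb:.1f} GB")
print(f"quantized: {quant_gb:.1f} GB")
print(f"reduction: {reduction:.1f}x")
```

Under these assumptions the reduction factor is simply the ratio of the bit widths, which is why going from 16-bit to 4-bit weights gives roughly 4x.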

Quantization-Aware Training (QAT) simulates low-precision operations during training so that the model can be quantized afterwards with little to no accuracy loss, yielding smaller, faster models.
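The idea of simulating low precision during training can be sketched with "fake quantization": the forward pass uses weights rounded to a low-bit grid, while the gradient update is applied to full-precision weights as if the rounding were the identity (a straight-through estimator). This is a toy NumPy illustration on a linear model, not the recipe used for the Gemma 4 checkpoints; the function names, the symmetric per-tensor 4-bit scheme, and all hyperparameters are assumptions.

```python
import numpy as np


def fake_quantize(w: np.ndarray, bits: int = 4) -> np.ndarray:
    """Round w to a symmetric signed integer grid, then map back to floats."""
    qmax = 2 ** (bits - 1) - 1                 # 7 for 4-bit
    max_abs = np.max(np.abs(w))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale


def train_qat(x, y, bits=4, lr=0.05, steps=300, seed=0):
    """Fit y ~ x @ w while the forward pass only ever sees quantized weights."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=x.shape[1])  # full-precision "shadow" weights
    losses = []
    for _ in range(steps):
        w_q = fake_quantize(w, bits)            # simulate low precision in the forward pass
        err = x @ w_q - y
        losses.append(float(np.mean(err ** 2)))
        grad = 2.0 * x.T @ err / len(y)         # gradient with respect to w_q
        w -= lr * grad                          # straight-through: treat d(w_q)/d(w) as 1
    return w, losses


# Synthetic regression problem (illustrative data only).
rng = np.random.default_rng(1)
x = rng.normal(size=(256, 8))
true_w = rng.normal(size=8)
y = x @ true_w

w, losses = train_qat(x, y)
w_q = fake_quantize(w)                          # weights as they would be deployed
print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

Because the model was trained while already seeing rounded weights, the final quantization step changes little about its behavior, which is the property QAT is designed to provide.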

Available on @huggingface and directly runnable.

Similar Articles

Gemma 4 26B A4B IT QAT Comparison

Reddit r/LocalLLaMA

A user benchmarks three quantized versions of Gemma 4 26B IT (4-bit, 6-bit, and 8-bit QAT) on MMLU_PRO and HumanEval. The QAT 8-bit model performs worse than the 6-bit quant on HumanEval and is not clearly better than the 4-bit one, which calls into question whether QAT is superior for this model.