Google's quantization-aware trained Gemma checkpoints, enabling on-device mobile inference, just dropped on HF
Summary
Google released quantization-aware trained (QAT) Gemma 4 checkpoints on Hugging Face, optimized for mobile device inference and available in QAT Mobile and Q4_0 variants.
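For readers unfamiliar with the Q4_0 variant: Q4_0 is a 4-bit block quantization type from the ggml/llama.cpp ecosystem that GGUF files use. The sketch below shows one Q4_0 block, assuming the ggml reference layout, in which 32 weights share one fp16 scale and each weight is stored as a 4-bit code. The function names are hypothetical illustrations, not part of any Google or Hugging Face API, and the sketch shows only the storage format, not the QAT training step.

```python
import math
import struct

BLOCK_SIZE = 32  # Q4_0 groups weights into blocks of 32 values


def quantize_block_q4_0(block):
    """Quantize 32 floats to (2-byte fp16 scale, 16 packed bytes).

    Follows the ggml reference rule as we understand it: the largest-magnitude
    value (keeping its sign) maps to code 0, so the scale is d = max_val / -8.
    """
    if len(block) != BLOCK_SIZE:
        raise ValueError(f"expected {BLOCK_SIZE} values, got {len(block)}")
    max_val = max(block, key=abs)        # signed value with the largest |x|
    d = max_val / -8.0
    inv_d = 1.0 / d if d != 0 else 0.0   # all-zero block: every code becomes 8
    half = BLOCK_SIZE // 2
    packed = bytearray(half)
    for j in range(half):
        # Shift codes into [0, 15]; int() truncates like the C cast, then clamp.
        lo = min(15, int(block[j] * inv_d + 8.5))
        hi = min(15, int(block[j + half] * inv_d + 8.5))
        packed[j] = lo | (hi << 4)       # first half in low nibbles, second half in high nibbles
    return struct.pack("<e", d), bytes(packed)  # scale is stored as fp16


def dequantize_block_q4_0(scale, packed):
    """Recover approximate floats: x is approximately (code - 8) * d."""
    d = struct.unpack("<e", scale)[0]
    half = BLOCK_SIZE // 2
    out = [0.0] * BLOCK_SIZE
    for j, byte in enumerate(packed):
        out[j] = ((byte & 0x0F) - 8) * d
        out[j + half] = ((byte >> 4) - 8) * d
    return out


# Usage with made-up weights: 32 values fit in 2 + 16 = 18 bytes (4.5 bits each).
weights = [math.sin(0.7 * i) for i in range(BLOCK_SIZE)]
scale, packed = quantize_block_q4_0(weights)
restored = dequantize_block_q4_0(scale, packed)
print(len(scale) + len(packed))  # 18
```

Each restored weight lands within about one scale step of the original. In quantization-aware training, this kind of rounding is simulated while the model is trained, so the weights learn to tolerate it; the sketch above only applies it after the fact.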
Similar Articles
Gemma 4 QAT models: Optimizing compression for mobile and laptop efficiency
Google releases Gemma 4 models optimized with Quantization-Aware Training (QAT) to improve efficiency for mobile and laptop deployment, reducing the memory footprint of the E2B model to 1 GB while preserving quality.
@_philschmid: Weights: https://huggingface.co/collections/google/gemma-4-qat-q4-0… Blog: https://blog.google/innovation-and-ai/techno…
Google released Gemma 4 models with quantization-aware training (QAT) at Q4_0 precision on Hugging Face, offering efficient variants from 5B to 33B parameters.
google/gemma-4-12B-it-qat-q4_0-gguf
Google DeepMind releases Gemma 4 models optimized with Quantization-Aware Training (QAT) in multiple formats including GGUF, enabling high quality with reduced memory requirements.
Welcome Gemma 4: Frontier multimodal intelligence on device
Google DeepMind releases Gemma 4, a frontier multimodal model family available on Hugging Face under the Apache 2.0 license, optimized for on-device deployment and supported by various inference libraries.
@_philschmid: More Gemma 4! New QAT Gemma 4 checkpoints with similar performance while using ~4x less memory! It comes with a new mob…
New QAT Gemma 4 checkpoints offer performance similar to the non-quantized models while using ~4x less memory, bringing Gemma 4 E2B down to a 1 GB memory footprint via a new mobile quantization format.
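The "~4x less memory" figure can be sanity-checked with simple arithmetic. The new mobile format is not described in these summaries, so this sketch assumes Q4_0 as a stand-in and bf16 as the unquantized baseline, and it counts weight storage only (no KV cache or activations). The 1-billion-parameter size is hypothetical, not a Gemma 4 figure.

```python
BF16_BITS = 16  # assumed unquantized baseline: 16 bits per weight


def q4_0_bits_per_weight(block_size=32, scale_bytes=2):
    """Bits per weight for a Q4_0-style block: 4-bit codes plus one shared fp16 scale."""
    return (block_size * 4 + scale_bytes * 8) / block_size


def weight_gigabytes(num_params, bits_per_weight):
    """Storage for the weights alone, in GB (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


bpw = q4_0_bits_per_weight()
ratio = BF16_BITS / bpw
print(f"{bpw} bits/weight, {ratio:.2f}x smaller than bf16")  # 4.5 bits/weight, 3.56x smaller than bf16

hypothetical_params = 1_000_000_000  # hypothetical model size for illustration
print(weight_gigabytes(hypothetical_params, BF16_BITS))  # 2.0
print(weight_gigabytes(hypothetical_params, bpw))        # 0.5625
```

Pure 4-bit storage with no scales would give exactly 4x; the shared per-block scale is why a Q4_0-style layout lands slightly below that, which is consistent with the "~4x" wording.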