Gemma 4 QAT 31B responds better to KV cache quantization too

Reddit r/LocalLLaMA Models

Summary

The Gemma 4 QAT 31B model demonstrates improved behavior with KV cache quantization, suggesting enhanced inference efficiency.

No content available
Original Article

Similar Articles

Gemma 4 26B A4B IT QAT Comparison

Reddit r/LocalLLaMA

A user benchmarks three quantized versions of Gemma 4 26B IT (4-bit, 6-bit, and 8-bit QAT) on MMLU_PRO and HumanEval, finding that the QAT 8-bit model performs worse than the 6-bit quant on HumanEval and is not clearly better than 4-bit, questioning the superiority of QAT for this model.