What's your experience with Gemma4 QAT?

Reddit r/LocalLLaMA Models

Summary

User shares positive experience with Gemma4 QAT model, noting quality improvements and speed gains with MTP, and asks others for their experiences.

Hey everyone! I'm not a native speaker, so please correct my English where I make mistakes (I can only learn from it!). The model has only been out for a short while, but I wanted to post about it because it's been such a joy.

To say it upfront: I use Qwen3.6 27B for programming and Gemma4 for basically everything else, so I can't say anything meaningful about programming.

Previously I used Gemma4-31B Q4_K_L (for long-context tasks, 128k context with Q8_0 cache) and Q6_K_L (for short-context tasks, 32k context with Q8_0 cache). For short-context tasks, think quick translations, roleplaying, short but accurate OCR, etc. For long-context tasks, think long-document parsing, web-search research, etc.

With the QAT model, I've been able to use the same model for both kinds of task (nice!) and I notice subtle quality improvements. In roleplay, for example, it has much more varied word use, makes more context-relevant remarks, understands correlations better and is able to use them, etc. Sadly I have no experience with the Q8_0 model, but from what I can tell the QAT model performs at least better than bartowski's Q6_K_L. It is, however, still severely hampered by cache quantization: Q8_0 cache does show a noticeable degradation for me at 128K.

Using MTP with Gemma 31B QAT has been amazing too! I get 50 t/s tg (as opposed to 21 t/s) when summarizing a 32k-token Wikipedia page and ~36 t/s tg during roleplay (as opposed to 20 t/s), and you can likely get higher numbers on Linux (I'm stuck with Windows for now...). I had to dial it in though: 5 max drafts seemed to work well for me, but 4 or 6 worked better for my friends. Try 3-7 in 5 separate runs for the same task and see which one runs best for you.

So yeah, enough about my experiences! How were yours? Do you notice any improvement or degradation when using the QAT models? And what is programming like on it?
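For anyone who wants to do the max-drafts sweep (3-7, 5 runs each on the same task) in a more organized way, here is a rough Python sketch. It is only an illustration: `measure_tps` is a hypothetical callback you would have to write yourself to launch your own backend with a given max-draft value and return the measured tokens/s, and comparing settings by median is my assumption, not something the post prescribes.

```python
import statistics
from typing import Callable, Dict, Iterable, List


def sweep_draft_max(
    measure_tps: Callable[[int], float],      # hypothetical: runs one benchmark, returns t/s
    candidates: Iterable[int] = range(3, 8),  # max-draft values 3..7 inclusive
    runs: int = 5,                            # separate runs per value, same task each time
) -> Dict[int, float]:
    """Return the median tokens/s observed for each candidate max-draft value."""
    results: Dict[int, float] = {}
    for draft_max in candidates:
        samples: List[float] = [measure_tps(draft_max) for _ in range(runs)]
        # Median is an assumption here; it is less sensitive to one slow run than the mean.
        results[draft_max] = statistics.median(samples)
    return results


def best_setting(results: Dict[int, float]) -> int:
    """Pick the max-draft value with the highest median tokens/s."""
    return max(results, key=lambda draft_max: results[draft_max])


if __name__ == "__main__":
    # Stand-in for a real benchmark so the sketch runs on its own.
    # The numbers are made up and peak at 5 purely for demonstration.
    def fake_measure(draft_max: int) -> float:
        return 50.0 - 4.0 * abs(draft_max - 5)

    medians = sweep_draft_max(fake_measure)
    for draft_max, tps in sorted(medians.items()):
        print(f"max drafts={draft_max}: median {tps:.1f} t/s")
    print("best:", best_setting(medians))
```

Swap `fake_measure` for a function that actually runs your prompt and reads the generation speed from your backend's output. Keep the prompt, context size, and cache settings identical across runs, otherwise the comparison does not mean much.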