Those of you who like Gemma4 models - how are you guys using them?

Reddit r/LocalLLaMA Models

Summary

A developer shares their mixed experience running Gemma4 and Qwen locally for coding tasks, noting issues with tool integration, loop handling, and task completion while asking the community for better usage strategies.

I have been using local LLMs for coding quite a lot, as well as for some other tasks (like data extraction from images), and I have had quite good success with Qwen3.6 models. It's obviously not Sonnet/Opus, but I am able to get quite a lot of work done. Lately I decided to give Gemma4 a go and it has been... underwhelming, I would say.

I can run a Q5 quant of 31B and a Q8 quant of 27B at reasonable speeds (I keep the KV cache at FP16 because it seems to matter to them). I have tried a few different GGUF quants (unsloth, some others) and they tend to exhibit the same behavior. I have tried different backends (ROCm and Vulkan) and they also behave the same, so I am reasonably convinced this is just how the model is.

The thing I like about them - they seem to know more and have better general ideas. Like, if I want to discuss some approach to writing an app - they are better than Qwen. But unfortunately, that's where the good things end.

1) I am using it from pi harness on Windows, and due to many issues with Git Bash I just use it with PowerShell. Sometimes the model tries to do something that doesn't work in PowerShell and just... gives up. As opposed to Qwen, which will retry a couple of times and find a way to do what it wants to do.
2) Gemmas are absolutely terrible at using external tools. To clarify - tools like read file work fine with newer templates, but extra things... Pi harness has a concept of skills. Gemma can't seem to comprehend that searxng-search is a skill, not a tool (a different call syntax). It sometimes takes 3-4 prompts to actually convince it to read the skill and try to use it.
3) Gemmas often get into a loop the moment something complicated/uncertain happens. And unlike Qwen, it's quite hard to get them out of that loop with prompts - they seem to keep coming back to it.
4) Gemmas quite often just stop in the middle of doing something.

But people seem to swear by Gemmas.
So my question is - what is it that you guys are doing with them where it works well for you? What am I missing here? Or are you just using them as a chatbot?
