Those of you who like Gemma4 models - how are you guys using them?

Reddit r/LocalLLaMA Models

Summary

A developer shares their mixed experience running Gemma4 and Qwen locally for coding tasks, noting problems with tool use, models getting stuck in loops, and tasks abandoned midway, and asks the community for better usage strategies.

I have been using local LLMs for coding quite a lot, as well as for some other tasks (like data extraction from images), and I have had quite good success with the Qwen3.6 models. It's obviously not Sonnet/Opus, but I am able to get quite a lot of work done. Lately I decided to give Gemma4 a go and it has been... underwhelming, I would say.

I can run a Q5 quant of the 31B and a Q8 quant of the 27B at reasonable speeds (I keep the KV cache at FP16 because it seems to matter to these models). I have tried a few different GGUF quants (unsloth, some others) and they tend to exhibit the same behavior, and I have tried different backends (ROCm and Vulkan), which also behave the same, so I am reasonably convinced this is just how the model is.

The thing I like about them: they seem to know more and have better general ideas. If I want to discuss an approach to writing an app, they are better than Qwen. But unfortunately, that's where the good things end.

1) I am using it from the pi harness on Windows, and due to many issues with git bash I just use it with PowerShell. Sometimes the model tries to do something that doesn't work in PowerShell and just... gives up, as opposed to Qwen, which will retry a couple of times and find a way to do what it wants to do.

2) The Gemmas are absolutely terrible at using external tools. To clarify: tools like read file work fine with newer templates, but extra things are another story. The pi harness has a concept of skills, and Gemma can't seem to comprehend that searxng-search is a skill, not a tool (a different call syntax). It sometimes takes 3-4 prompts to actually convince it to read the skill and try to use it.

3) The Gemmas often get stuck in a loop the moment something complicated or uncertain happens. And unlike Qwen, it's quite hard to get them out of that loop with prompts; they keep coming back to it.

4) The Gemmas quite often just stop in the middle of doing something.

But people seem to swear by the Gemmas. So my question is: what are you doing with them where they work well for you? What am I missing here? Or are you just using them as a chatbot?
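For context on the KV-cache point above, here is a minimal sketch of how one could load a GGUF quant while keeping the KV cache in FP16, using llama-cpp-python as one possible backend (the poster runs ROCm/Vulkan builds, so this is illustrative only). The model path, context size, and the type_k/type_v/GGML_TYPE_F16 parameter names are assumptions drawn from llama-cpp-python's Llama() constructor, not from the post.

```python
# Sketch: load a GGUF quant and pin the KV cache to FP16 (i.e. do not quantize it).
# Assumes llama-cpp-python is installed; paths and sizes below are placeholders.
from llama_cpp import Llama
import llama_cpp

llm = Llama(
    model_path="gemma-27b-it-Q8_0.gguf",  # placeholder path to the quantized model
    n_gpu_layers=-1,                      # offload all layers to the GPU
    n_ctx=16384,                          # placeholder context length
    type_k=llama_cpp.GGML_TYPE_F16,       # keep the K cache in FP16
    type_v=llama_cpp.GGML_TYPE_F16,       # keep the V cache in FP16
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Sketch a plan for a small CLI app."}],
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```

Leaving type_k/type_v at FP16 trades VRAM for cache fidelity; the equivalent llama.cpp server flags are the cache-type options, if you run the standalone binaries instead.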

Similar Articles

Layman's comparison on Qwen3.6 35b-a3b and Gemma4 26b-a4b-it

Reddit r/LocalLLaMA

A user compares Qwen3.6 35B-A3B and Gemma 4 26B-A4B-IT running locally on a 16GB VRAM GPU via LM Studio, finding Qwen3.6 produces more detailed outputs while both run at comparable speeds. The post is an informal community comparison using quantized models.

Trials and tribulations fine-tuning & deploying Gemma-4 [P]

Reddit r/MachineLearning

An ML team documents practical challenges encountered while fine-tuning and deploying Gemma-4, including incompatibilities with PEFT, SFTTrainer, DeepSpeed ZeRO-3, and lack of runtime LoRA serving support, along with workarounds for each issue.