Gemma 4 12B is my new main squeeze

Reddit r/LocalLLaMA 06/05/26, 06:57 AM Models

gemma-4 12b local-coding llama.cpp quantization model-comparison

Summary

The author shares their experience switching from Qwen 3.6 to Gemma 4 12B (Unsloth Q5_K_XL) for local coding, praising its plug-and-play setup, better syntax accuracy, and manageable VRAM usage despite a slight speed trade-off.

The Unsloth Q5_K_XL is officially my main squeeze for local coding. I started out with the Q4_K_XL, but found myself fixing syntax errors a little too often. It wasn't terrible, but I had one file where I had to make 23 edits just for syntax.

With the Q4 I was pulling around 61 t/s, and moving to the Q5 dropped me down to 50 t/s, but now most things get one-shotted (not zero-shot, I still had to tell this baby what to build \*wink\*, looking at you grammar/tech Nazis).

The model file sits right around 8.6 GB. I ended up capping the context window at 32k with a Q8 KV cache in llama.cpp to keep things snappy. When all is said and done, it comes to about 15.7 GB of VRAM, with a gig spilling over on the cached checkpoints. Honestly, 32k is plenty for my workflow. It's more than enough room to focus on the exact tasks I need to get done.

Before anyone asks if this is better than Qwen 3.6 27B (which I could never run anyway) or the 35B A3B... for me, the answer is yes, for a couple of reasons:

* **Tool call headaches:** I had to configure Qwen's tool calls from XML to JSON. It just made things inconsistent and required way too much messing around with the chat template, llama.cpp settings, and memory management.
* **Gemma 4 is plug-and-play:** I just set the cache, locked in the context length, attached it to my PI harness, and I was already rolling.

I am able to write code, short stories, and HTML games. I still need to test it with Godot, but it works great for Lua since I do Cyberpunk 2077 mods as a hobby.

I am sorry, Qwen, that we had to break up. Please understand it's not you, it's me. XOXO
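For anyone who wants to try the same setup, here is a rough sketch of the settings described in the post, plus the arithmetic behind the quoted numbers. It is not the author's actual command. The model filename is hypothetical (the post does not give one), "32k" is assumed to mean 32768 tokens, and "Q8" is assumed to mean llama.cpp's `q8_0` cache type. Depending on the llama.cpp build, a quantized V cache may also need flash attention enabled.

```python
# Sketch only: approximates the setup described in the post.
# MODEL_PATH is a hypothetical filename; the post does not name the file.
MODEL_PATH = "gemma-4-12b-it-UD-Q5_K_XL.gguf"


def llama_server_args(model_path: str,
                      ctx_tokens: int = 32 * 1024,   # assumption: "32k" = 32768
                      kv_type: str = "q8_0") -> list[str]:
    """Build an argv list for llama-server with a capped context
    window and a quantized KV cache."""
    return [
        "llama-server",
        "-m", model_path,            # GGUF model file
        "-c", str(ctx_tokens),       # context size in tokens
        "--cache-type-k", kv_type,   # K cache data type
        "--cache-type-v", kv_type,   # V cache data type
    ]


def non_weight_vram_gb(total_gb: float, model_gb: float) -> float:
    """VRAM left for KV cache, compute buffers, and overhead."""
    return round(total_gb - model_gb, 1)


def slowdown_pct(before_tps: float, after_tps: float) -> float:
    """Relative drop in tokens per second, as a percentage."""
    return round(100 * (before_tps - after_tps) / before_tps, 1)


if __name__ == "__main__":
    print(" ".join(llama_server_args(MODEL_PATH)))
    print(non_weight_vram_gb(15.7, 8.6))  # post's total VRAM minus model file size
    print(slowdown_pct(61, 50))           # Q4 -> Q5 throughput drop
```

With the figures from the post, that works out to roughly 7.1 GB of VRAM used beyond the model file itself, and about an 18% drop in generation speed going from Q4 to Q5.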


Similar Articles

Gemma 4 beats Qwen 3.5 (UPDATE), and Qwen 3.6 27B + MiniMax M2.7 is the best OpenCode setup

Those of you who like Gemma4 models - how are you guys using them?

Layman's comparison on Qwen3.6 35b-a3b and Gemma4 26b-a4b-it

gemma-4-12b-it vs Qwen3.5-9B on shared benchmarks: Qwen is overall winner beating gemma in 5/8 benchmarks despite a smaller footprint

Qwen 3.6 27B kick balls
