@zhixianio
Summary
The user tested the community fine-tuned gemma-4-12B-coder against Qwen3.6-35B-A3B MoE on three programming tasks, finding that gemma performed poorly on complex stateful programs, while Qwen 35B remained robust.
Cached at: 06/15/26, 03:07 PM
Finished testing, and I’m honestly quite surprised by the results. I’m not sure whether I’m using it wrong, so feel free to offer counterexamples. Here’s what I found:
On the M5 Max, I pitted this community fine-tuned gemma-4-12B-coder (llama.cpp) against my daily driver Qwen3.6-35B-A3B MoE (oMLX). Three tasks:
- Matplotlib data chart: a tie. Both ran correctly on the first try, with proper formatting.
- Three.js galaxy particle effect: Qwen produced a galaxy you can rotate and zoom; gemma gave a black screen caused by three stacked bugs (a missing importmap, a CDN version number off by one digit, and particles too small to see).
- A fully playable Tetris: Qwen’s version actually works (blocks drop, lines clear, the score reached 117, and the Next preview is present); gemma’s blocks don’t drop at all, and the score stays at 0.
- Video: Left = Tetris comparison, Right = Galaxy effect comparison
Later, I also downloaded the original gemma-4-12B-it (also 4-bit quantized) and ran the same Tetris task. It was still broken: empty board, score NaN, line count jumping randomly. This shows the bottleneck is the 12B parameter count, which can’t handle complex programs that require “long context, stateful, single-shot generation”; it has nothing to do with the fine-tuning.
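To make “stateful” concrete, here is a minimal, hypothetical sketch (not output from either model) of the per-tick update a Tetris game has to keep consistent: gravity moves the falling piece, a blocked piece locks into the board, full rows are cleared, and the score and line count are updated. The names `Game`, `tick`, `fits` and the `LINE_SCORES` table are illustrative assumptions; rotation, input, piece spawning, and rendering are omitted. The two gemma symptoms map onto specific lines here: blocks that never drop mean the gravity step is missing or never called, and a score stuck at 0 (or NaN) means line clears are not wired into a properly initialized score.

```python
from dataclasses import dataclass, field

WIDTH, HEIGHT = 10, 20  # classic 10x20 Tetris well

# Illustrative scoring table (assumption): points per number of rows cleared at once.
LINE_SCORES = {0: 0, 1: 100, 2: 300, 3: 500, 4: 800}


@dataclass
class Game:
    # board[row][col] is True when the cell is occupied; row 0 is the top.
    board: list = field(
        default_factory=lambda: [[False] * WIDTH for _ in range(HEIGHT)]
    )
    # Cells of the currently falling piece as (row, col) pairs.
    piece: list = field(default_factory=list)
    score: int = 0  # must start as a number, or every later update is corrupted
    lines: int = 0

    def fits(self, cells):
        """True if every cell is inside the well and not already occupied."""
        return all(
            0 <= r < HEIGHT and 0 <= c < WIDTH and not self.board[r][c]
            for r, c in cells
        )

    def tick(self):
        """One gravity step: drop the piece, or lock it and clear full rows."""
        moved = [(r + 1, c) for r, c in self.piece]
        if self.fits(moved):
            self.piece = moved  # gravity: the piece actually falls
            return
        for r, c in self.piece:  # blocked: lock the piece into the board
            self.board[r][c] = True
        self.piece = []
        kept = [row for row in self.board if not all(row)]
        cleared = HEIGHT - len(kept)
        # Refill from the top so the board keeps HEIGHT rows.
        self.board = [[False] * WIDTH for _ in range(cleared)] + kept
        self.lines += cleared
        self.score += LINE_SCORES[cleared]
```

For example, filling nine cells of the bottom row and dropping a single-cell piece into the remaining gap (`while g.piece: g.tick()`) clears one line, leaving `g.lines == 1` and `g.score == 100`. Every one of these fields has to stay in sync across hundreds of ticks, which is the kind of bookkeeping a single-shot generation can easily get wrong.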
I also noticed something interesting: with thinking mode enabled, the original version spent 12,000 tokens just “thinking” without outputting a single line of code, whereas the coder fine-tune had learned to “think a little, then act”. Fine-tuning improves convergence and efficiency, but it doesn’t raise the ceiling of a 12B model.
My Qwen 35B still firmly holds the sweet-spot throne.
Hugging Models (@HuggingModels):
Gemma 4 12B Coder is here and it’s a game changer for local code generation. This GGUF model packs Google’s latest gemma-4 architecture into a compact 12B size, perfect for running on consumer hardware. It’s optimized for reasoning and thinking, making it ideal for developers who
Similar Articles
Gemma 4 beats Qwen 3.5 (UPDATE), and Qwen 3.6 27B + MiniMax M2.7 is the best OpenCode setup
A personal benchmark shows Gemma-4E4B on top for routing, Qwen-3.6 27/30B beating Gemma-4 for coding, and MiniMax M2.7 MXFP4 replacing giant Qwen-3.5 quants in an OpenCode llama-swap workflow.
Qwen3.6:27b single-shot fixed a CSS UI bug that had Gemma4:26B doom looping uselessly for 15 minutes
A user shares a detailed comparison of local coding performance, noting that Qwen3.6-27B fixed a CSS bug in a single shot while Gemma4-26B entered a recursive error loop. The post highlights trade-offs between dense and MoE models on Apple Silicon hardware.
gemma-4-12b-it vs Qwen3.5-9B on shared benchmarks: Qwen is overall winner beating gemma in 5/8 benchmarks despite a smaller footprint
Qwen3.5-9B outperforms gemma-4-12b-it on 5 of 8 benchmarks despite having a smaller footprint, with gemma only slightly better at coding.
Layman's comparison on Qwen3.6 35b-a3b and Gemma4 26b-a4b-it
A user compares Qwen3.6 35B-A3B and Gemma 4 26B-A4B-IT running locally on a 16GB VRAM GPU via LM Studio, finding Qwen3.6 produces more detailed outputs while both run at comparable speeds. The post is an informal community comparison using quantized models.
I benchmarked 21 local LLMs on a MacBook Air M5 for code quality AND speed
A developer benchmarked 21 local LLMs on MacBook Air M5 using HumanEval+ and found Qwen 3.6 35B-A3B (MoE) leads at 89.6% with 16.9 tok/s, while Qwen 2.5 Coder 7B offers the best RAM-to-performance ratio at 84.2% in 4.5 GB. Notably, Gemma 4 models significantly underperformed expectations (31.1% for 31B), possibly due to Q4_K_M quantization effects.