Personal benchmark shows Gemma-4E4B tops for routing, Qwen-3.6 27/30B beats Gemma-4 for coding, and MiniMax M2.7 MXFP4 replaces giant Qwen-3.5 quants in an OpenCode llama-swap workflow.
Hi all! I recently made a post about how Gemma 4 managed to replace Qwen 3.5 for me, for semantic routing and a lot of coding stuff and ultimately it was my new daily driver. The next day, Qwen 3.6 released and I've been using it a lot this week. Here's my ultimate comparison: Gemma 4 E4B > Qwen3.5 4B for routing and other classification tasks, I think it might be better at English understanding but might not have super technical smarts like coding Qwen 3.6 30B & 27B > Gemma 4 26B and 31B (both)> Qwen 3.5 30B & 27B Specifically, my light/fast model went through the following changes Qwen 3.5 30B --> Gemma 4 26B -> Qwen 3.6 30B Gemma 4 26B also temporarily replaced my use for Qwen 3.5 27B (dense), until 3.6 came out (now I use them interchangeably) The only Gemma model I use now is E4B for semantic routing. NOW, here's a new breakthrough: I recently downloaded weights to MiniMax M2.7 MXFP4 and used it to replace Qwen 3.5 122B Q8 and Qwen3.5 397B Q2. It's the perfect middle ground and I haven't had any issues. I'm trying to break away from my Claude Code Pro subscription, I normally use Sonnet 4.7 for all of my projects (never bother with Opus as it burns up my usage) and I rarely touch Haiku unless it's a stupid easy task. This morning I installed OpenCode and set up my llama-swap server to swap between Qwen 3.6 30B, and Minimax M2.7 (with the GGML unified memory trick) and it's been AMAZING and I'm going to continue testing further. You do need to handhold it a bit, but it's been giving great results. I haven't set up any agents yet, I've just been manually switching between the models but I've found that Qwen 3.6 30B is great for the planning mode, and have MiniMax M2.7 lay all the groundwork. Then back to Qwen 3.6 30B for edits. I'm using the Q\_8 unsloth quant of Qwen 3.6 30B and I have yet to have it give me any tool/command issues whatsoever through open code. MiniMax M2.7 tried to manually tell me what to do until I gently reminded it that it had the power to do it itself. Whatever tuning happened between 3.5 and 3.6 seemed to really make it do better with tool calling and knowing when to use tools. It's a very good day to code with open source models! 2-3 years ago I remember struggling to replace ChatGPT with CodeLlama 34B, the amount of progress we've made is amazing. Any questions lmk! 2x RTX 3090 + 1 P40 and 128GB of DDR4
A user compares Qwen3.6 35B-A3B and Gemma 4 26B-A4B-IT running locally on a 16GB VRAM GPU via LM Studio, finding Qwen3.6 produces more detailed outputs while both run at comparable speeds. The post is an informal community comparison using quantized models.
A user shares anecdotal findings that Gemma 4 31B outperforms Qwen 3.6 models and matches Opus 4.7 in understanding and refactoring messy academic code, highlighting a benchmark (SciCode) where Gemma excels.
The author shares their experience switching from Qwen 3.6 to Gemma 4 12B (Unsloth Q5_K_XL) for local coding, praising its plug-and-play setup, better syntax accuracy, and manageable VRAM usage despite a slight speed trade-off.
User asks for advice on choosing between quantized Qwen 3.6 35B-A3B at Q4 and Gemma 4 12B at Q8 for local codebase work on a 32GB unified memory setup.