Gemma 4 26B-A4B GGUF Benchmarks
Summary
Unsloth has released KL Divergence benchmarks for Gemma 4 26B-A4B GGUF quantizations, showing Unsloth GGUFs top 21 of 22 sizes on the Pareto frontier. They also introduced a new UD-IQ4_NL_XL quant fitting in 16GB VRAM and updated Q6_K and MLX quants for both Gemma 4 and Qwen3.6.
Similar Articles
unsloth/gemma-4-12B-it-qat-GGUF
Unsloth releases GGUF quantized versions of Google DeepMind's Gemma 4 models, optimized with Quantization-Aware Training (QAT) to reduce memory requirements while preserving quality, supporting multiple formats and sizes for diverse deployment.
unsloth/gemma-4-26B-A4B-it-GGUF
Unsloth releases GGUF-quantized versions of Google DeepMind's Gemma 4 26B A4B instruction-tuned model, enabling efficient local inference with support for tool-calling and fine-tuning via Unsloth Studio. Gemma 4 is a multimodal MoE model with a 256K context window, supporting text, image, video, and audio inputs.
Layman's comparison on Qwen3.6 35b-a3b and Gemma4 26b-a4b-it
A user compares Qwen3.6 35B-A3B and Gemma 4 26B-A4B-IT running locally on a 16GB VRAM GPU via LM Studio, finding Qwen3.6 produces more detailed outputs while both run at comparable speeds. The post is an informal community comparison using quantized models.
gemma-4-12b-it vs Qwen3.5-9B on shared benchmarks: Qwen is overall winner beating gemma in 5/8 benchmarks despite a smaller footprint
Qwen3.5-9B outperforms gemma-4-12b-it on 5 of 8 benchmarks despite having a smaller footprint, with gemma only slightly better at coding.
Personal Eval follow-up: Gemma4 26B MoE (Q8) vs Qwen3.5 27B Dense vs Gemma4 31B Dense Compared
Personal benchmark shows Qwen3.5-27B Dense and Gemma4-31B Dense fix 100 % of 37 test failures, outperforming Gemma4-26B MoE even at 8-bit quantization, while using fewer tokens and less wall-clock time.