Gemma 4 26B-A4B GGUF Benchmarks

Reddit r/LocalLLaMA Models

Summary

Unsloth has released KL Divergence benchmarks for Gemma 4 26B-A4B GGUF quantizations, showing that nearly all Unsloth GGUFs sit on the Pareto frontier and that Unsloth is the top performer in 21 of 22 sizes. They also introduced a new UD-IQ4_NL_XL quant that fits in 16GB VRAM, and updated their Q6_K and MLX quants for both Gemma 4 and Qwen3.6.

Hey r/LocalLLaMA, we ran KL Divergence (KLD) benchmarks for Gemma 4 26B-A4B GGUFs across providers to help you pick the best quant.

* Mean KL Divergence puts nearly all **Unsloth GGUFs on the Pareto frontier**.
* KLD measures how closely a quantized model's output distribution matches that of the original BF16 model; a lower KLD indicates more retained accuracy.
* This makes Unsloth the **top performer in 21 of 22 sizes.** The trend is similar for 99.9% KLD and other metrics.
* We also updated our Q6\_K quants to be more dynamic. The previous quants were already optimized; the new ones are slightly better and slightly bigger. There's no need to re-download, since the previous quant was perfectly fine, but it's up to you if you want the slightly better version. The same was done for Qwen3.6.
* We're also introducing a new UD-IQ4\_NL\_XL quant that fits in 16GB VRAM. UD-IQ4\_NL\_XL (14.6GB) sits between UD-IQ4\_XS (13.4GB) and UD-Q4\_K\_S (16.4GB). The same was done for Qwen3.6.

For HQ versions of the graphs (Reddit mobile compresses them), see: [Gemma 4 Benchmarks](https://unsloth.ai/docs/models/gemma-4#unsloth-gguf-benchmarks) and [Qwen3.6 Benchmarks](https://unsloth.ai/docs/models/qwen3.6#unsloth-gguf-benchmarks)

We also updated our MLX quants to be more dynamic with better layering selection (there are limitations due to MLX): [See here](https://unsloth.ai/docs/models/qwen3.6#mlx-dynamic-quants)

|MLX Metrics|**UD-4bit (Old)**|**UD-4bit (New)**|**MLX 4.4bit MSQ**|
|:-|:-|:-|:-|
|Perplexity|4.772|**4.766**|4.864|
|Mean KLD|0.0177|**0.0163**|0.0878|
|99.9% KLD|0.8901|**0.8398**|2.9597|
|Disk Size|21.4 GB|21.6 GB|21.2 GB|

Gemma 4 GGUFs: [https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF](https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF)

Qwen3.6 GGUFs: [https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF](https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF)
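
For readers who want to see what these metrics mean, here is a minimal sketch of how mean KLD, a 99.9% KLD, and a Pareto frontier over (size, KLD) could be computed. It is not Unsloth's benchmark pipeline. It assumes the reference distribution is the BF16 model's per-token softmax, uses a nearest-rank percentile, and every function name and all data passed to it are hypothetical.

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) in nats; P is the reference (assumed BF16), Q the quantized model."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def kld_summary(ref_logits, quant_logits, pct=99.9):
    """Return (mean KLD, pct-th percentile KLD) across token positions.

    ref_logits and quant_logits are lists of per-token logit vectors.
    The percentile uses the nearest-rank method (an assumption of this sketch).
    """
    klds = sorted(
        kl_divergence(softmax(r), softmax(q))
        for r, q in zip(ref_logits, quant_logits)
    )
    rank = max(1, math.ceil(pct / 100 * len(klds)))
    return sum(klds) / len(klds), klds[rank - 1]

def pareto_frontier(points):
    """Keep the quants that no other quant beats on both size and KLD.

    points is a list of (name, size_gb, mean_kld); smaller is better for both.
    """
    frontier = []
    for name, size, kld in points:
        dominated = any(
            s <= size and k <= kld and (s < size or k < kld)
            for _, s, k in points
        )
        if not dominated:
            frontier.append(name)
    return frontier
```

A quant is on the frontier when no other quant is at least as small and at least as close to BF16 while being strictly better on one of the two. That is why a slightly bigger file can still be worth it if its KLD drops.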
