Unsloth just dropped MTP GGUF weights for Gemma 4!

Reddit r/LocalLLaMA 06/05/26, 03:02 PM Models

unsloth gemma-4 gguf mtp weights open-source huggingface

Summary

Unsloth has released Multi-Token Prediction (MTP) GGUF weights for Gemma 4 models (31B, 26B-A4B, 12B) in Q8, F16, and BF16 precisions, available on Hugging Face.

It looks like Unsloth pushed MTP GGUF weights (Q8, F16, BF16) for the 31B, 26B-A4B, and 12B models:

[https://huggingface.co/unsloth/gemma-4-31B-it-GGUF/tree/main/MTP](https://huggingface.co/unsloth/gemma-4-31B-it-GGUF/tree/main/MTP)

[https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF/tree/main/MTP](https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF/tree/main/MTP)

[https://huggingface.co/unsloth/gemma-4-12b-it-GGUF/tree/main/MTP](https://huggingface.co/unsloth/gemma-4-12b-it-GGUF/tree/main/MTP)
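Since the links point at an `MTP` subfolder rather than a whole repo, here is a rough sketch of how one might fetch only that folder with `huggingface_hub`. The helper names `parse_tree_url` and `download_subfolder` are made up for illustration, the file names inside `MTP` are not listed in the post (so the sketch just matches everything under the folder), and it assumes a revision name without slashes such as `main`.

```python
from urllib.parse import urlparse


def parse_tree_url(url: str) -> tuple[str, str, str]:
    """Split a Hugging Face 'tree' URL into (repo_id, revision, subfolder).

    Hypothetical helper; assumes the revision contains no '/' (e.g. 'main').
    """
    parts = urlparse(url).path.strip("/").split("/")
    # Expected shape: <owner>/<repo>/tree/<revision>/<subfolder...>
    if len(parts) < 4 or parts[2] != "tree":
        raise ValueError(f"not a Hugging Face tree URL: {url}")
    repo_id = "/".join(parts[:2])
    revision = parts[3]
    subfolder = "/".join(parts[4:])
    return repo_id, revision, subfolder


def download_subfolder(url: str, local_dir: str) -> str:
    """Download only the files under the linked subfolder (needs network)."""
    # Imported lazily so parse_tree_url works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    repo_id, revision, subfolder = parse_tree_url(url)
    # Restrict the snapshot to the subfolder; None means "everything".
    pattern = f"{subfolder}/*" if subfolder else None
    return snapshot_download(
        repo_id=repo_id,
        revision=revision,
        allow_patterns=pattern,
        local_dir=local_dir,
    )


if __name__ == "__main__":
    url = "https://huggingface.co/unsloth/gemma-4-31B-it-GGUF/tree/main/MTP"
    print(parse_tree_url(url))
    # Uncomment to actually download (large files, requires network):
    # download_subfolder(url, "./gemma-4-31B-MTP")
```

Limiting the download with `allow_patterns` avoids pulling every quantization in the repo when only the MTP files are wanted.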


Similar Articles

Unsloth Gemma 4 QAT MTP assistant models now available

Reddit r/LocalLLaMA

Unsloth released Gemma 4 QAT MTP assistant models as GGUF files on Hugging Face, available in q8_0 and larger quantization formats.

unsloth/gemma-4-12B-it-qat-GGUF

Hugging Face Models Trending

Unsloth releases GGUF quantized versions of Google DeepMind's Gemma 4 models, optimized with Quantization-Aware Training (QAT) to reduce memory requirements while preserving quality, supporting multiple formats and sizes for diverse deployment.

unsloth/gemma-4-26B-A4B-it-GGUF

Hugging Face Models Trending

Unsloth releases GGUF-quantized versions of Google DeepMind's Gemma 4 26B A4B instruction-tuned model, enabling efficient local inference with support for tool-calling and fine-tuning via Unsloth Studio. Gemma 4 is a multimodal MoE model with a 256K context window, supporting text, image, video, and audio inputs.

unsloth/Qwen3.6-27B-MTP-GGUF

Hugging Face Models Trending

Unsloth has released GGUF weights for the Qwen3.6-27B model, featuring Multi-Token Prediction (MTP) for faster generation and enhanced agentic coding capabilities.

google/gemma-4-12B-it-qat-q4_0-gguf

Hugging Face Models Trending

Google DeepMind releases Gemma 4 models optimized with Quantization-Aware Training (QAT) in multiple formats including GGUF, enabling high quality with reduced memory requirements.
