Unsloth just dropped MTP GGUF weights for Gemma 4!
Summary
Unsloth has released Multi-Token Prediction (MTP) GGUF weights for Gemma 4 models (31B, 26B-A4B, 12B) in Q8, F16, and BF16 formats, available on Hugging Face.
Similar Articles
Unsloth Gemma 4 QAT MTP assistant models now available
Unsloth released Gemma 4 QAT MTP assistant models as GGUF files on Hugging Face, available in q8_0 and higher-precision formats.
unsloth/gemma-4-12B-it-qat-GGUF
Unsloth releases GGUF-quantized versions of Google DeepMind's Gemma 4 models, optimized with Quantization-Aware Training (QAT) to reduce memory requirements while preserving quality, in multiple quantization formats and file sizes.
unsloth/gemma-4-26B-A4B-it-GGUF
Unsloth releases GGUF-quantized versions of Google DeepMind's Gemma 4 26B A4B instruction-tuned model, enabling efficient local inference with support for tool-calling and fine-tuning via Unsloth Studio. Gemma 4 is a multimodal MoE model with a 256K context window, supporting text, image, video, and audio inputs.
unsloth/Qwen3.6-27B-MTP-GGUF
Unsloth has released GGUF weights for the Qwen3.6-27B model, featuring Multi-Token Prediction (MTP) for faster generation and enhanced agentic coding capabilities.
google/gemma-4-12B-it-qat-q4_0-gguf
Google DeepMind releases Gemma 4 models optimized with Quantization-Aware Training (QAT) in multiple formats including GGUF, enabling high quality with reduced memory requirements.