@TeksEdge: Unsloth released the fastest Qwen3.6-27B MTP GGUF I've tested. Time to upgrade. Compared to the previous GGUF, Q4/Q6 XL…
Summary
Unsloth has released an optimized GGUF version of the Qwen3.6-27B MTP model, achieving significantly faster inference speeds (up to 114 tok/s on an RTX 5090) compared to previous quantizations.
Similar Articles
Qwen3.6 27B Pure Quant: 40 tok/s on 16 GB VRAM
A quantized version of Qwen3.6 27B using a pure Q4_K_M method fits entirely in 16 GB VRAM, achieving up to 40 tok/s token generation speed with MTP, and significantly reducing model size compared to other GGUF variants.
Qwen 3.6 35B GGUF: NTP vs MTP quantization results across GPUs and CPUs
ByteShape releases Qwen 3.6 35B GGUF quantizations in NTP and MTP variants with detailed benchmarking across multiple GPUs and CPUs, finding that larger quants often outperform smaller ones and MTP provides GPU speed boosts at the cost of memory.
unsloth/Qwen3.6-27B-MTP-GGUF
Unsloth has released GGUF weights for the Qwen3.6-27B model, featuring Multi-Token Prediction (MTP) for faster generation and enhanced agentic coding capabilities.
@Italianclownz: Tested MTP, TriAttention, TurboQuant on @UnslothAI @Alibaba_Qwen Qwen 3.6 35B A3B MTP MXFP4_MoE on @huggingface @no_stp…
A user benchmarks MTP, TriAttention, and TurboQuant optimizations on Qwen 3.6 35B using Unsloth on consumer hardware, finding TurboQuant to be the most effective.
UPDATE: Qwen-27B-IQ4_KS and Qwen-27B-IQ_KS_KT for ik_llama.cpp, especially for NVIDIA with 16GB VRAM
New GGUF quantizations of Qwen3.6-27B optimized for 16GB VRAM NVIDIA GPUs, including an experimental Trellis variant, with perplexity benchmarks.