

Summary

UnslothAI founder Daniel Han released an experimental MTP GGUF of Qwen3.6 that reaches 140 tokens/s on the 27B model and 220 tokens/s on the 35B-A3B variant on a single consumer GPU, an over-1.4x speedup over the original GGUF with no accuracy loss.

@berryxia: Damn, even my eyes can't keep up with this speed! Daniel Han, founder of UnslothAI (YC S24, previously doing ML at NVIDIA), just released the experimental MTP GGUF of Qwen3.6. The 27B model runs at 140 tokens/s on a single GPU, and the 35B-A3B version is even more insane, hitting 220 tokens/s. That's over 1.4x faster than the original GGUF, with zero accuracy loss. They tested extensively and found that setting draft tokens to 2 is the sweet spot: any higher and the acceptance rate plummets, so actual speed drops. Looking at that benchmark curve, my biggest takeaway is that the performance ceiling for local large models has been pushed up significantly. I used to think 30B+ models were too slow to run locally, but now MTP speculative decoding is squeezing every drop of potential out of consumer GPUs. If you're playing with llama.cpp, running local agents, or doing daily coding, try this update immediately. Local AI is starting to feel less and less like a "compromise version."
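For intuition on why 2 is the sweet spot: MTP speculative decoding only pays off while the target model keeps accepting the drafted tokens, and each extra draft position is less likely to survive verification. Here's a minimal toy model in Python (every number in it is an illustrative assumption, not a measurement from this release) showing throughput peaking at a short draft length and falling off after that.

```python
# Toy model of speculative-decoding throughput vs. draft length.
# All numbers (acceptance probability, decay, draft cost) are illustrative
# assumptions for intuition, not measurements from the Unsloth release.

def expected_tokens(p0: float, decay: float, k: int) -> float:
    """Expected tokens committed per verification step with k drafted tokens.

    Draft position i is accepted with probability p0 * decay**i, and a
    drafted token only counts if every earlier one was accepted too; the
    target model's verification pass always commits one token of its own.
    """
    total, run = 1.0, 1.0
    for i in range(k):
        run *= p0 * decay**i   # probability the accepted chain reaches position i
        total += run
    return total

def relative_speed(p0: float, decay: float, k: int, draft_cost: float) -> float:
    """Throughput relative to plain decoding: expected tokens per step,
    divided by the cost of one target forward pass plus k cheap MTP drafts."""
    return expected_tokens(p0, decay, k) / (1.0 + k * draft_cost)

if __name__ == "__main__":
    for k in range(6):
        s = relative_speed(p0=0.7, decay=0.6, k=k, draft_cost=0.1)
        print(f"draft tokens = {k}: ~{s:.2f}x")
```

With these made-up parameters the curve peaks at draft = 2 (about 1.66x) and declines from there, which matches the shape of the tradeoff the post describes.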

Similar Articles

@AI_jacksaku: This week’s GitHub dark horse—Unsloth speeds up AI model training 2-5× while cutting VRAM use by 80%. What does that mean? Fine-tuning a large model used to require an A100 cluster and tens of thousands of dollars. Now one RTX 4090 can finish the job in a few hours. How? By optimizing attention compute, eliminating redundant memory copies, and adding QLoRA & Flash Attention support.


Unsloth open-source tool boosts large-model fine-tuning speed 2-5× and slashes VRAM by 80%, letting a single RTX 4090 finish in hours what once needed an A100 cluster.
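For context on what that workflow looks like, here's a minimal QLoRA fine-tuning sketch following Unsloth's published quickstart pattern. Treat it as a sketch: the checkpoint name, dataset, and some trl argument names are assumptions and vary across library versions.

```python
# Sketch of a QLoRA fine-tune with Unsloth, based on its quickstart pattern.
# The checkpoint, dataset, and some trl argument names are assumptions and
# may differ across library versions.
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

# Load a pre-quantized 4-bit base model: QLoRA keeps the frozen weights in
# 4-bit, which is where most of the VRAM savings come from.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # assumed example checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach small trainable LoRA adapters; the 4-bit base model stays frozen.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                       # LoRA rank: size of the adapter matrices
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("imdb", split="train[:1%]")  # placeholder text dataset

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=60,
        fp16=True,
        output_dir="outputs",
    ),
)
trainer.train()
```

The key design point is that only the low-rank adapters receive gradients, so optimizer state and activations shrink dramatically, which is how a single RTX 4090 can stand in for the A100 cluster the teaser mentions.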