@nicekate8888: For the past twenty days, I've been obsessing over one thing — how to make Qwen3.6-27B run fast and well on my Mac. I started with Unsloth Q5, got 18 tok/s, and the fan was roaring. Then I switched to MLX 6bit + DFlash, hitting 22 tok/s, still not fast enough. Eventually I found MTPLX 4bit: 43 tok/s with good quality.
Summary
The user shares their experience optimizing Qwen3.6-27B inference speed on a Mac across three quantization and runtime setups: Unsloth Q5 (18 tok/s), MLX 6bit + DFlash (22 tok/s), and MTPLX 4bit (43 tok/s, with good quality).
Similar Articles
@linexjlin: K2.6 built a Zig LLM inference engine from scratch on Mac in 12h, pushing Qwen 3.5 0.8B from 15 tok/s to 193.1 tok/s
K2.6 wrote a Zig-based LLM inference engine from scratch on macOS in 12 hours, boosting Qwen 3.5 0.8B throughput from 15 to 193 tokens per second.
@nash_su: Mac inference speed doubled. MTPLX is an integrated solution combining MLX and MTP, specifically optimized for model inference on Apple Silicon. By using models with a custom MTP head, it can deliver doubled inference speed. I tested it with Qwen3.6-27…
MTPLX is an integrated solution combining MLX and MTP, optimized for model inference speed on Apple Silicon. In the author's tests, Qwen3.6-27B runs twice as fast as in LM Studio, and MTPLX also integrates fan management.
Qwen-3.6-27B, llamacpp, speculative decoding - appreciation post
A Reddit user demonstrates llama.cpp speculative decoding boosting Qwen-3.6-27B generation speed from 13.6 to 136.75 t/s, and shares the exact commands and hardware setup.
@sanbuphy: K2.6 successfully downloaded and deployed the Qwen3.5-0.8B model locally on a Mac, using the niche Zig language to implement and optimize inference, demonstrating the new model’s generalization ability. After 4,000+ tool calls and 12+ hours of continuous operation, K2.6 iterated 14 times…
K2.6 successfully downloaded and deployed the Qwen3.5-0.8B model locally on a Mac, using the niche Zig language to implement and optimize inference, demonstrating the new model’s generalization ability. After 4,000+ tool calls and 12+ hours of continuous operation, K2.6 iterated 14 times, boosting throughput from ~15 tokens/s to ~193 tokens/s, ultimately achieving 20% faster inference than LM Studio.
@Youssofal_: Thank you to Kate for the comprehensive review of MTPLX. She’s tested multiple different MLX runtimes and concluded MTP…
nicekate tested multiple MLX runtimes on a Mac for running Qwen3.6-27B and concluded that MTPLX is the fastest, achieving 43 tok/s at 4bit quantization.