@populartourist: Unsloth Qwen3.6 27B Q6_K doing over 100 t/s with MTP on RTX 5090. That's coming up from 45-50 t/s without MTP. That's i…

X AI KOLs Timeline 05/16/26, 05:53 PM Models

unsloth qwen performance inference rtx-5090 mtp speed

Summary

Unsloth Qwen3.6 27B Q6_K reaches over 100 tokens per second (t/s) with MTP (Multi-Token Prediction) on an RTX 5090, up from 45-50 t/s without MTP, which is at least a 2x speedup.
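To put the reported figures in perspective, here is a minimal sketch that turns them into a speedup range. The function name `speedup_range` is made up for illustration, and because the post says "over 100 t/s", both results should be read as lower bounds.

```python
def speedup_range(baseline_low: float, baseline_high: float, accelerated: float):
    """Return (smallest, largest) speedup given a baseline throughput range.

    The smallest speedup comes from comparing against the fastest baseline,
    the largest from comparing against the slowest baseline.
    """
    return accelerated / baseline_high, accelerated / baseline_low


# Figures from the post: 45-50 t/s without MTP, 100 t/s (a floor) with MTP.
low, high = speedup_range(45.0, 50.0, 100.0)
print(f"MTP speedup: at least {low:.2f}x to {high:.2f}x")
```

With the posted numbers this prints a range of 2.00x to 2.22x.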

Unsloth Qwen3.6 27B Q6_K doing over 100 t/s with MTP on RTX 5090. That's coming up from 45-50 t/s without MTP. That's insane. --spec-draft-n-max 3 --spec-draft-p-min 0.75 https://t.co/D0tQkBU7r9

Original Article

Cached at: 05/17/26, 11:32 AM

Unsloth Qwen3.6 27B Q6_K doing over 100 t/s with MTP on RTX 5090.

That’s coming up from 45-50 t/s without MTP. That’s insane.

--spec-draft-n-max 3 --spec-draft-p-min 0.75 https://t.co/D0tQkBU7r9
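The flag names suggest that up to 3 tokens are drafted per step. As a rough, idealized sketch of why drafting a few tokens can roughly double throughput, the snippet below computes the expected number of tokens produced per verification pass. It assumes each drafted token is accepted independently with a fixed rate, which is a simplification. The acceptance rates used are hypothetical, not measurements from the post, and they are not the same thing as the `--spec-draft-p-min` threshold. Drafting and verification overhead are ignored.

```python
def expected_tokens_per_step(accept_rate: float, n_draft: int) -> float:
    """Expected tokens emitted per target-model pass.

    Assumes up to n_draft tokens are drafted and each is accepted
    independently with probability accept_rate; drafting stops at the
    first rejection. The pass always yields one token of its own, so
    the result is 1 + a + a^2 + ... + a^n_draft.
    """
    if not 0.0 <= accept_rate <= 1.0:
        raise ValueError("accept_rate must be in [0, 1]")
    if n_draft < 0:
        raise ValueError("n_draft must be non-negative")
    if accept_rate == 1.0:
        return float(n_draft + 1)
    return (1.0 - accept_rate ** (n_draft + 1)) / (1.0 - accept_rate)


# Hypothetical acceptance rates, with n_draft = 3 as in the posted flags.
for a in (0.5, 0.75, 0.9):
    print(f"accept_rate={a}: {expected_tokens_per_step(a, 3):.2f} tokens/step")
```

Under these assumptions, an acceptance rate around 0.75 gives about 2.7 tokens per pass before overhead, which is in the same ballpark as the reported 2x or better gain.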

Similar Articles

Got MTP + TurboQuant running — Qwen3.6-27B -- 80+ t/s at 262K context on a single RTX 4090

Reddit r/LocalLLaMA

Developer achieved 80+ t/s inference on Qwen3.6-27B with 262K context on a single RTX 4090 by combining MTP (Multi-Token Prediction) with TurboQuant's lossless KV cache compression, sharing their implementation fork and technical details.

@Snixtp: https://x.com/Snixtp/status/2055734339346768225

X AI KOLs Timeline

A user benchmarks the MTP variant of Qwen3.6 27B against the normal version on a single RTX 3090 using llama.cpp, finding MTP offers up to 2.37x faster generation at long contexts (32k-64k) but with slower prefill and no concurrency support yet.

125 tok/s for Qwen3.6 q4xl on 2x 4060ti is insane perf/dollar

Reddit r/LocalLLaMA

A user reports achieving 125 tokens per second running Qwen3.6 q4xl on two RTX 4060 Ti GPUs, highlighting excellent performance per dollar and wondering if further optimization can reach 150 tok/s.

80 tok/sec and 128K context on 12GB VRAM with Qwen3.6 35B A3B and llama.cpp MTP

Reddit r/LocalLLaMA

A user shares a configuration for achieving over 80 tokens per second with Qwen3.6 35B A3B on a 12GB VRAM GPU using llama.cpp and Multi-Token Prediction (MTP). The post includes benchmark results and specific command-line parameters to optimize performance.

MI50s Qwen 3.6 27B @52.8 tps TG @1569 tps PP (no MTP, no Quant)

Reddit r/LocalLLaMA

Benchmark results for running Qwen 3.6 27B on AMD MI50 GPUs using a custom vllm fork, achieving 52.8 tokens/s TG and 1569 tokens/s PP without quantization or MTP, demonstrating usability for agentic tasks on 2018 hardware.
