@populartourist: Unsloth Qwen3.6 27B Q6_K doing over 100 t/s with MTP on RTX 5090. That's coming up from 45-50 t/s without MTP. That's i…


Summary

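The Unsloth Q6_K quantization of Qwen3.6 27B reportedly generates over 100 tokens per second (t/s) with multi-token prediction (MTP) on an RTX 5090, up from 45-50 t/s without MTP, which is at least a 2x speedup. The reported settings were --spec-draft-n-max 3 and --spec-draft-p-min 0.75.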

Unsloth Qwen3.6 27B Q6_K doing over 100 t/s with MTP on RTX 5090. That's coming up from 45-50 t/s without MTP. That's insane. --spec-draft-n-max 3 --spec-draft-p-min 0.75 https://t.co/D0tQkBU7r9

Cached at: 05/17/26, 11:32 AM

Unsloth Qwen3.6 27B Q6_K doing over 100 t/s with MTP on RTX 5090.

That’s coming up from 45-50 t/s without MTP. That’s insane.

--spec-draft-n-max 3 --spec-draft-p-min 0.75 https://t.co/D0tQkBU7r9
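To put the reported numbers in context, here is a minimal Python sketch. It does two things: it computes the speedup range implied by the quoted throughputs, and it estimates how many tokens one verification step could yield under a simple textbook model of speculative decoding. Everything beyond the quoted figures is an assumption. The function names are hypothetical, the model treats each drafted token as accepted independently with a fixed probability, and the draft cap of 3 is taken from the name of `--spec-draft-n-max`. The acceptance rates in the loop are illustrative values only; they are not measurements, and 0.75 here is not meant as the definition of `--spec-draft-p-min`.

```python
def speedup_range(base_low_tps, base_high_tps, mtp_tps):
    """Return (min, max) speedup of mtp_tps over a baseline throughput range."""
    # The smallest speedup comes from the fastest baseline, and vice versa.
    return mtp_tps / base_high_tps, mtp_tps / base_low_tps


def expected_tokens_per_step(accept_rate, n_max):
    """Expected tokens emitted per verification step.

    Assumes (hypothetically) that each of up to n_max drafted tokens is
    accepted independently with probability accept_rate, and that the
    verifying pass always contributes one token of its own. This gives
    the geometric sum 1 + a + a^2 + ... + a^n_max.
    """
    return sum(accept_rate ** k for k in range(n_max + 1))


if __name__ == "__main__":
    # Figures quoted in the post: 45-50 t/s without MTP, 100+ t/s with MTP.
    low, high = speedup_range(45.0, 50.0, 100.0)
    print(f"implied speedup: {low:.2f}x to {high:.2f}x (lower bound, since MTP is 'over 100')")

    # Illustrative acceptance rates with a draft cap of 3 tokens.
    for a in (0.5, 0.75, 0.9):
        print(f"accept_rate={a}: {expected_tokens_per_step(a, 3):.3f} tokens per step")
```

Under this toy model, a draft cap of 3 can only yield more than 2 tokens per step when acceptance is fairly high, which is at least consistent with a roughly 2x end-to-end gain once drafting and verification overhead are counted. The real speedup depends on the hardware, context length, and the actual acceptance behaviour, none of which the post reports.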

Similar Articles

@Snixtp: https://x.com/Snixtp/status/2055734339346768225


A user benchmarks the MTP variant of Qwen3.6 27B against the normal version on a single RTX 3090 using llama.cpp, finding MTP offers up to 2.37x faster generation at long contexts (32k-64k) but with slower prefill and no concurrency support yet.