@rumgewieselt: Now its getting crazy ... 3x 1080 Ti (Pascal, 33GB VRAM) Qwen 3.6 27B MTP with 196K TurboQuant ~28-30 t/s consistently
Summary
A user demonstrates local inference of the Qwen 3.6 27B model across three GTX 1080 Ti GPUs (Pascal, 33GB VRAM in total), sustaining approximately 28-30 tokens per second using MTP with 196K TurboQuant.
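The memory arithmetic behind this setup can be sketched as follows. This is a rough estimate only: the 11 GB per card is the stock GTX 1080 Ti capacity, while the bit-widths are illustrative assumptions, since the post does not state which weight quantization was used. The helper names are hypothetical, and the KV cache and runtime overhead are ignored.

```python
# Rough sizing sketch. Bit-widths are illustrative assumptions, not the
# quantization used in the post; KV cache and runtime overhead are ignored.

def total_vram_gb(num_gpus: int, gb_per_gpu: float) -> float:
    """Aggregate VRAM across identical GPUs, in GB."""
    return num_gpus * gb_per_gpu


def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


if __name__ == "__main__":
    vram = total_vram_gb(3, 11)  # three 11 GB GTX 1080 Ti cards
    print(f"Total VRAM: {vram} GB")
    for bits in (16, 8, 4):
        size = weight_footprint_gb(27, bits)
        verdict = "fits" if size < vram else "does not fit"
        print(f"{bits}-bit weights: {size} GB ({verdict} before KV cache)")
```

Under these assumptions, unquantized 16-bit weights for a 27B model would exceed the combined 33 GB, which is why some form of quantization is needed on this hardware.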
Cached at: 05/08/26, 01:32 PM
Now its getting crazy …
3x 1080 Ti (Pascal, 33GB VRAM)
Qwen 3.6 27B MTP with 196K TurboQuant ~28-30 t/s consistently https://t.co/asoDLIrAad
Similar Articles
Got MTP + TurboQuant running — Qwen3.6-27B -- 80+ t/s at 262K context on a single RTX 4090
Developer achieved 80+ t/s inference on Qwen3.6-27B with 262K context on a single RTX 4090 by combining MTP (Multi-Token Prediction) with TurboQuant's lossless KV cache compression, sharing their implementation fork and technical details.
@BlackRainLabs: Using TurboQuant i was able to push 20 tk/s on qwen 3.6 35b MoE on a GTX1060 3GB. Insane for such a small and old card.…
Using TurboQuant, the user reached 20 tokens per second with a Qwen 3.6 35B MoE model on a GTX 1060 3GB, a notable result for such a small and old card.
Wow! Qwen 3.6:35b-a3b on a 3090... pretty amazing.
A user shares impressive results running a quantized Qwen 3.6:35b-a3b model on a used RTX 3090, achieving 160 tokens per second output after fitting the model into VRAM, and demonstrates vision capabilities with a 75-second video processing time.
@DeepTechTR: Qwen 3.6 27B is incredibly fast with 16 GB VRAM! The impact of Pure Quant The era of the 27B model that runs seamlessly…
Qwen 3.6 27B runs fast on 16 GB VRAM thanks to 'Pure Quant', reaching 40 tokens/s with MTP and supporting 64K context, which enables local AI on consumer GPUs such as the RTX 4060 Ti.
2 old RTX 2080 Ti with 22GB vram each Qwen3.6 27B at 38 token/s with f16 kv cache
A user shares their setup using two modded RTX 2080 Ti GPUs with 22GB VRAM each to run Qwen 3.6 27B at 38 tokens/s with llama.cpp, including tips on power limiting, tensor split mode, and KV cache settings.