@ItsmeAjayKV: Update on 3090: Now with Qwen 3.6-35b-a3b moe (q6_k_xl). Crossed 90 t/s for the very first time, no MTP yet, prefill sp…

X AI KOLs Timeline 06/17/26, 06:40 PM News

qwen rtx-3090 llama.cpp benchmarking moe quantization

Summary

A user reports exceeding 90 tokens per second of generation with the Qwen 3.6-35b-a3b MoE model (q6_k_xl quantization) on an RTX 3090 using llama.cpp, without MTP, and prefill speeds above 1000 t/s. The result suggests that large language models are becoming practical to run locally on consumer hardware.

Original Article

Cached at: 06/18/26, 12:05 AM

Update on 3090: Now with Qwen 3.6-35b-a3b moe (q6_k_xl).

Crossed 90 t/s for the very first time, with no MTP yet. Prefill speed has also improved a lot, crossing 1000 t/s.

I'm going to download the even bigger Q8 model now. Previously it was not an option, but now it is, and since I'm OK with reduced speed (40-50 t/s is fine) for an even better quality quant, I'll take it.
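For a rough sense of what moving from Q6 to Q8 means in size, weight footprint can be approximated as parameters times bits per weight. The sketch below is illustrative only: the 35e9 parameter count is read off the "35b" in the model name, and it uses the plain llama.cpp Q6_K and Q8_0 block layouts as stand-ins. The "_XL" quant in the post mixes tensor types, so real file sizes will differ, and KV cache and runtime buffers are not counted.

```python
# Rough GGUF weight-size estimate: parameters * bits-per-weight / 8.
# Assumptions (not from the post): nominal 35e9 parameters, and plain
# Q6_K / Q8_0 rates standing in for the mixed "_XL" quant.

GIB = 1024 ** 3

# Bits per weight = block bytes * 8 / weights per block.
BITS_PER_WEIGHT = {
    "Q6_K": 210 * 8 / 256,  # 210-byte super-block covering 256 weights
    "Q8_0": 34 * 8 / 32,    # 34-byte block covering 32 weights
}


def weight_size_gib(n_params: float, quant: str) -> float:
    """Approximate size of the quantized weights in GiB."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / GIB


N_PARAMS = 35e9  # assumption: nominal total parameter count
VRAM_GIB = 24    # RTX 3090 memory

for quant in BITS_PER_WEIGHT:
    size = weight_size_gib(N_PARAMS, quant)
    fits = size < VRAM_GIB
    print(f"{quant}: ~{size:.1f} GiB of weights, fits in {VRAM_GIB} GiB VRAM: {fits}")
```

Under these assumptions neither quant fits entirely in 24 GiB, which would mean part of the model sits in system RAM. The post does not describe the offload setup, so treat this as an estimate and not a description of the author's configuration.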

AJ (@ItsmeAjayKV): 🔓Achievement Unlocked: Running Qwen3.6-27b dense🚀🚀

Thanks to the RTX 3090, now I can do this. Running @Alibaba_Qwen Qwen 3.6 27B (Q5_K_XL from @UnslothAI)

quick llama.cpp benchmark results (without MTP):

1,247 tok/s prompt processing (512 token prompt)

35 tok/s generation
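To put these two rates in practical terms, end-to-end time for a single request can be approximated as prompt tokens divided by prefill rate plus output tokens divided by generation rate. This is a minimal sketch: the 256-token output length is a hypothetical value chosen for illustration, and it assumes the benchmark rates hold for the whole request, which real runs with longer contexts may not match.

```python
def estimate_seconds(prompt_tokens: int, output_tokens: int,
                     prefill_tps: float, decode_tps: float) -> float:
    """Simple two-phase latency model: prefill time plus generation time."""
    return prompt_tokens / prefill_tps + output_tokens / decode_tps


# Rates reported in the posts; the MoE figures are "crossed" thresholds,
# so they are treated here as lower bounds.
SETUPS = {
    "Qwen 3.6 27B dense (Q5_K_XL)": (1247.0, 35.0),
    "Qwen 3.6-35b-a3b MoE (q6_k_xl)": (1000.0, 90.0),
}

PROMPT_TOKENS = 512   # matches the benchmark prompt length in the post
OUTPUT_TOKENS = 256   # hypothetical reply length, chosen for illustration

for name, (prefill_tps, decode_tps) in SETUPS.items():
    total = estimate_seconds(PROMPT_TOKENS, OUTPUT_TOKENS, prefill_tps, decode_tps)
    print(f"{name}: ~{total:.1f} s for {PROMPT_TOKENS} in / {OUTPUT_TOKENS} out")
```

In this model generation dominates total time for short prompts, which is why the jump from 35 to 90 t/s matters more for interactive use than the difference in prefill speed.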

Similar Articles

Wow! Qwen 3.6:35b-a3b on a 3090... pretty amazing.

Reddit r/artificial

A user shares impressive results running a quantized Qwen 3.6:35b-a3b model on a used RTX 3090, achieving 160 tokens per second output after fitting the model into VRAM, and demonstrates vision capabilities with a 75-second video processing time.

@ItsmeAjayKV: Achievement Unlocked: Running Qwen3.6-27b dense Thanks to the RTX 3090, now I can do this. Running @Alibaba_Qwen Qwen 3…

X AI KOLs Timeline

User benchmarks Qwen3.6-27B on an RTX 3090 using llama.cpp, achieving 35 tok/s generation and 1247 tok/s prompt processing.

Qwen 3.5 122B MoE OC on a single 3090 at 35 t/s — full local stack breakdown

Reddit r/openclaw

Detailed breakdown of running Qwen 3.5 122B MoE on a single RTX 3090 at 35 t/s using a custom llama.cpp fork (ik_llama.cpp) with fused MoE operations and expert offloading to CPU RAM, significantly outperforming stock llama.cpp MTP.

@seclink: Just hit 134 tok/s with Qwen 3.5-27B Dense and 73 tok/s with the new Qwen 3.6-27B on a single RTX 3090. The 2026 open-source scene is moving at lightspeed…

X AI KOLs Following

A single RTX 3090 pushes 134 tok/s on the fresh 27B Qwen 3.5 Dense and 73 tok/s on Qwen 3.6-27B via fused kernels plus speculative decoding, with GGUF drops the same evening.

Qwen3.6-35B-A3B Q4 262k context on 8GB 3070 Ti = +30tps

Reddit r/LocalLLaMA

The author shares detailed tuning tips for running the Qwen3.6-35B-A3B MoE model on an 8GB RTX 3070 Ti with up to 262k context using llama.cpp, achieving 30+ tps, and notes a 25% speed boost when switching from Windows to Ubuntu Server.