@ItsmeAjayKV: Achievement Unlocked: Running Qwen3.6-27b dense Thanks to the RTX 3090, now I can do this. Running @Alibaba_Qwen Qwen 3…

X AI KOLs Timeline 06/17/26, 04:23 PM News

qwen llama-cpp benchmark rtx3090 local-llm open-source

Summary

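A user benchmarks the dense Qwen3.6-27B model (Q5_K_XL quantization from Unsloth) on an RTX 3090 with llama.cpp, without MTP: 1,247 tok/s prompt processing on a 512-token prompt and 35 tok/s generation, falling to 897 tok/s and 34 tok/s at ~65K context.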

Cached at: 06/17/26, 06:01 PM

Achievement Unlocked: Running Qwen3.6-27b dense

Thanks to the RTX 3090, now I can do this. Running @Alibaba_Qwen Qwen 3.6 27B (Q5_K_XL from @UnslothAI)

quick llama.cpp benchmark results (without MTP):

- 1,247 tok/s prompt processing (512 token prompt)
- 35 tok/s generation

At ~65K context:

- 897 tok/s prompt processing
- 34 tok/s generation

Results are already looking good; Qwen 3.6 35B will be flying on this setup, brb.
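
To put the quoted figures in perspective, here is a minimal Python sketch of the arithmetic behind them. It only uses the four throughput numbers from the post. The helper names (`slowdown_pct`, `seconds_for`) and the 1,000-token reply length are illustrative assumptions, not part of the original benchmark, and the timing estimates assume a constant rate, which real runs will not hold exactly.

```python
# Throughput figures quoted in the post, in tokens per second.
SHORT = {"prefill": 1247.0, "generation": 35.0}  # 512-token prompt
LONG = {"prefill": 897.0, "generation": 34.0}    # ~65K context


def slowdown_pct(short_rate: float, long_rate: float) -> float:
    """Percentage drop in throughput from the short-context to the long-context run."""
    return (short_rate - long_rate) / short_rate * 100.0


def seconds_for(tokens: int, rate: float) -> float:
    """Seconds needed to handle `tokens` at a constant `rate` (tok/s)."""
    return tokens / rate


if __name__ == "__main__":
    for phase in ("prefill", "generation"):
        drop = slowdown_pct(SHORT[phase], LONG[phase])
        print(f"{phase}: {drop:.1f}% slower at ~65K context")

    # Time to ingest the 512-token benchmark prompt.
    print(f"512-token prompt: {seconds_for(512, SHORT['prefill']):.2f} s")

    # Hypothetical 1,000-token reply (length chosen only for illustration).
    print(f"1,000-token reply: {seconds_for(1000, SHORT['generation']):.1f} s")
```

On these numbers, prompt processing loses roughly 28% of its speed at ~65K context while generation loses about 3%, which matches the post's observation that generation speed barely moves.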

Similar Articles

@seclink: Just hit 134 tok/s with Qwen 3.5-27B Dense and 73 tok/s with the new Qwen 3.6-27B on a single RTX 3090. The 2026 open-source scene is moving at lightspeed…

X AI KOLs Following

A single RTX 3090 pushes 134 tok/s on the fresh 27B Qwen 3.5 Dense and 73 tok/s on Qwen 3.6-27B via fused kernels plus speculative decoding, with GGUF drops the same evening.

Wow! Qwen 3.6:35b-a3b on a 3090... pretty amazing.

Reddit r/artificial

A user shares impressive results running a quantized Qwen 3.6:35b-a3b model on a used RTX 3090, achieving 160 tokens per second output after fitting the model into VRAM, and demonstrates vision capabilities with a 75-second video processing time.

@cniongolo: I’m not sure people realize yet that you can actually run Qwen3.6-35B-A3B-Claude-4.7-Opus-abliterated-MTP-GGUF on a dua…

X AI KOLs Following

Demonstrates running a custom Qwen model (Qwen3.6-35B-A3B-Claude-4.7-Opus-abliterated-MTP-GGUF) on dual Nvidia RTX PRO 6000 Blackwell GPUs at 195 tokens per second using Hugging Face Inference.

@ItsmeAjayKV: Update on 3090: Now with Qwen 3.6-35b-a3b moe (q6_k_xl). Crossed 90 t/s for the very first time, no MTP yet, prefill sp…

X AI KOLs Timeline

A user reports crossing 90 tok/s generation with the Qwen 3.6-35b-a3b MoE model (q6_k_xl) on an RTX 3090 using llama.cpp, without MTP, with prefill speeds above 1,000 tok/s, suggesting that local deployment of large language models on consumer hardware is practical.

Qwen 3.5 122B MoE OC on a single 3090 at 35 t/s — full local stack breakdown

Reddit r/openclaw

Detailed breakdown of running Qwen 3.5 122B MoE on a single RTX 3090 at 35 t/s using a custom llama.cpp fork (ik_llama.cpp) with fused MoE operations and expert offloading to CPU RAM, significantly outperforming stock llama.cpp MTP.