qwen3.6-35b-a3b-mtp running on GTX 1060 6GB

Reddit r/LocalLLaMA News

Summary

A user successfully runs the Qwen3.6-35B-a3b-MTP model on a decade-old workstation with a GTX 1060 6GB using LMStudio under Windows, achieving acceptable chat speeds.

I have this old 10-year old Dell T5810 workstation with 32GB ddr3(?) memory and a E5-2698v3 (16 cores 32 threads), a GTX 1060 6GB that's used for mining back in the old days (paid itself back many times over). I managed to get the model running with LMStudio in Windows(!). My settings are: Model: unsloth qwen3.6-35B-a3b-MTP-GGUF UD Q4\_K\_XL Ctx length:131072 GPU offload 41 CPU threadpool size 16 Max concurrent 4 Number of experts 8 Number of MOE layers offloaded to CPU 41 MTP max draft 3 KV quantization both Q4\_0 prefill 16k about 130-150tps decode 4k about 16tps Very usable for chat.
Original Article

Similar Articles

@Snixtp: https://x.com/Snixtp/status/2055734339346768225

X AI KOLs Timeline

A user benchmarks the MTP variant of Qwen3.6 27B against the normal version on a single RTX 3090 using llama.cpp, finding MTP offers up to 2.37x faster generation at long contexts (32k-64k) but with slower prefill and no concurrency support yet.

Qwen3.6-35B-A3B Q4 262k context on 8GB 3070 Ti = +30tps

Reddit r/LocalLLaMA

The author shares detailed tuning tips for running the Qwen3.6-35B-A3B MoE model on an 8GB RTX 3070 Ti with up to 262k context using llama.cpp, achieving 30+ tps, and notes a 25% speed boost when switching from Windows to Ubuntu Server.