qwen3.6-35b-a3b-mtp running on GTX 1060 6GB

Reddit r/LocalLLaMA 05/24/26, 07:10 PM News

model-inference gguf lm-studio qwen hardware low-spec deployment

Summary

A user successfully runs the Qwen3.6-35B-a3b-MTP model on a decade-old workstation with a GTX 1060 6GB using LMStudio under Windows, achieving acceptable chat speeds.

I have this 10-year-old Dell T5810 workstation with 32GB of ddr3(?) memory, an E5-2698v3 (16 cores, 32 threads), and a GTX 1060 6GB that was used for mining back in the old days (it paid for itself many times over). I managed to get the model running with LMStudio in Windows(!).

My settings are:

- Model: unsloth qwen3.6-35B-a3b-MTP-GGUF UD Q4_K_XL
- Ctx length: 131072
- GPU offload: 41
- CPU threadpool size: 16
- Max concurrent: 4
- Number of experts: 8
- Number of MOE layers offloaded to CPU: 41
- MTP max draft: 3
- KV quantization: both Q4_0

Speeds:

- prefill 16k: about 130-150 tps
- decode 4k: about 16 tps

Very usable for chat.
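To put the reported speeds in wall-clock terms, here is a rough back-of-the-envelope sketch. It assumes "16k" and "4k" mean 16384 and 4096 tokens and that the rates stay constant across the whole run; neither is stated in the post, and the helper name `seconds_for` is made up for illustration.

```python
def seconds_for(tokens: int, tokens_per_second: float) -> float:
    """Time in seconds to process `tokens` at a steady rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return tokens / tokens_per_second


# Figures reported in the post. Reading "16k"/"4k" as powers of two is an assumption.
PREFILL_TOKENS = 16 * 1024
DECODE_TOKENS = 4 * 1024
PREFILL_TPS_RANGE = (130.0, 150.0)  # "about 130-150tps"
DECODE_TPS = 16.0                   # "about 16tps"

# Slowest and fastest prefill times implied by the reported range.
prefill_slowest = seconds_for(PREFILL_TOKENS, PREFILL_TPS_RANGE[0])
prefill_fastest = seconds_for(PREFILL_TOKENS, PREFILL_TPS_RANGE[1])
decode_time = seconds_for(DECODE_TOKENS, DECODE_TPS)

if __name__ == "__main__":
    print(f"prefill of {PREFILL_TOKENS} tokens: "
          f"{prefill_fastest:.0f}-{prefill_slowest:.0f} s")
    print(f"decode of {DECODE_TOKENS} tokens: {decode_time:.0f} s")
```

Under those assumptions, a full 16k prompt takes roughly two minutes to ingest and a 4k-token reply a little over four minutes, so short chat turns feel much faster than these worst-case numbers suggest.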

