High VRAM local coding model — still Qwen 3.6 27B?

Reddit r/LocalLLaMA 05/12/26, 10:34 PM News

Summary

The user discusses their experience with Qwen 3.6 27B for local coding tasks and asks for recommendations for larger models (100B+) suitable for systems with 224GB of VRAM.

I’ve been using Qwen 3.6 27B and it’s amazing. Not exactly your Opus replacement, but great for small tasks and checking work. But if you had 224GB of VRAM, would it still be your choice? Or is there something you consider better in the 100B+ range (GPT-OSS, DeepSeek, etc.) that’s just not talked about as much because fewer people can run it? I care more about intelligence than t/s.


Similar Articles

Qwen 35B-A3B is very usable with 12GB of VRAM

Reddit r/LocalLLaMA

A user benchmarks Qwen 35B-A3B (a 35B MoE model) on a 12GB RTX 3060, finding that 12GB VRAM is a practical sweet spot for running the model with 32k context, achieving ~47 t/s generation.

Running Qwen3.6 35b a3b on 8gb vram and 32gb ram ~190k context

Reddit r/LocalLLaMA

The author shares a high-performance local inference configuration for running Qwen3.6 35B A3B on limited hardware (8GB VRAM, 32GB RAM) using a modified llama.cpp with TurboQuant support, achieving ~37-51 tok/sec with ~190k context.

Qwen 3.6 27B is a BEAST

Reddit r/LocalLLaMA

A developer reports that the new 27B Qwen 3.6 model runs excellently on a 24GB VRAM laptop, passing all PySpark/Python data-transformation benchmarks and eliminating the need for cloud subscriptions.

@DeepTechTR: Qwen 3.6 27B is incredibly fast with 16 GB VRAM! The impact of Pure Quant The era of the 27B model that runs seamlessly…

X AI KOLs Timeline

Qwen 3.6 27B runs fast on 16 GB of VRAM thanks to 'Pure Quant' technology, achieving 40 tokens/s with MTP and supporting 64k context, enabling local AI on consumer GPUs like the RTX 4060 Ti.

Is anyone getting real coding work done with Qwen3.6-35B-A3B-UD-Q4_K_M on a 32GB Mac in opencode, claude code or similar?

Reddit r/LocalLLaMA

A user shares their experience running a quantized Qwen3.6-35B-A3B model on an M2 MacBook Pro with 32GB RAM for coding tasks via opencode and llama.cpp, finding that the 32K context window limit causes critical context to be lost during compaction, making complex coding tasks impractical. They conclude that meaningful agentic coding with this model likely requires at least 128K context, exceeding what their hardware can support.
