Cheapest hardware for Qwen 3.6: both 27B and 35B-A3B

Reddit r/LocalLLaMA Tools

Summary

Discusses the cheapest hardware options for running Qwen 3.6 models, comparing RTX 3090 and Tesla V100 GPUs, and provides a detailed cost breakdown for a system at around $2000.

- "Qwen 3.6/3.5 27B > Qwen 3.6/3.5 35B > Gemma4 31B > Qwen 3.5 9B > Gemma4 12B > Gemma4 26B", people say
- "Qwen 3.6 for coding & agentic work, Gemma4 for human-sounding text", people say

So I have been eyeing the RTX 3090 24 GB (or sometimes its cheaper Chinese counterpart, the RTX 3080 20 GB) and the controversial Tesla V100 32 GB.

Target: 40 tok/s on both of these Qwen 3.6 models.

The RTX 3090 24 GB seems to have a brighter future, given that (1) the V100 32 GB (both PCIe and SXM2) will soon lose support, and (2) China will soon release a Mythos/Fable equivalent, somewhere between late 2026 and mid 2027.

Alibaba quotes me $2000 for a single-RTX 3090 system that can be upgraded to dual RTX 3090 later.

Is there a cheaper way somewhere?

-------------------

| Component   | Model                          | Price         |
|-------------|--------------------------------|---------------|
| CPU         | Ryzen 5 5600X                  | $132.25       |
| GPU         | MSI RTX 3090 VENTUS 3X 24G     | $1,088.15     |
| Motherboard | ASUS TUF X570-PLUS             | $108.81       |
| RAM         | Kingston FURY Beast 32GB DDR4  | $251.11       |
| SSD         | Kingston NV3 1TB NVMe          | $131.41       |
| PSU         | Great Wall 1650W 80+ Gold      | $130.41       |
| Cooler      | Valkyrie AQ125 ARGB            | $14.90        |
| Case        | Phanteks PK620 Full Tower      | $120.54       |
| Fans        | ARGB 120mm ×12                 | $18.06        |
| **TOTAL**   |                                | **$1,995.65** |
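Whether the 40 tok/s target is realistic can be roughed out with a common rule of thumb: single-stream decoding is largely memory-bandwidth-bound, so tok/s is at most the GPU's bandwidth divided by the bytes of weights read per generated token. The sketch below is a back-of-envelope estimate under assumptions that do not come from the post: about 4.5 bits per weight (a typical 4-bit quant), nominal parameter counts read off the model names (27B dense; 35B total with about 3B active per token), and NVIDIA's published peak bandwidths (936 GB/s for the RTX 3090, 900 GB/s for the V100 32 GB). It ignores KV-cache reads and kernel overhead, so its numbers are ceilings, not predictions.

```python
from dataclasses import dataclass

# All figures here are assumptions for illustration, not measurements.


@dataclass(frozen=True)
class Gpu:
    name: str
    vram_gb: float         # usable VRAM in GB
    bandwidth_gb_s: float  # assumed peak memory bandwidth in GB/s


@dataclass(frozen=True)
class Model:
    name: str
    total_params_b: float   # billions of parameters that must sit in VRAM
    active_params_b: float  # billions of parameters read per generated token


GPUS = [
    Gpu("RTX 3090 24GB", 24.0, 936.0),
    Gpu("Tesla V100 32GB", 32.0, 900.0),
]

# Nominal sizes taken from the model names; the MoE reads ~3B params per token.
MODELS = [
    Model("Qwen 3.6 27B (dense)", 27.0, 27.0),
    Model("Qwen 3.6 35B-A3B (MoE)", 35.0, 3.0),
]

BITS_PER_WEIGHT = 4.5  # assumption: typical ~4-bit quant including overhead
TARGET_TOK_S = 40.0    # the target stated in the post


def weight_gb(params_b: float, bpw: float = BITS_PER_WEIGHT) -> float:
    """Approximate quantized weight size in GB (1e9 bytes)."""
    return params_b * bpw / 8


def decode_ceiling(gpu: Gpu, model: Model) -> float:
    """Upper bound on tok/s if each token streams the active weights once."""
    return gpu.bandwidth_gb_s / weight_gb(model.active_params_b)


def report() -> list[str]:
    """One line per GPU/model pair: fit, VRAM headroom, and decode ceiling."""
    lines = []
    for gpu in GPUS:
        for model in MODELS:
            size = weight_gb(model.total_params_b)
            headroom = gpu.vram_gb - size  # left over for KV cache, buffers
            ceiling = decode_ceiling(gpu, model)
            # Fraction of peak bandwidth the target would actually require.
            needed = TARGET_TOK_S * weight_gb(model.active_params_b) / gpu.bandwidth_gb_s
            lines.append(
                f"{gpu.name} | {model.name}: weights {size:.1f} GB, "
                f"headroom {headroom:.1f} GB, ceiling {ceiling:.0f} tok/s, "
                f"{TARGET_TOK_S:.0f} tok/s needs {needed:.0%} of peak bandwidth"
            )
    return lines


if __name__ == "__main__":
    for line in report():
        print(line)
```

Under these assumptions, both cards can only reach 40 tok/s on the dense 27B if decoding sustains roughly two-thirds of peak bandwidth, while the 35B-A3B has far more margin as long as all experts stay in VRAM. At about 4.5 bits per weight, the 35B model leaves only around 4 GB on a 24 GB card for KV cache, which is where the V100's extra 8 GB would matter.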

Similar Articles

Running Qwen3.6 35b a3b on 8gb vram and 32gb ram ~190k context

Reddit r/LocalLLaMA

The author shares a high-performance local inference configuration for running Qwen3.6 35B A3B on limited hardware (8GB VRAM, 32GB RAM) using a modified llama.cpp with TurboQuant support, achieving ~37-51 tok/sec with ~190k context.

RTX Pro 4500 Blackwell - Qwen 3.6 27B?

Reddit r/LocalLLaMA

A developer shares local inference benchmarks and systemd configurations for running the Qwen3.6-27B model on an NVIDIA RTX Pro 4500 Blackwell GPU using llama.cpp. The post requests optimization tips for throughput and explores potential use cases for larger models.