@andrewchen: finding the main downside with experimenting with local AI models is that you end up buying one GPU, then another, then…
Summary
Andrew Chen describes how experimenting with local AI models led him to buy one GPU after another. He now runs Qwen3.6 27B dense at 100 tok/s on a 5090 eGPU and says it feels comparable to Sonnet 4.6.
Cached at: 05/19/26, 02:42 AM
finding the main downside with experimenting with local AI models is that you end up buying one GPU, then another, then another, then another…
But I’m running qwen3.6 27b dense at 100 tok/s now on a 5090 eGPU! It feels like sonnet 4.6? Fast and highly usable
I figure the GPUs I have will now increase in value over the next few years so it’ll all be worth it
Similar Articles
@leopardracer: https://x.com/leopardracer/status/2055341758523883631
A user shares their experience setting up a dual-GPU local AI lab with RTX 4080 Super and 5060 Ti, running Qwen 3.6 models via llama.cpp and llama-swap to reduce API costs and enable unrestricted experimentation.
@davis7: @0xSero helped me setup local models properly and I uh, had no idea these things had gotten this good Are they frontier…
The author highlights the impressive capabilities of the open-source Qwen 3.6-27B model running locally on an RTX 5090, noting its strong performance on programming tasks and comparing it favorably to commercial models, despite the complexity of local deployment.
we really all are going to make it, aren't we? 2x3090 setup.
A user shares their experience setting up a dual 3090 GPU system to run the Qwen 3.6 27B model locally, reaching over 100 tokens/second after switching to Ubuntu and using the club-3090 tool with custom patches. They express excitement about the future of local AI.
@gippp69: THIS GUY SAW A $430 AI BILL AND BUILT HIS OWN AI LAB UNDER HIS DESK INSTEAD RTX 5090 + RTX 4090, 56GB VRAM, 128GB RAM, …
A user built a private AI lab under his desk using RTX 5090 and RTX 4090 GPUs, running local open-source models like Qwen, DeepSeek, and Llama to avoid API costs.
@rumgewieselt: Now its getting crazy ... 3x 1080 Ti (Pascal, 33GB VRAM) Qwen 3.6 27B MTP with 196K TurboQuant ~28-30 t/s consistently
A user demonstrates successful local inference of a 27B parameter Qwen model across three GTX 1080 Ti GPUs, achieving approximately 28-30 tokens per second using TurboQuant optimization.