Qwen 3.6 is actually useful for vibe-coding, and way cheaper than Claude

Reddit r/LocalLLaMA Models

Summary

User demonstrates Qwen 3.6 27B/35B running locally with llama-server cuts Claude Code API costs from $142 to <$4 for 8-hour vibe-coding session, achieving 30-day payback on $4500 dual-RTX 3090 rig.

Launched claude code, pointed it at my running Qwen, and, well, it vibe codes perfectly fine. I started a project with Qwen3.6-35B-A3B (Q4) yesterday, and then this morning switched to 27B (Q8), and both worked fine! Running on a dual 3090 rig with 200k context. Running Unsloth Q\_8. No fancy setup, just followed unsloths quickstart guide and set the context higher. \`\`\` \#!/bin/bash llama-server \\ \-hf unsloth/Qwen3.6-27B-GGUF:Q8\_0 \\ \--alias "unsloth/Qwen3.6-27B" \\ \--temp 0.6 \\ \--top-p 0.95 \\ \--top-k 20 \\ \--min-p 0.00 \\ \--ctx-size 200000 \\ \--port 8001 \\ \--host [0.0.0.0](http://0.0.0.0) \`\`\` \`\`\` \#!/bin/bash export ANTHROPIC\_AUTH\_TOKEN="ollama" export ANTHROPIC\_API\_KEY="" export ANTHROPIC\_BASE\_URL="[http://192.168.18.4:8001](http://192.168.18.4:8001)" claude $@ \`\`\` The best part is seeing Claude Code's cost estimate. Over that 8 hours I would have racked up $142 in API calls, and instead if cost me <$4 in electricity (assuming my rig pulled 1kw the entire time, in reality it's less, but I don't have my power meter hooked up currently). So to all the naysayers about "local isn't worth it", this rig cost me \~$4500 to build (NZD), and thus has a payback period of \~260 hours of using it instead of Anthropic's API's. If I use it full time as my day job, that's \~30 days. If I run a dark-software factory 24/7, that's 10 days.Kicking off projects in the evening every now and then, that's a payback period of, what, maybe a couple months? What did I vibe code? Nothing too fancy. A server in rust that monitors my server's resources, and exposes it to a web dashboard with SSE. Full stack development, end to end, all done with a local model. I interacted with it maybe 5 times. Once to prompt it, and the other 4 for UI/UX changes/bug reports. I'm probably not going to cancel my codex subscription quite yet (I couldn't get codex working with llama-server?), but it may not be long
Original Article

Similar Articles

Running Qwen3.6 35b a3b on 8gb vram and 32gb ram ~190k context

Reddit r/LocalLLaMA

The author shares a high-performance local inference configuration for running Qwen3.6 35B A3B on limited hardware (8GB VRAM, 32GB RAM) using a modified llama.cpp with TurboQuant support, achieving ~37-51 tok/sec with ~190k context.