Tried Qwen3.6-27B-UD-Q6_K_XL.gguf with Claude Code, and I can't believe it, but it's usable

Reddit r/LocalLLaMA Models

Summary

User reports surprisingly usable coding performance from Qwen3-27B-UD-Q6_K_XL.gguf running locally on an RTX 5090 at ~50 tok/s with 200K context, marking a significant leap in local model quality.

So I tried running Qwen3-27B-UD-Q6_K_XL.gguf with 200K context on my RTX 5090 using llama.cpp. I'm getting around 50 tok/s, which is fine I guess; I don't really know this stuff, so it might be improvable. But what I want to say is: I haven't tried local models for coding in quite a long time, and hell, I can't believe we're at the point where it's actually usable. Of course it's not the same first-class experience as Opus 4.7, but damn, we are getting closer and closer.

Screenshot: https://preview.redd.it/3pbvuks69twg1.png?width=2556&format=png&auto=webp&s=0ed498974c33bd33d807bf1b91e310c346f1e69c

I tried a fairly difficult task, not casual CRUD stuff, to see if it could even produce a plan that makes some sense, and it did very well on the first try. Of course that's just a general first impression and I haven't done real day-to-day coding with it, but I like what I see, and it looks much more promising than my earlier experience with other models, which could start producing total nonsense at some point.
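The post doesn't include the actual llama.cpp invocation, but a typical llama-server launch for a setup like this (single high-VRAM GPU, long context) looks roughly like the sketch below. The model path, context size, and port are assumptions for illustration, not the OP's settings; depending on the build you may also want flash attention (-fa / --flash-attn) and KV-cache quantization (--cache-type-k / --cache-type-v) to make a very long context fit in VRAM.

```bash
# Hypothetical llama-server launch; flag values are illustrative assumptions,
# not the configuration from the post.
./llama-server \
  -m ./Qwen3-27B-UD-Q6_K_XL.gguf \
  -c 200000 \
  -ngl 99 \
  --host 127.0.0.1 --port 8080
```

Here -c sets the requested context window (~200K tokens) and -ngl 99 offloads all layers to the GPU; the server then exposes an OpenAI-compatible endpoint on the given host and port.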

Similar Articles

Running Qwen3.6 35b a3b on 8gb vram and 32gb ram ~190k context

Reddit r/LocalLLaMA

The author shares a high-performance local inference configuration for running Qwen3.6 35B A3B on limited hardware (8GB VRAM, 32GB RAM) using a modified llama.cpp with TurboQuant support, achieving ~37-51 tok/sec with ~190k context.
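That post relies on a modified llama.cpp build with TurboQuant, which isn't covered here; with mainline llama.cpp, the usual way to fit a MoE model of this class on an 8GB GPU with 32GB of system RAM is to keep attention and shared weights on the GPU while overriding the expert tensors to the CPU. The sketch below illustrates that general approach; the model path and values are assumptions, not the linked configuration.

```bash
# Hypothetical MoE CPU-offload launch with mainline llama.cpp.
# The expert FFN tensors are routed to system RAM via --override-tensor;
# newer builds also offer --n-cpu-moe N as a shorthand for the same idea.
./llama-server \
  -m ./Qwen3.6-35B-A3B-Q4_K_M.gguf \
  -c 190000 \
  -ngl 99 \
  -ot ".ffn_.*_exps.=CPU" \
  --host 127.0.0.1 --port 8080
```

Because only a few experts are active per token, keeping them in RAM costs far less throughput than offloading dense layers would, which is why setups like this still reach usable tok/s on small GPUs.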

RTX Pro 4500 Blackwell - Qwen 3.6 27B?

Reddit r/LocalLLaMA

A developer shares local inference benchmarks and systemd configurations for running the Qwen3.6-27B model on an NVIDIA RTX Pro 4500 Blackwell GPU using llama.cpp. The post requests optimization tips for throughput and explores potential use cases for larger models.
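The actual systemd configuration isn't reproduced in the blurb, but a minimal unit for keeping a llama-server instance running looks roughly like this sketch; paths, user, and flags are hypothetical, not taken from the linked post.

```ini
# Hypothetical /etc/systemd/system/llama-server.service
[Unit]
Description=llama.cpp server (Qwen3.6-27B)
After=network.target

[Service]
ExecStart=/opt/llama.cpp/build/bin/llama-server \
  -m /opt/models/Qwen3.6-27B-Q6_K.gguf \
  -c 32768 -ngl 99 --host 127.0.0.1 --port 8080
Restart=on-failure
User=llama

[Install]
WantedBy=multi-user.target
```

With a unit like this, `systemctl enable --now llama-server` starts the server at boot and restarts it if it crashes, which is convenient when the box doubles as a shared inference endpoint.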