@ngxson: Qwen3.6-27B running 100% on WebGPU. Not the best speed but still
Summary
A developer demonstrates running the Qwen3.6-27B AI model entirely on WebGPU in a browser, though speed is not optimal.
Cached at: 05/18/26, 02:33 PM
Qwen3.6-27B running 100% on WebGPU. Not the best speed but still 😁 https://t.co/Z1dpMkzykr
Similar Articles
"Browser OS" implemented by Qwen 3.6 35B: The best result I ever got from a local model
A user reports achieving impressive results with Qwen 3.6 35B running a 'Browser OS' implementation locally, highlighting the model's capability for complex task execution without cloud dependencies.
Tried Qwen3.6-27B-UD-Q6_K_XL.gguf with Claude Code, well I can't believe it but it is usable
A user reports surprisingly usable coding performance from Qwen3.6-27B-UD-Q6_K_XL.gguf running locally on an RTX 5090 at ~50 tok/s with a 200K context, marking a significant leap in local model quality.
@DeepTechTR: Qwen 3.6 27B is incredibly fast with 16 GB VRAM! The impact of Pure Quant The era of the 27B model that runs seamlessly…
Qwen 3.6 27B runs fast on 16 GB of VRAM thanks to 'Pure Quant' technology, reaching 40 tokens/s with MTP and supporting 64k contexts, which enables local AI on consumer GPUs like the RTX 4060 Ti.
Running Qwen3.6-35B-A3B on a laptop RTX 4060 (8GB) — what worked, what didn't, and a surprising speculative-decoding result
A detailed account of running the Qwen3.6-35B-A3B MoE model on an 8GB laptop GPU, covering effective optimizations like --no-mmap and VRAM headroom, unexpected findings where speculative decoding improved speed by 26% contrary to benchmarks, and pitfalls with Windows and CPU bottlenecks.
Running Qwen3.6 35b a3b on 8gb vram and 32gb ram ~190k context
The author shares a high-performance local inference configuration for running Qwen3.6 35B A3B on limited hardware (8GB VRAM, 32GB RAM) using a modified llama.cpp with TurboQuant support, achieving ~37-51 tok/sec with ~190k context.