@iotcoi: Qwen3.6-27B-FP8 + Dflash + DDTree, 256k context, 10 agents ~200 tokens/sec max decode 136t/s average on a single tiny G…
Summary
Quantized 27B Qwen3.6 model achieves 200 tok/s peak (136 avg) with 256k context and 10 agents on a single 49W GB10 GPU using Dflash+DDTree optimizations.
View Cached Full Text
Cached at: 04/22/26, 05:51 PM
Qwen3.6-27B-FP8 + Dflash + DDTree, 256k context, 10 agents ~200 tokens/sec max decode 136t/s average on a single tiny GB10 GPU at 49W power
Similar Articles
@iotcoi: Ran Google’s cookbook with 10 agents on my tiny GB10 GPU. 436 tok/s / 43.6 per agent Qwen3.6-35B + Dflash + DDTree on v…
A developer ran 10 concurrent agents of the 35B-parameter Qwen3.6 model on a single 74W GB10 GPU at 436 tok/s total using vLLM, demonstrating high-efficiency edge deployment.
Qwen3.6 27B Pure Quant: 40 tok/s on 16 GB VRAM
A quantized version of Qwen3.6 27B using a pure Q4_K_M method fits entirely in 16 GB VRAM, achieving up to 40 tok/s token generation speed with MTP, and significantly reducing model size compared to other GGUF variants.
Running Qwen3.6 35b a3b on 8gb vram and 32gb ram ~190k context
The author shares a high-performance local inference configuration for running Qwen3.6 35B A3B on limited hardware (8GB VRAM, 32GB RAM) using a modified llama.cpp with TurboQuant support, achieving ~37-51 tok/sec with ~190k context.
Qwen 3.6 27B Speculative Decoding Bench: Pushing ~100 TPS on a single RTX 3090
A detailed benchmark comparing speculative decoding engines for Qwen 3.6 27B on a single RTX 3090, showing ik_llama achieving ~100 tokens per second in code generation. Results include decode TPS, TTFT, VRAM usage, and context degradation across 5 engine variants.
80 tok/sec and 128K context on 12GB VRAM with Qwen3.6 35B A3B and llama.cpp MTP
A user shares a configuration for achieving over 80 tokens per second with Qwen3.6 35B A3B on a 12GB VRAM GPU using llama.cpp and Multi-Token Prediction (MTP). The post includes benchmark results and specific command-line parameters to optimize performance.