A100 slow Qwen3.6-27B-FP8

Reddit r/LocalLLaMA 06/21/26, 06:37 PM Models

Summary

The Qwen3.6-27B-FP8 model exhibits slow performance when running on an A100 GPU.

Similar Articles

@ngxson: Qwen3.6-27B running 100% on WebGPU. Not the best speed but still

X AI KOLs Following

A developer demonstrates running the Qwen3.6-27B AI model entirely on WebGPU in a browser, though speed is not optimal.

@cniongolo: I’m not sure people realize yet that you can actually run Qwen3.6-35B-A3B-Claude-4.7-Opus-abliterated-MTP-GGUF on a dua…

X AI KOLs Following

The author demonstrates running a custom Qwen model (Qwen3.6-35B-A3B-Claude-4.7-Opus-abliterated-MTP-GGUF) on dual Nvidia RTX PRO 6000 Blackwell GPUs at 195 tokens per second using Hugging Face Inference.

Running Qwen3.6 35b a3b on 8gb vram and 32gb ram ~190k context

Reddit r/LocalLLaMA

The author shares a high-performance local inference configuration for running Qwen3.6 35B A3B on limited hardware (8GB VRAM, 32GB RAM) using a modified llama.cpp with TurboQuant support, achieving ~37-51 tok/sec with ~190k context.

Qwen 3.6-35B-A3B with 977 tk/s prompt processing and 262k context window on Intel Arc B70 Pro

Reddit r/LocalLLaMA

This article describes how to use the SYCL backend with llama.cpp to achieve over 60 tokens per second on the Qwen 3.6-35B-A3B model using an Intel Arc Pro B70 GPU, with the entire model and KV cache in VRAM.

Tried Qwen3.6-27B-UD-Q6_K_XL.gguf with CloudeCode, well I can't believe but it is usable

Reddit r/LocalLLaMA

A user reports surprisingly usable coding performance from Qwen3.6-27B-UD-Q6_K_XL.gguf running locally on an RTX 5090 at ~50 tok/s with 200K context, marking a significant leap in local model quality.
