A100 slow Qwen3.6-27B-FP8

Reddit r/LocalLLaMA Models

Summary

The Qwen3.6-27B-FP8 model runs slowly on an A100 GPU.

Similar Articles

Running Qwen3.6 35b a3b on 8gb vram and 32gb ram ~190k context

Reddit r/LocalLLaMA

The author shares a high-performance local inference configuration for running Qwen3.6 35B A3B on limited hardware (8GB VRAM, 32GB RAM) using a modified llama.cpp with TurboQuant support, achieving ~37-51 tok/sec with ~190k context.