rdna2


RDNA2 flash attention isn’t enabled in stock builds; I enabled it with this build and doubled my speed

Reddit r/LocalLLaMA · 2026-05-19

A custom llama.cpp binary enables flash attention on AMD RDNA2 GPUs and roughly doubles inference speed, reaching 70–80 tok/s where the stock build crashes. It has only been confirmed to work with Qwen3.6 35B and 27B.
