llama.cpp b9158 ships a Flash Attention fix for RDNA3

Reddit r/LocalLLaMA Tools

Summary

llama.cpp b9158 has been released with a fix for Flash Attention on RDNA3 GPUs, improving performance for AMD users.

[Original Article](https://github.com/ggml-org/llama.cpp/releases)

Similar Articles

llama.cpp b9387: Significant AMD/ROCm PP Update

Reddit r/LocalLLaMA

llama.cpp version b9387 introduces MFMA support for the AMD CDNA architecture (MI100, MI200, MI300 series), improving prompt processing (PP) performance on AMD datacenter GPUs.