RDNA3 Flash Attention fix just dropped in llama.cpp b9158

Reddit r/LocalLLaMA Tools

Summary

llama.cpp b9158 has been released with a fix for Flash Attention on RDNA3 GPUs, improving performance for AMD users.
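Flash Attention in llama.cpp is opt-in, so users who want to benefit from the fix still need to enable it explicitly. Below is a minimal, hedged sketch of doing that through the C API; it assumes a ROCm/HIP build of llama.cpp and an API revision where `llama_context_params` still exposes a boolean `flash_attn` field. Exact function and field names have shifted between releases, so check the `llama.h` that ships with b9158.

```cpp
// Minimal sketch: enable Flash Attention via the llama.cpp C API.
// Assumes a HIP/ROCm build of llama.cpp and an API revision where
// llama_context_params has a boolean `flash_attn` field; names may
// differ in b9158, so verify against the llama.h you build against.
#include "llama.h"
#include <cstdio>

int main(int argc, char ** argv) {
    if (argc < 2) {
        std::fprintf(stderr, "usage: %s model.gguf\n", argv[0]);
        return 1;
    }

    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 99;              // offload all layers to the RDNA3 GPU

    llama_model * model = llama_load_model_from_file(argv[1], mparams);
    if (!model) {
        std::fprintf(stderr, "failed to load model\n");
        return 1;
    }

    llama_context_params cparams = llama_context_default_params();
    cparams.flash_attn = true;              // opt in to the Flash Attention path
    cparams.n_ctx      = 4096;

    llama_context * ctx = llama_new_context_with_model(model, cparams);
    if (!ctx) {
        std::fprintf(stderr, "failed to create context\n");
        return 1;
    }

    // ... run generation as usual; Flash Attention only changes the attention kernels ...

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```

In recent builds the same switch is exposed on the command line as the `-fa` / `--flash-attn` flag of `llama-cli` and `llama-server`.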

Original Article

[https://github.com/ggml-org/llama.cpp/releases](https://github.com/ggml-org/llama.cpp/releases)

Similar Articles

ExLlamaV3 Major Updates!

Reddit r/LocalLLaMA

ExLlamaV3 has released a series of major updates, including Gemma 4 support, improved caching efficiency, and the new DFlash technology for significantly faster inference across various model categories.