llama.cpp b9158 has been released with a fix for Flash Attention on RDNA3 GPUs, improving performance for AMD users.
A developer has gotten TurboQuant TBQ4 KV-cache quantization and Multi-Token Prediction working on AMD ROCm for RDNA3 GPUs in llama.cpp, enabling a 64k context window on 24 GB of VRAM at competitive token rates.