Tag: #amd-gpu

Cards List

RDNA3 Flash Attention fix just dropped by llama.cpp b9158

Reddit r/LocalLLaMA · 21h ago

llama.cpp b9158 has been released with a fix for Flash Attention on RDNA3 GPUs, improving performance for AMD users.

If you're using Windows, disable memory compression to stop bottlenecks!

Reddit r/LocalLLaMA · yesterday

A user shares a fix for performance bottlenecks when running AI models on AMD GPUs under Windows 11: disabling Windows memory compression with the PowerShell command 'Disable-MMAgent -mc'.
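For reference, the cmdlets involved are the standard Windows MMAgent PowerShell commands; the sketch below assumes an elevated (administrator) PowerShell prompt, and the change only takes effect after a reboot:

```powershell
# Check whether memory compression is currently enabled
Get-MMAgent | Select-Object MemoryCompression

# Disable memory compression (-mc is the short form of -MemoryCompression);
# requires an elevated prompt and a reboot to take effect
Disable-MMAgent -MemoryCompression

# To restore the default behavior later:
# Enable-MMAgent -MemoryCompression
```

Whether this helps depends on how much model data is being paged; it mainly matters when system RAM is close to full during inference.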

MI50s Qwen 3.6 27B @52.8 tps TG @1569 tps PP (no MTP, no Quant)

Reddit r/LocalLLaMA · 2d ago

Benchmark results for running Qwen 3.6 27B on AMD MI50 GPUs using a custom vLLM fork: 52.8 tokens/s text generation (TG) and 1,569 tokens/s prompt processing (PP) without quantization or MTP, demonstrating that the 2018-era cards remain usable for agentic tasks.

Is using vLLM actually worth it if you aren't serving the model to other people?

Reddit r/LocalLLaMA · 3d ago

A user discusses the trade-offs between vLLM and llama.cpp for local, single-user inference on AMD hardware, questioning whether vLLM's performance benefits justify its complexity outside enterprise serving.

@pupposandro: 2.5x faster than llama.cpp on Strix Halo. We just shipped DFlash + PFlash for the AMD Ryzen AI MAX+ 395 iGPU (gfx1151, …

X AI KOLs Following · 3d ago

A new toolset (DFlash + PFlash) is reported to achieve 2.5x faster inference than llama.cpp on the AMD Ryzen AI MAX+ 395 iGPU (gfx1151), demonstrating significant speedups for Qwen3.6-27B with 128 GiB of unified memory.

ROCm Status in mid 2026 [D]

Reddit r/MachineLearning · 2026-05-07

The author asks whether AMD's ROCm ecosystem is viable for AI training as of mid-2026 compared with NVIDIA's CUDA, and whether it has reached a 'just works' stage for PyTorch.

My 7900XTX is autonomous with qwen 3.6 👀 wow 😍

Reddit r/LocalLLaMA · 2026-04-20

A user demonstrates Qwen 3.6 running autonomously on an AMD 7900 XTX GPU, building an Android app entirely locally, which the poster describes as sci-fi made real.
