rocm

#rocm

@vllm_project: vLLM v0.22.0 is out! 459 commits from 230 contributors (63 new). Highlights: DeepSeek V4 hardening (NVFP4 fused MoE, fu…

X AI KOLs Timeline ↗ · 2026-05-30 Cached

vLLM v0.22.0 released with 459 commits, featuring DeepSeek V4 hardening, experimental Rust frontend, and batch-invariant Cutlass FP8, reducing end-to-end latency by 28.9%.

llama.cpp B9387 Significant AMD/ROCm PP Update

Reddit r/LocalLLaMA ↗ · 2026-05-29

llama.cpp version b9387 introduces MFMA support for AMD CDNA architectures (MI100, MI200, and MI300 series), improving prompt processing (PP) performance on datacenter AMD GPUs.

Strix Halo users, a rejected PR can give you up to 30% faster PP for MOEs.

Reddit r/LocalLLaMA ↗ · 2026-05-26

A rejected llama.cpp PR provides up to 30% faster prompt processing for MoE models on AMD Strix Halo hardware, with gains diminishing at longer context lengths.

@Italianclownz: Converted Qwen 3.6 35b a3b to ROCmfp4 and this is flying. Used the mtp version bc this ROCmfp4 can also incorporate the…

X AI KOLs Timeline ↗ · 2026-05-24 Cached

The author converted the Qwen 3.6 35B A3B model (MTP version) to the ROCmfp4 format and reports that it runs fast on AMD hardware.

@no_stp_on_snek: got it here if ya want to try it out:

X AI KOLs Following ↗ · 2026-05-23 Cached

A fork of llama.cpp integrating TurboQuant+ for advanced KV-cache and weight quantization, with cross-backend kernel support (Apple Silicon, NVIDIA CUDA, AMD ROCm, Vulkan) and used in production by LocalAI, Chronara, and AtomicChat.

club-rdna16: practical 16GB AMD/Radeon local LLM testing repo

Reddit r/LocalLLaMA ↗ · 2026-05-23

This repository provides practical testing profiles and benchmarks for running local LLMs on 16GB AMD Radeon GPUs using llama.cpp with ROCm/HIP, focusing on real-world performance metrics like context length and KV cache settings.

RDNA2 flash attention isn’t enabled stock, I enabled it with this build and doubled my speed

Reddit r/LocalLLaMA ↗ · 2026-05-19

A custom llama.cpp build enables flash attention on AMD RDNA2 GPUs, where the stock build crashes; the author reports 70-80 tok/s, double their previous speed. Only confirmed working with Qwen3.6 35B/27B.

Lemonade v10.5.1: an MTP + ROCm 7.13 quick start for Strix Halo

Reddit r/LocalLLaMA ↗ · 2026-05-18

Lemonade v10.5.1 adds MTP support and a ROCm 7.13 quick start for Strix Halo, along with a Fedora 43 fix.

ROCm 7.13 nightly adds strix halo optimizations

Reddit r/LocalLLaMA ↗ · 2026-05-17

AMD's ROCm 7.13 tech preview adds optimizations for Strix Halo (Ryzen AI Max 300) and open-sources the ROCprof Trace Decoder.

Strix Halo ROCm + MTP Notes (May 2026)

Reddit r/LocalLLaMA ↗ · 2026-05-17

A technical benchmark comparing the ROCm and Vulkan backends for LLM inference on Strix Halo hardware after MTP was merged into llama.cpp. It finds that ROCm suffers severe performance drops at full context while Vulkan remains stable.

vllm-project/vllm v0.21.1rc0: [ROCm][CI] Stage B gating (#42025)

GitHub Releases Watchlist ↗ · 2026-05-15 Cached

vLLM releases version 0.21.1rc0 with a focus on ROCm CI gating improvements.

Linux - Why does llama.cpp ROCm consume SO much VRAM for KV cache compared to Vulkan?

Reddit r/LocalLLaMA ↗ · 2026-05-14

A user reports that llama.cpp with ROCm consumes significantly more VRAM for the KV cache than the Vulkan backend, despite identical model and settings, prompting investigation into potential causes.

Turboquant+MTP for ROCm(Llama CPP)

Reddit r/LocalLLaMA ↗ · 2026-05-14

A developer gets TurboQuant TBQ4 KV cache and Multi-Token Prediction working on AMD ROCm for RDNA3 GPUs in llama.cpp, enabling 64k context on 24 GB VRAM with competitive token rates.

How to Fine-Tune LLMs on AMD Strix Halo and Other Exotic AMD Hardware

Reddit r/LocalLLaMA ↗ · 2026-05-11

This article provides a tutorial on fine-tuning Large Language Models (LLMs) using AMD Strix Halo hardware, covering both Linux and native Windows environments with SFT and LoRA methods.

vLLM ROCm has been added to Lemonade as an experimental backend

Reddit r/LocalLLaMA ↗ · 2026-05-08

Lemonade has added vLLM on ROCm as an experimental backend, allowing users to run safetensors LLMs on AMD GPUs with a single command.

ROCm Status in mid 2026 [D]

Reddit r/MachineLearning ↗ · 2026-05-07

The author asks about the current viability of AMD's ROCm ecosystem for AI training in mid-2026, comparing it to NVIDIA's CUDA and asking if it has reached a 'just works' stage for PyTorch.
