vLLM ROCm has been added to Lemonade as an experimental backend

Reddit r/LocalLLaMA Tools

Summary

Lemonade has added an experimental ROCm backend for vLLM, allowing users to easily run safetensors LLMs on AMD GPUs with a simple command.

vLLM has the ability to run .safetensors LLMs before they are converted to GGUF and represents a new engine to explore. I personally had never tried it out until u/krishna2910-amd/ u/mikkoph and u/sa1sr1 made it as easy as running llama.cpp in Lemonade: ``` lemonade backends install vllm:rocm lemonade run Qwen3.5-0.8B-vLLM ``` This is an experimental backend for us in the sense that the essentials are implemented, but there are known rough edges. We want the community's feedback to see where and how far we should take this. If you find it interesting, please let us know your thoughts! Quick start guide: https://lemonade-server.ai/news/vllm-rocm.html GitHub: https://github.com/lemonade-sdk/lemonade Discord: https://discord.gg/5xXzkMu8Zk
Original Article

Similar Articles

ROCm vs Vulkan vs vLLM on Dual R9700's

Reddit r/LocalLLaMA

A comparison of AI inference frameworks ROCm, Vulkan, and vLLM running on dual AMD Radeon 9700 GPUs, likely benchmarking performance for large language models.

Turboquant+MTP for ROCm(Llama CPP)

Reddit r/LocalLLaMA

A developer gets TurboQuant TBQ4 KV cache and Multi-Token Prediction working on AMD ROCm for RDNA3 GPUs in llama.cpp, enabling 64k context on 24 GB VRAM with competitive token rates.

club-rdna16: practical 16GB AMD/Radeon local LLM testing repo

Reddit r/LocalLLaMA

This repository provides practical testing profiles and benchmarks for running local LLMs on 16GB AMD Radeon GPUs using llama.cpp with ROCm/HIP, focusing on real-world performance metrics like context length and KV cache settings.

Lemonade v10.7 release and project organization update

Reddit r/LocalLLaMA

Lemonade v10.7 release introduces LMX-Omni virtual models for omni-modal chat, a bench CLI tool for LLM performance comparison across backends, and expanded GPU support on AMD, Apple Silicon, Nvidia, and Intel systems.

llama.cpp B9387 Significant AMD/ROCm PP Update

Reddit r/LocalLLaMA

llama.cpp version b9387 introduces MFMA support for AMD CDNA architecture (MI100, MI200, MI300 series), improving processing pipeline performance on datacenter AMD GPUs.