vLLM ROCm has been added to Lemonade as an experimental backend

Reddit r/LocalLLaMA 05/08/26, 06:21 PM Tools

vllm rocm backend experimental lemonade llm-inference

Summary

Lemonade has added vLLM on ROCm as an experimental backend, letting users run .safetensors LLMs on AMD GPUs with two commands.

vLLM can run .safetensors LLMs before they are converted to GGUF, and it represents a new engine to explore. I personally had never tried it out until u/krishna2910-amd, u/mikkoph, and u/sa1sr1 made it as easy as running llama.cpp in Lemonade:

```
lemonade backends install vllm:rocm
lemonade run Qwen3.5-0.8B-vLLM
```

This is an experimental backend for us in the sense that the essentials are implemented, but there are known rough edges. We want the community's feedback to see where and how far we should take this. If you find it interesting, please let us know your thoughts!

Quick start guide: https://lemonade-server.ai/news/vllm-rocm.html
GitHub: https://github.com/lemonade-sdk/lemonade
Discord: https://discord.gg/5xXzkMu8Zk
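For anyone who wants to script the two commands from the post, here is a minimal sketch of a shell wrapper. The function name `start_vllm_model` and the `DRY_RUN` variable are hypothetical and not part of Lemonade; only the two `lemonade` invocations come from the post. The wrapper checks that the CLI is on the PATH before doing anything, and can print the commands instead of running them.

```shell
# Hypothetical helper (not part of Lemonade): wraps the two commands from the post.
# Set DRY_RUN=1 to print the commands instead of executing them.
start_vllm_model() {
  local model="${1:-Qwen3.5-0.8B-vLLM}"   # default model name taken from the post
  local backend="vllm:rocm"               # backend name taken from the post

  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "lemonade backends install $backend"
    echo "lemonade run $model"
    return 0
  fi

  # Fail early with a clear message if the CLI is missing.
  if ! command -v lemonade >/dev/null 2>&1; then
    echo "lemonade CLI not found on PATH" >&2
    return 1
  fi

  # Install the experimental backend first; only start the model if that succeeds.
  lemonade backends install "$backend" || return 1
  lemonade run "$model"
}
```

Calling `DRY_RUN=1 start_vllm_model` prints the two commands so you can check them first; calling `start_vllm_model` with no arguments runs them with the model from the post. Passing a different model name as the first argument is an assumption about what your Lemonade install offers, so check the quick start guide for the vLLM models that are actually available.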
