Tag
A user successfully set up a dual-GPU llama-cpp server with 48GB VRAM using an AMD Radeon PRO and 7800 XT via Vulkan in Docker on Kubuntu 24.04.
A fork of llama.cpp fixes the --split-mode tensor issue with quantized KV caches, achieving up to 40% speed improvement on dual GPU setups without quality loss.