A user discovered that a hidden electrical limitation on a Threadripper workstation board, a slot wired as PCIe 2.0 x4, was throttling one of four RTX 3090s and dragging down multi-GPU LLM inference performance. Fixing the slot layout and switching to tensor split mode more than doubled Mistral 128B throughput, from ~11 to ~24.7 tok/s.
A developer documents the extensive hardware and firmware hacking required to run an NVIDIA RTX Pro 6000 Blackwell GPU in a legacy Dell PowerEdge R730 server, reaching a 650K context length for local AI inference.
AMD is set to release new Instinct GPUs in a slottable PCIe card form factor, aimed at the enterprise AI market and offering a potential new hardware option for local LLM deployment.