Tag
A user reports near-linear performance scaling when adding a second RTX 3090 for inference with a Qwen model, achieving roughly 1.8x decode TPS improvement without NVLink.