@Snixtp: More efficiency tests on a single 3090


Summary

The article presents benchmark results for 8 local LLMs on an RTX 3090, showing that tokens-per-watt efficiency peaks around 225-250W, with diminishing returns as the power limit approaches the card's 450W maximum.

More efficiency tests on a single 3090

TL;DR:
- I tested 8 local LLMs on a single RTX 3090, power limit from 100W to 450W.
- Across the full set, the best efficiency was consistently around 225-250W.
- At 225W, the card averaged 90.4 tok/s at 0.4167 tok/s/W.
- At 450W, it reached 107.1 tok/s, but efficiency dropped to 0.2731 tok/s/W.
- So max power added only ~16.7 tok/s, while using about 184W more.
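The post does not say how the sweep was run, but the numbers suggest the shape: set a power limit, benchmark throughput, and divide tokens per second by power. Note that the post's efficiency figures imply division by measured draw rather than the nominal limit (90.4 / 0.4167 ≈ 217 W, not 225 W). Below is a minimal sketch of such a sweep, assuming nvidia-smi for the limit and draw, and llama.cpp's llama-bench for throughput; the model path, 25 W step, and output parsing are illustrative assumptions, not the author's actual harness.

```python
# Hypothetical power-limit efficiency sweep on one GPU (not the author's code).
import re
import subprocess

GPU_ID = "0"
MODEL = "model.gguf"  # placeholder path to any GGUF model

def set_power_limit(watts: int) -> None:
    # Changing the limit needs root; -pl takes a value in watts.
    subprocess.run(["sudo", "nvidia-smi", "-i", GPU_ID, "-pl", str(watts)],
                   check=True)

def measured_draw_watts() -> float:
    # Instantaneous board power; ideally this would be sampled repeatedly
    # *during* the benchmark run and averaged.
    out = subprocess.run(
        ["nvidia-smi", "-i", GPU_ID,
         "--query-gpu=power.draw", "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True).stdout
    return float(out.strip())

def bench_tok_per_s() -> float:
    # llama-bench prints a table of "NNN.NN ± N.NN" t/s entries; grabbing
    # the last one (token generation) is a simplification.
    out = subprocess.run(["llama-bench", "-m", MODEL],
                         check=True, capture_output=True, text=True).stdout
    return float(re.findall(r"(\d+\.\d+)\s*±", out)[-1])

for limit in range(100, 451, 25):  # 100 W to 450 W, the post's range
    set_power_limit(limit)
    tps = bench_tok_per_s()
    draw = measured_draw_watts()
    print(f"limit {limit:3d} W  draw {draw:6.1f} W  "
          f"{tps:6.1f} tok/s  {tps / draw:.4f} tok/s/W")
```

Dividing by measured draw rather than the set limit matters because a card rarely sits exactly at its limit, which is consistent with the post's 225 W row working out to roughly 217 W of actual draw.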

Similar Articles

Benchmark Qwen 3.6 27B MTP on 2x3090 NVLINK

Reddit r/LocalLLaMA

A benchmark analysis of Qwen 3.6 27B MTP on 4x RTX 3090 GPUs, demonstrating that using NVLink for tensor parallelism yields significant throughput improvements (up to +53%) over PCIe configurations.
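For context on what that NVLink-vs-PCIe gap measures: tensor parallelism shards each layer's weight matrices across GPUs and exchanges activations on every forward pass, so interconnect bandwidth directly gates throughput. A minimal sketch of a tensor-parallel launch, assuming vLLM as the engine and a placeholder model id (the post's own stack and model are not specified here):

```python
# Hedged sketch: two-GPU tensor parallelism with vLLM. NCCL will route
# the activation exchange over NVLink automatically when it is present,
# falling back to PCIe otherwise.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-32B-Instruct",  # placeholder model id
    tensor_parallel_size=2,             # shard each layer across 2 GPUs
)
params = SamplingParams(max_tokens=128, temperature=0.7)
outputs = llm.generate(["Explain NVLink in one sentence."], params)
print(outputs[0].outputs[0].text)
```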

RTX Pro 4500 Blackwell - Qwen 3.6 27B?

Reddit r/LocalLLaMA

A developer shares local inference benchmarks and systemd configurations for running the Qwen3.6-27B model on an NVIDIA RTX Pro 4500 Blackwell GPU using llama.cpp. The post requests optimization tips for throughput and explores potential use cases for larger models.

Nvidia RTX 3090 vs Intel Arc Pro B70 llama.cpp Benchmarks

Reddit r/LocalLLaMA

A community benchmark shows the Intel Arc Pro B70 averaging ~71% slower prompt processing and ~54% slower token generation than the RTX 3090 under llama.cpp, with the SYCL backend sometimes beating Vulkan on the same card.