@Snixtp: More efficiency tests on a single 3090 TL;DR: - I tested 8 local LLMs on a single RTX 3090, power limit from 100W to 45…
Summary
The article presents benchmark results for 8 local LLMs on an RTX 3090, showing that power efficiency peaks around a 225W power limit and that raising the limit toward the card's maximum yields diminishing returns.
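The post does not include its test harness, but a sweep like this is straightforward to script. Below is a minimal sketch, assuming llama.cpp's llama-bench binary, a placeholder GGUF path, and an illustrative list of power caps (none of these come from the original post): for each cap it sets the limit with nvidia-smi (root required), runs the benchmark, and samples board power so that tokens per second per watt can be compared afterwards.

```python
# Hypothetical power-limit sweep sketch; paths, model, and limits are assumptions.
import subprocess, threading, time, statistics

GPU = "0"
LIMITS_W = [100, 150, 200, 225, 275, 350]                     # caps to test (assumed)
BENCH = ["./llama-bench", "-m", "model.gguf", "-ngl", "99"]   # assumed binary and model path

def sample_power(samples, stop):
    # Poll instantaneous board power (watts) until the benchmark finishes.
    while not stop.is_set():
        out = subprocess.run(
            ["nvidia-smi", "-i", GPU, "--query-gpu=power.draw",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True).stdout.strip()
        if out:
            samples.append(float(out))
        time.sleep(1)

for limit in LIMITS_W:
    # Setting the power limit requires root privileges.
    subprocess.run(["nvidia-smi", "-i", GPU, "-pl", str(limit)], check=True)
    samples, stop = [], threading.Event()
    t = threading.Thread(target=sample_power, args=(samples, stop))
    t.start()
    bench = subprocess.run(BENCH, capture_output=True, text=True)
    stop.set(); t.join()
    avg_w = statistics.mean(samples) if samples else float("nan")
    print(f"--- power limit {limit} W, measured avg {avg_w:.0f} W ---")
    print(bench.stdout)  # read tok/s from the llama-bench table, divide by avg_w for efficiency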
Similar Articles
Benchmark Qwen 3.6 27B MTP on 2x3090 NVLINK
A benchmark analysis of Qwen 3.6 27B MTP on 4x RTX 3090 GPUs, demonstrating that using NVLink for tensor parallelism yields significant throughput improvements (up to +53%) over PCIe configurations.
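The post does not say which serving stack produced the NVLink vs PCIe numbers. As a hedged sketch of how such a tensor-parallel throughput measurement is typically run, the snippet below uses vLLM with a placeholder model ID; NVLink vs PCIe is a hardware topology difference, so the same script is run on both setups and the tok/s compared.

```python
# Hedged tensor-parallel throughput sketch; model ID and prompt set are placeholders.
import time
from vllm import LLM, SamplingParams

MODEL = "Qwen/placeholder-model"                  # placeholder, not the benchmarked checkpoint
llm = LLM(model=MODEL, tensor_parallel_size=2)    # shard the model across two GPUs

prompts = ["Explain tensor parallelism in one paragraph."] * 32
params = SamplingParams(max_tokens=256, temperature=0.8)

start = time.time()
outputs = llm.generate(prompts, params)
elapsed = time.time() - start

generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tok/s")
```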
RTX Pro 4500 Blackwell - Qwen 3.6 27B?
A developer shares local inference benchmarks and systemd configurations for running the Qwen3.6-27B model on an NVIDIA RTX Pro 4500 Blackwell GPU using llama.cpp. The post requests optimization tips for throughput and explores potential use cases for larger models.
Nvidia RTX 3090 vs Intel Arc Pro B70 llama.cpp Benchmarks
A community benchmark shows the Intel Arc Pro B70 averaging ~71% slower prompt processing and ~54% slower token generation than the RTX 3090 under llama.cpp, with the SYCL backend sometimes beating Vulkan on the same card.
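The percentages are relative to the 3090 baseline. A tiny illustration of that arithmetic, using made-up tok/s figures rather than the community numbers:

```python
# Illustrative arithmetic only; the tok/s figures below are hypothetical.
def pct_slower(baseline_tps: float, other_tps: float) -> float:
    """Percent slowdown of `other` relative to `baseline`."""
    return (baseline_tps - other_tps) / baseline_tps * 100

rtx3090_pp, arc_pp = 2000.0, 580.0   # prompt processing tok/s (hypothetical)
rtx3090_tg, arc_tg = 110.0, 50.6     # token generation tok/s (hypothetical)

print(f"prompt processing: {pct_slower(rtx3090_pp, arc_pp):.0f}% slower")  # ~71%
print(f"token generation:  {pct_slower(rtx3090_tg, arc_tg):.0f}% slower")  # ~54%
```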
Qwen3.5-27B, Qwen3.5-122B, and Qwen3.6-35B on 4x RTX 3090 — MoEs struggle with strict global rules
A user benchmarks three Qwen models (Qwen3.5-27B dense, Qwen3.5-122B-A10B MoE, Qwen3.6-35B-A3B MoE) on 4x RTX 3090 GPUs under real agentic workloads. Despite their speed advantage, the MoE models consistently underperform the dense 27B at following strict global rules, with Qwen3.6-35B leading in generation throughput.
@seclink: Just hit 134 tok/s with Qwen 3.5-27B Dense and 73 tok/s with the new Qwen 3.6-27B on a single RTX 3090. The 2026 open-source scene is moving at lightspeed…
A single RTX 3090 reaches 134 tok/s on the new Qwen 3.5-27B Dense and 73 tok/s on Qwen 3.6-27B via fused kernels plus speculative decoding, with GGUF releases landing the same evening.
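The post credits fused kernels and speculative decoding but does not name the inference stack. As one hedged illustration of the speculative-decoding idea, Hugging Face transformers supports assisted generation, where a small draft model proposes tokens that the large target model verifies in a single forward pass; model IDs below are placeholders, not the checkpoints from the post.

```python
# Hedged speculative (assisted) decoding sketch with placeholder model IDs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

TARGET = "Qwen/placeholder-27b"    # placeholder large target model
DRAFT = "Qwen/placeholder-0.5b"    # placeholder small draft model (must share the tokenizer)

tok = AutoTokenizer.from_pretrained(TARGET)
target = AutoModelForCausalLM.from_pretrained(TARGET, torch_dtype=torch.float16, device_map="cuda")
draft = AutoModelForCausalLM.from_pretrained(DRAFT, torch_dtype=torch.float16, device_map="cuda")

inputs = tok("Summarize speculative decoding in two sentences.", return_tensors="pt").to("cuda")
out = target.generate(**inputs, assistant_model=draft, max_new_tokens=128)
print(tok.decode(out[0], skip_special_tokens=True))
```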