A performance test demonstrates the impact of Low, Automatic, and High power modes on LLM inference speed on an M5 Max MacBook, showing significant differences in token generation rates and power consumption.
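To reproduce this kind of comparison, a minimal sketch follows. It assumes the `pmset` `powermode` key available on recent Apple Silicon MacBooks (0 = Automatic, 1 = Low Power, 2 = High Power) and uses llama.cpp's `llama-bench` tool to report prefill and decode tokens/sec; the model path is a placeholder, and the original test's exact methodology is not known.

```python
import subprocess

# Placeholder model path; substitute any local GGUF file.
MODEL = "models/model-q4_k_m.gguf"

# Assumed pmset powermode values on recent Apple Silicon MacBooks:
# 0 = Automatic, 1 = Low Power, 2 = High Power.
MODES = {"low": 1, "automatic": 0, "high": 2}

def set_power_mode(value: int) -> None:
    # Requires sudo; -a applies the setting on all power sources.
    subprocess.run(["sudo", "pmset", "-a", "powermode", str(value)],
                   check=True)

def bench() -> str:
    # llama-bench ships with llama.cpp; -p sets prompt (prefill) tokens,
    # -n sets generated (decode) tokens. Its output includes tokens/sec.
    result = subprocess.run(
        ["llama-bench", "-m", MODEL, "-p", "512", "-n", "128"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

for name, value in MODES.items():
    set_power_mode(value)
    print(f"--- power mode: {name} ---")
    print(bench())

# Restore Automatic when finished.
set_power_mode(0)
```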
Luce releases DFlash and PFlash support for AMD Strix Halo APUs, achieving 2.23x decode and 3.05x prefill speedups over llama.cpp HIP on Qwen3.6-27B.
The post asks the community to evaluate HIPfire's performance and output quality on AMD Strix Halo hardware, particularly its long-context support relative to llama.cpp.
A user demonstrates successful local inference of a 27B parameter Qwen model across three GTX 1080 Ti GPUs, achieving approximately 28-30 tokens per second using TurboQuant optimization.
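The general technique of splitting one model across several consumer GPUs can be sketched with llama-cpp-python's `tensor_split` parameter, which distributes weights proportionally across devices. This is a sketch, not the user's setup: the model path is a placeholder, the even three-way split is an assumption, and since the "TurboQuant" format described in the post is not publicly documented, a standard Q4 GGUF quant stands in for it here.

```python
from llama_cpp import Llama

# Placeholder path to a quantized 27B GGUF; a standard Q4 quant stands in
# for the post's "TurboQuant" format, which is not publicly documented.
llm = Llama(
    model_path="models/qwen-27b-q4_k_m.gguf",
    n_gpu_layers=-1,               # offload all layers to the GPUs
    tensor_split=[1.0, 1.0, 1.0],  # assumed even split across 3 cards
    n_ctx=4096,
)

out = llm("Explain tensor parallelism in one paragraph.", max_tokens=128)
print(out["choices"][0]["text"])
```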