performance-benchmark

#performance-benchmark

@KyleHessling1: Made an RTS game with our soon-to-release Qwopus-Coder-35B-A3B entirely in @opencode Thinking enabled, thinking cap set…

X AI KOLs Following ↗ · yesterday

Kyle Hessling announces the upcoming release of the Qwopus-Coder-35B-A3B coding model and demonstrates it by building a fully functional real-time strategy game with it in OpenCode. The model reportedly reaches high generation speed and a high draft-acceptance rate on a GeForce RTX 5090.

New Japanese model on par with frontier American model

Reddit r/singularity ↗ · 6d ago

A new Japanese AI model is reported to perform on par with leading American frontier models.

RTX 5080 and RTX 3090 Setup: 80 Tok/s on Qwen 3.6 27B Q8

Hacker News Top ↗ · 2026-06-13

A setup using RTX 5080 and RTX 3090 GPUs achieves 80 tokens per second on the Qwen 3.6 27B Q8 model.

@ivanfioravanti: Interesting video of M5 Max, on impact of Low, Automatic and High power modes on inference. - No external monitor attac…

X AI KOLs Timeline ↗ · 2026-05-12

A performance test demonstrates the impact of Low, Automatic, and High power modes on LLM inference speed on an M5 Max MacBook, showing significant differences in token generation rates and power consumption.

Luce DFlash + PFlash on AMD Strix Halo: Qwen3.6-27B at 2.23x decode and 3.05x prefill vs llama.cpp HIP

Reddit r/LocalLLaMA ↗ · 2026-05-12

Luce releases DFlash and PFlash support for AMD Strix Halo APUs, achieving 2.23x decode and 3.05x prefill speedups over llama.cpp HIP on Qwen3.6-27B.

Is HIPfire worth it for Strix Halo?

Reddit r/LocalLLaMA ↗ · 2026-05-10

The post asks the community to evaluate HIPfire's performance and output quality on AMD Strix Halo hardware, particularly its long-context support compared with llama.cpp.

@rumgewieselt: Now its getting crazy ... 3x 1080 Ti (Pascal, 33GB VRAM) Qwen 3.6 27B MTP with 196K TurboQuant ~28-30 t/s consistently

X AI KOLs Timeline ↗ · 2026-05-08

A user demonstrates local inference of a 27B-parameter Qwen model across three GTX 1080 Ti GPUs, sustaining approximately 28-30 tokens per second with TurboQuant.
