RTX 5080 and RTX 3090 Setup: 80 Tok/s on Qwen 3.6 27B Q8

Hacker News Top 06/13/26, 09:55 AM Tools

rtx-5080 rtx-3090 qwen tokens-per-second performance-benchmark gpu-setup ai-inference

Summary

A setup using RTX 5080 and RTX 3090 GPUs achieves 80 tokens per second on the Qwen 3.6 27B Q8 model.



Similar Articles

New Japanese model on par with frontier American model

Reddit r/singularity

A new Japanese AI model achieves performance comparable to leading American frontier models, marking a significant advancement.

Good results fine tuning a local LLM like Qwen 3:0.6B to categorize questions

Hacker News Top

A developer fine-tunes a small Qwen 3 0.6B model using the Unsloth framework to categorize household questions, achieving good results with only 850 training examples.

@losterror501: with 2dgx sparks getting 25tok/sec with 1 session and it peaks to 152tok/sec with 8 sessions. Actually insane...

X AI KOLs Timeline

Announcement of Qwable-v1, an open-weights model distilled from Claude Fable-5, with benchmarks on two DGX Spark systems: 25 tok/sec with a single session, peaking at 152 tok/sec with 8 sessions.

A100 slow Qwen3.6-27B-FP8

Reddit r/LocalLLaMA

The Qwen3.6-27B-FP8 model runs slowly on an A100 GPU.

Qwen 27B for planning, Qwen 35B-A3B for execution?