RTX 5080 and RTX 3090 Setup: 80 Tok/s on Qwen 3.6 27B Q8
Summary
A setup using RTX 5080 and RTX 3090 GPUs achieves 80 tokens per second on the Qwen 3.6 27B Q8 model.
Similar Articles
New Japanese model on par with frontier American model
A new Japanese AI model achieves performance comparable to leading American frontier models, marking a significant advancement.
Good results fine tuning a local LLM like Qwen 3:0.6B to categorize questions
A developer fine-tunes a small Qwen 3 0.6B model using the Unsloth framework to categorize household questions, achieving good results with only 850 training examples.
@losterror501: with 2dgx sparks getting 25tok/sec with 1 session and it peaks to 152tok/sec with 8 sessions. Actually insane...
Announces Qwable-v1, an open-weights model distilled from Claude Fable-5, with benchmarks on two DGX Spark units: 25 tok/sec with a single session, peaking at 152 tok/sec with 8 sessions.
A100 slow Qwen3.6-27B-FP8
The Qwen3.6-27B-FP8 model runs slowly on an A100 GPU.
Qwen 27B for planning, Qwen 35B-A3B for execution?
The post discusses using Qwen 27B for planning and Qwen 35B-A3B for execution, giving each model a specialized role.