RTX 5080 and RTX 3090 Setup: 80 Tok/s on Qwen 3.6 27B Q8
Summary
A setup using RTX 5080 and RTX 3090 GPUs achieves 80 tokens per second on the Qwen 3.6 27B Q8 model.
Similar Articles
Good results fine tuning a local LLM like Qwen 3:0.6B to categorize questions
A developer fine-tunes a small Qwen 3 0.6B model using the Unsloth framework to categorize household questions, achieving good results with only 850 training examples.
@losterror501: with 2dgx sparks getting 25tok/sec with 1 session and it peaks to 152tok/sec with 8 sessions. Actually insane...
Announcement of Qwable-v1, an open-weights model distilled from Claude Fable-5, along with performance benchmarks on two DGX Spark units: 25 tok/sec with a single session, peaking at 152 tok/sec with 8 sessions.
A100 slow Qwen3.6-27B-FP8
The Qwen3.6-27B-FP8 model runs slowly on an A100 GPU.
Qwen 27B for planning, Qwen 35B-A3B for execution?
Discusses using Qwen 27B for planning tasks and Qwen 35B-A3B for execution tasks, suggesting a specialized model approach.
Best local model for vision - 2nd benchmark update - 21 Jun 2026
This post presents the second update of a benchmark for local vision language models, comparing 23 models across 30 images with revised settings, and provides performance recommendations for different VRAM tiers. Key findings include that thinking mode hurts vision performance and that MoE models underperform dense models for perception tasks.