RTX 5080 and RTX 3090 Setup: 80 Tok/s on Qwen 3.6 27B Q8

Hacker News Top Tools

Summary

A setup combining RTX 5080 and RTX 3090 GPUs reaches 80 tokens per second running the Qwen 3.6 27B model at Q8 quantization.

Similar Articles

A100 slow Qwen3.6-27B-FP8

Reddit r/LocalLLaMA

The Qwen3.6-27B-FP8 model runs slowly on an A100 GPU.

Best local model for vision - 2nd benchmark update - 21 Jun 2026

Reddit r/LocalLLaMA

This post presents the second update of a benchmark for local vision language models, comparing 23 models across 30 images with revised settings, and provides performance recommendations for different VRAM tiers. Key findings include that thinking mode hurts vision performance and that MoE models underperform dense models for perception tasks.