rtx-5080

RTX 5080 16GB: Qwen3.6 35B MoE at 128k context — 56 tok/s, and why MTP doesn't help

Reddit r/LocalLLaMA ↗ · 2026-05-20

Detailed benchmarks of Qwen3.6 35B MoE on an RTX 5080 16GB show that MTP (Multi-Token Prediction) does not improve inference speed at 128k context because of VRAM constraints. The best configuration is Q4_K_XL without MTP, which reaches ~56 tok/s generation at that context length.

It’s Gonna Be May: 16 Games Hit the Cloud This Month, With More NVIDIA GeForce RTX 5080 Power

NVIDIA Blog ↗ · 2026-04-30

NVIDIA announces 16 games joining GeForce NOW cloud streaming in May, including new AAA titles like Forza Horizon 6 and 007 First Light, and expands RTX 5080-class performance across the library for Ultimate members.
