Tag
Detailed benchmarks of Qwen3.6 35B MoE on an RTX 5080 16GB show that MTP (Multi-Token Prediction) does not improve inference speed at 128k context because of VRAM constraints; the best configuration is Q4_K_XL without MTP, which reaches ~56 tok/s generation at that context length.
NVIDIA announces 16 games joining GeForce NOW cloud streaming in May, including new AAA titles like Forza Horizon 6 and 007 First Light, and expands RTX 5080-class performance across the library for Ultimate members.