Mimo 2.5 is _fast_ at large context (dual RTX Pro 6000)

Reddit r/LocalLLaMA 06/23/26, 10:55 PM Models

Summary

Mimo 2.5 runs fast at large context lengths on a dual RTX Pro 6000 setup.

Similar Articles

@0xSero: Minimax-M3 running on 4x RTX Pro 6000s - 800k context - 4x concurrency at 250k - 70-120 tok/s - 2000 tok/s prefill no c…

X AI KOLs Following

Minimax-M3 is demonstrated running on 4x RTX Pro 6000 GPUs with 800k context, achieving 70-120 tok/s inference and 2000 tok/s prefill at 4x concurrency using 376GB VRAM in mxfp4 format.

Mimo V 2.5 and Mimo V 2.5 Pro released.

Reddit r/LocalLLaMA

Mimo V 2.5 and Mimo V 2.5 Pro have been released, offering updated features and improvements.

RTX 5080 16GB: Qwen3.6 35B MoE at 128k context — 56 tok/s, and why MTP doesn't help

Reddit r/LocalLLaMA

Detailed benchmarks of Qwen3.6 35B MoE on RTX 5080 16GB show that MTP (Multi-Token Prediction) does not improve inference speed at 128k context due to VRAM constraints; the best configuration is Q4_K_XL without MTP, achieving ~56 tok/s generation at 128k context.

500k context on 48gb VRAM!! - 21tok/s (coding)

Reddit r/LocalLLaMA

A user reports successful deployment of a quantized Nemotron-3 Super model supporting 500k context and agentic coding on consumer-grade dual Titan RTX hardware.

Qwen3.6-35B-A3B Q4 262k context on 8GB 3070 Ti = +30tps

Reddit r/LocalLLaMA

The author shares detailed tuning tips for running the Qwen3.6-35B-A3B MoE model on an 8GB RTX 3070 Ti with up to 262k context using llama.cpp, achieving 30+ tps, and notes a 25% speed boost when switching from Windows to Ubuntu Server.
