Mimo 2.5 is _fast_ at large context (dual RTX Pro 6000)

Reddit r/LocalLLaMA Models

Summary

Mimo 2.5 demonstrates fast performance with large context windows using dual RTX Pro 6000 GPUs.

Similar Articles

Qwen3.6-35B-A3B Q4 262k context on 8GB 3070 Ti = +30tps

Reddit r/LocalLLaMA

The author shares detailed tuning tips for running the Qwen3.6-35B-A3B MoE model on an 8GB RTX 3070 Ti with up to 262k context using llama.cpp, reaching over 30 tokens per second (tps). The author also notes a 25% speed boost after switching from Windows to Ubuntu Server.