Tag
Xiaomi and TileRT achieved over 1,000 tokens per second inference on a 1-trillion parameter model using standard commodity GPUs, suggesting a major alternative to custom silicon.
Xiaomi achieved over 1,000 tokens per second inference on its trillion-parameter MiMo-V2.5-Pro-UltraSpeed model using commodity 8-GPU nodes via FP4 quantization and DFlash speculative decoding, outpacing GPT-5.5 and Claude Opus by over 10x.
Xiaomi MiMo releases MiMo-V2.5-Pro-UltraSpeed, achieving over 1,000 tokens per second on a 1 trillion parameter model using speculative decoding, the first practical deployment of such speed at scale.
Xiaomi released MiMo-V2.5-Pro-UltraSpeed in collaboration with TileRT, achieving over 1000 tokens/s decode speed on a 1-trillion-parameter model, enabling real-time AI interaction and accelerating coding agents and reasoning tasks.