Xiaomi & TileRT just hit 1,000+ TPS on a 1-Trillion Parameter model… on standard commodity GPUs. It’s over for custom silicon?

Reddit r/singularity 06/10/26, 04:23 PM News

xiaomi tile-rt trillion-parameter inference throughput gpu custom-silicon

Summary

Xiaomi and TileRT report inference speeds of over 1,000 tokens per second on a 1-trillion-parameter model using standard commodity GPUs, which suggests commodity hardware may be a viable alternative to custom silicon.

Similar Articles

Xiaomi just claimed 1,000+ tps on a 1T model using a standard 8-GPU server

Reddit r/LocalLLaMA

Xiaomi released MiMo-V2.5-Pro-UltraSpeed in collaboration with TileRT, achieving a decode speed of over 1,000 tokens/s on a 1-trillion-parameter model, enabling real-time AI interaction and accelerating coding agents and reasoning tasks.

China's Xiaomi MiMo Is Now 15X Faster Than ChatGPT and Claude (4 minute read)

TLDR AI

Xiaomi achieved over 1,000 tokens per second of inference on its trillion-parameter MiMo-V2.5-Pro-UltraSpeed model using commodity 8-GPU nodes via FP4 quantization and DFlash speculative decoding, outpacing GPT-5.5 and Claude Opus by over 10x.

@draecomino: Cerebras sets a new record: a one trillion parameter model @ 1,000 tokens/s

X AI KOLs Timeline

Cerebras announces it is running Kimi K2.6, a trillion-parameter model, at approximately 1,000 tokens per second in enterprise trials, claiming the fastest frontier model performance ever measured by Artificial Analysis.

@rohanpaul_ai: I had to test it myself to believe this unreal inference speed. 3,000 tokens/s for 1 user on standard datacenter GPUs. …

X AI KOLs Following

Kog AI achieves an inference speed of 3,000 tokens/s on 8× AMD MI300X GPUs and 2,100 tokens/s on 8× NVIDIA H200 GPUs, leveraging a hidden efficiency gap in GPU token generation.

Computer build using Intel Optane Persistent Memory - Can run 1 trillion parameter model at over 4 tokens/sec