Xiaomi & TileRT just hit 1,000+ TPS on a 1-Trillion Parameter model… on standard commodity GPUs. It’s over for custom silicon?
Summary
Xiaomi and TileRT report inference speeds above 1,000 tokens per second on a 1-trillion-parameter model running on standard commodity GPUs, a result the post presents as a potential alternative to custom silicon.
Similar Articles
Xiaomi just claimed 1,000+ tps on a 1T model using a standard 8-GPU server
Xiaomi released MiMo-V2.5-Pro-UltraSpeed in collaboration with TileRT, achieving decode speeds above 1,000 tokens/s on a 1-trillion-parameter model, which enables real-time AI interaction and speeds up coding agents and reasoning tasks.
China's Xiaomi MiMo Is Now 15X Faster Than ChatGPT and Claude
Xiaomi achieved inference speeds above 1,000 tokens per second on its trillion-parameter MiMo-V2.5-Pro-UltraSpeed model using commodity 8-GPU nodes, FP4 quantization, and DFlash speculative decoding, outpacing GPT-5.5 and Claude Opus by over 10x.
@draecomino: Cerebras sets a new record: a one trillion parameter model @ 1,000 tokens/s
Cerebras announces it is running Kimi K2.6, a trillion parameter model, at approximately 1,000 tokens per second in enterprise trials, claiming the fastest frontier model performance ever measured by Artificial Analysis.
@rohanpaul_ai: I had to test it myself to believe this unreal inference speed. 3,000 tokens/s for 1 user on standard datacenter GPUs. …
Kog AI achieves inference speeds of 3,000 tokens/s on 8× AMD MI300X GPUs and 2,100 tokens/s on 8× NVIDIA H200 GPUs, leveraging a hidden efficiency gap in GPU token generation.
Computer build using Intel Optane Persistent Memory - Can run 1 trillion parameter model at over 4 tokens/sec
A community member details a custom PC build that uses discontinued Intel Optane Persistent Memory to run the 1-trillion-parameter Kimi K2.5 model locally at roughly 4 tokens per second via llama.cpp.