tile-rt

#tile-rt

Xiaomi & TileRT just hit 1,000+ TPS on a 1-Trillion Parameter model… on standard commodity GPUs. It’s over for custom silicon?

Reddit r/singularity · 2026-06-10

Xiaomi and TileRT achieved over 1,000 tokens per second of inference throughput on a 1-trillion-parameter model using standard commodity GPUs, which the post frames as a possible alternative to custom silicon.

#tile-rt

China's Xiaomi MiMo Is Now 15X Faster Than ChatGPT and Claude (4 minute read)

TLDR AI · 2026-06-09

Xiaomi achieved over 1,000 tokens per second inference on its trillion-parameter MiMo-V2.5-Pro-UltraSpeed model using commodity 8-GPU nodes via FP4 quantization and DFlash speculative decoding, outpacing GPT-5.5 and Claude Opus by over 10x.

#tile-rt

@zephyr_z9: This is super big I think this is the first useful speculative decoding method deployed on a big quasi frontier model M…

X AI KOLs Following · 2026-06-08

Xiaomi MiMo released MiMo-V2.5-Pro-UltraSpeed, which achieves over 1,000 tokens per second on a 1-trillion-parameter model using speculative decoding. The poster calls it the first useful speculative decoding method deployed on a large, near-frontier model.

#tile-rt

Xiaomi just claimed 1,000+ tps on a 1T model using a standard 8-GPU server

Reddit r/LocalLLaMA · 2026-06-08

Xiaomi released MiMo-V2.5-Pro-UltraSpeed in collaboration with TileRT, claiming a decode speed of over 1,000 tokens/s on a 1-trillion-parameter model running on a standard 8-GPU server. The release is pitched at real-time AI interaction and at faster coding agents and reasoning tasks.
