@draecomino: Cerebras sets a new record: a one trillion parameter model @ 1,000 tokens/s

X AI KOLs Timeline 05/19/26, 06:23 PM Models

cerebras trillion-parameter kimi-k2-6 inference-speed frontier-model enterprise-trials artificial-analysis

Summary

Cerebras announces it is running Kimi K2.6, a trillion-parameter model, at approximately 1,000 tokens per second in enterprise trials, and claims this is the fastest frontier model performance ever measured by Artificial Analysis.

Original Article

Cached at: 05/19/26, 08:49 PM

Cerebras (@cerebras): Cerebras is now running Kimi K2.6 – a trillion parameter model – in enterprise trials.

At ~1,000 tokens/s, this is the fastest frontier model performance ever measured by Artificial Analysis @ArtificialAnlys.

Similar Articles

@YRSM_Simon: This is big news! Kimi 2.6 is a generative-level model. In this age of overflowing LLM capabilities, speed will become the deciding factor in competition. Is the chip sector about to see another 'sector rotation'? 😅

X AI KOLs Following

Cerebras is now running Kimi K2.6, a trillion-parameter model, in enterprise trials at ~1,000 tokens/s, the fastest frontier model performance ever measured by Artificial Analysis.

Cerebras is now running Kimi K2.6 (1 minute read)

TLDR AI

Cerebras announces that it is now running Kimi K2.6, an AI model from Moonshot AI, on its hardware.

@kirillk_web3: do you understand what Kimi K2.6 just dropped. open-source. free. 1 trillion parameters. here's the part nobody is talk…

X AI KOLs Timeline

Kimi K2.6 is released as a free, open-source 1-trillion parameter model capable of running 300 parallel agents for continuous execution, reportedly outperforming Claude Opus 4.6 on SWE-Bench Pro tasks.

Computer build using Intel Optane Persistent Memory - Can run 1 trillion parameter model at over 4 tokens/sec

Reddit r/LocalLLaMA

A community member details a custom PC build using discontinued Intel Optane Persistent Memory to successfully run the 1-trillion parameter Kimi K2.5 model locally at roughly 4 tokens per second via llama.cpp.

Nemotron 3 Ultra. 550 billion parameters, 55B active. 1 million context