@tenstorrent: Thank you Tokyo! Here’s everything we announced at TT-Deploy Japan: Faster AI Inference • Kimi K2.6 900 t/s/u, 3x faste…
Summary
At TT-Deploy Japan, Tenstorrent announced faster AI inference for Kimi K2.6, LTX 2.3, and DeepSeek-R1 on its hardware, along with TT-Ascalon S, a licensable RISC-V CPU for agentic AI.
Cached at: 07/03/26, 06:31 AM
Thank you Tokyo! Here’s everything we announced at TT-Deploy Japan:
Faster AI Inference
• Kimi K2.6 900 t/s/u, 3x faster than GPUs
• LTX 2.3 Fast 6 sec video gen in ~6 sec, 144 frames, 1080p, 4x faster than GPUs
• DeepSeek-R1-0528 671B 400+ t/s/u
TT-Ascalon S Available Today
• A licensable RISC-V CPU built for the next generation of agentic AI applications
Heterogeneous or Standalone
• Easily deploy Tenstorrent Galaxy alongside existing infrastructure or standalone
• @aiand_’s sovereign heterogeneous inference platform with Tenstorrent Galaxy™ superclusters
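The figures quoted in the announcement above imply a few derived numbers, sketched here as simple arithmetic. This is an illustration, not a measurement: it assumes "~6 sec" means exactly 6 seconds, and it reads "Nx faster than GPUs" as N times the throughput (or 1/N of the generation time). The variable names are hypothetical, and the announcement does not say which GPU baseline was used.

```python
# Figures taken from the announcement; derived values are implications only.
frames = 144            # frames in the generated clip
clip_seconds = 6.0      # length of the generated video
gen_seconds = 6.0       # "~6 sec" generation time, assumed to be exactly 6
ltx_speedup = 4.0       # "4x faster than GPUs" (assumed: 1/4 of the GPU time)
kimi_tps = 900.0        # Kimi K2.6 tokens per second per user
kimi_speedup = 3.0      # "3x faster than GPUs" (assumed: 3x the throughput)

# Frame rate of the generated clip.
fps = frames / clip_seconds

# Ratio of clip length to generation time; 1.0 means real-time generation.
realtime_factor = clip_seconds / gen_seconds

# GPU baselines implied by the stated speedups, under the assumptions above.
implied_gpu_gen_seconds = gen_seconds * ltx_speedup
implied_gpu_tps = kimi_tps / kimi_speedup

print(f"clip frame rate: {fps:.0f} fps")
print(f"real-time factor: {realtime_factor:.1f}")
print(f"implied GPU video generation time: {implied_gpu_gen_seconds:.0f} s")
print(f"implied GPU Kimi K2.6 rate: {implied_gpu_tps:.0f} t/s/u")
```

Under these assumptions the clip plays at 24 fps and is generated in roughly real time, while the implied GPU baselines are about 24 seconds for the same clip and about 300 t/s/u for Kimi K2.6.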
Similar Articles
@HotAisle: Kimi K2.6 + DFlash: 508 tok/s on 8x MI300X 5.6x throughput improvement over baseline autoregressive serving 90 tok/s → …
Kimi K2.6 paired with DFlash inference system achieves 508 tokens/s on 8×AMD MI300X, a 5.6× throughput jump from 90 tokens/s baseline with zero quality loss.
@gnotuy: We open sourced Kimi K2.6. The next frontier in test-time compute isn't bigger models. It's better organizations of int…
Moonshot AI has open sourced Kimi K2.6 and argues that the next frontier in test-time compute is better organization of intelligence rather than simply building bigger models.
@YRSM_Simon: This is big news! Kimi 2.6 is a generative-level model. In this age of overflowing LLM capabilities, speed will become the deciding factor in competition. Is the chip sector about to see another 'sector rotation'? 😅
Cerebras is now running Kimi K2.6, a trillion-parameter model, in enterprise trials at ~1,000 tokens/s, the fastest frontier model performance ever measured by Artificial Analysis.
@QuixiAI: @Kimi_Moonshot K2.6 running on my mi300x, 56 tps (single request). I will run a throughput test
Kimi K2.6 achieves 56 tokens per second on a single MI300X GPU; user plans further throughput benchmarking.
@songhan_mit: We develop an agent-native approach to accelerate genAI, continuing the success of KDA (Kernel Design Agent) at a highe…
Enze Xie announces Sol Video Inference Engine, an agent-native, training-free full-stack accelerator for video diffusion that auto-tunes cache, sparse attention, token pruning, quantization, and kernel fusion, achieving >2× end-to-end speedup on large models like 64B Cosmos3-Super and 22B LTX-2.3.