@tenstorrent: Thank you Tokyo! Here’s everything we announced at TT-Deploy Japan: Faster AI Inference • Kimi K2.6 900 t/s/u, 3x faste…

X AI KOLs Timeline 07/02/26, 10:46 PM Products

ai-inference tenstorrent tt-ascalon risc-v japan-event hardware deepseek

Summary

At TT-Deploy Japan, Tenstorrent announced faster AI inference for Kimi K2.6, LTX 2.3, and DeepSeek-R1 on its hardware, along with TT-Ascalon S, a licensable RISC-V CPU for agentic AI.

Thank you Tokyo! Here’s everything we announced at TT-Deploy Japan:

Faster AI Inference
• Kimi K2.6 900 t/s/u, 3x faster than GPUs
• LTX 2.3 Fast 6 sec video gen in ~6 sec, 144 frames, 1080p, 4x faster than GPUs
• DeepSeek-R1-0528 671B 400+ t/s/u

TT-Ascalon S Available Today
• A licensable RISC-V CPU built for the next generation of agentic AI applications

Heterogenous or Stand Alone
• Easily deploy Tenstorrent Galaxy alongside existing infrastructure or standalone
• @aiand_'s sovereign heterogenous inference platform with Tenstorrent Galaxy™ superclusters
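The quoted figures can be unpacked with some quick arithmetic. The sketch below is only a back-of-the-envelope reading: it assumes "Nx faster" means N times the throughput (or 1/N of the wall-clock time), and the post does not state which GPUs or configurations the comparison used. The function and variable names are illustrative.

```python
# Back-of-the-envelope reading of the figures quoted in the post.
# Assumption (not stated in the post): "Nx faster" means N times the
# throughput, or equivalently 1/N of the wall-clock time.

def implied_baseline_rate(rate: float, speedup: float) -> float:
    """Baseline rate implied by a quoted rate and an 'Nx faster' claim."""
    return rate / speedup

# Kimi K2.6: 900 t/s/u, quoted as 3x faster than GPUs.
kimi_gpu_rate = implied_baseline_rate(900, 3)

# LTX 2.3 Fast: 144 frames for a 6 sec clip, generated in ~6 sec,
# quoted as 4x faster than GPUs.
frames, clip_seconds, gen_seconds = 144, 6, 6
clip_fps = frames / clip_seconds                # frames per second of video
real_time_factor = gen_seconds / clip_seconds   # 1.0 means real-time generation
ltx_gpu_seconds = gen_seconds * 4               # implied GPU generation time

print(f"Implied GPU rate for Kimi K2.6: {kimi_gpu_rate:.0f} t/s/u")
print(f"LTX clip frame rate: {clip_fps:.0f} fps, real-time factor {real_time_factor:.1f}")
print(f"Implied GPU generation time for the LTX clip: ~{ltx_gpu_seconds} sec")
```

Under these assumptions, the implied GPU baseline is about 300 t/s/u for Kimi K2.6, and the LTX clip works out to 24 frames per second generated at roughly real-time speed, against an implied ~24 sec on GPUs.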


Similar Articles

@HotAisle: Kimi K2.6 + DFlash: 508 tok/s on 8x MI300X 5.6x throughput improvement over baseline autoregressive serving 90 tok/s → …

X AI KOLs Following

Kimi K2.6 paired with DFlash inference system achieves 508 tokens/s on 8×AMD MI300X, a 5.6× throughput jump from 90 tokens/s baseline with zero quality loss.

@gnotuy: We open sourced Kimi K2.6. The next frontier in test-time compute isn't bigger models. It's better organizations of int…

X AI KOLs Following

Moonshot AI has open sourced Kimi K2.6 and argues that the next frontier in test-time compute is better organization of intelligence rather than simply building bigger models.

@YRSM_Simon: This is big news! Kimi 2.6 is a generative-level model. In this age of overflowing LLM capabilities, speed will become the deciding factor in competition. Is the chip sector about to see another 'sector rotation'? 😅

X AI KOLs Following

Cerebras is now running Kimi K2.6, a trillion-parameter model, in enterprise trials at ~1,000 tokens/s, the fastest frontier model performance ever measured by Artificial Analysis.

@QuixiAI: @Kimi_Moonshot K2.6 running on my mi300x, 56 tps (single request). I will run a throughput test

X AI KOLs Following

Kimi K2.6 reaches 56 tokens per second for a single request on the user's MI300X system; a throughput test is planned.

@songhan_mit: We develop an agent-native approach to accelerate genAI, continuing the success of KDA (Kernel Design Agent) at a highe…

X AI KOLs Following

Enze Xie announces Sol Video Inference Engine, an agent-native, training-free full-stack accelerator for video diffusion that auto-tunes cache, sparse attention, token pruning, quantization, and kernel fusion, achieving >2× end-to-end speedup on large models like 64B Cosmos3-Super and 22B LTX-2.3.
