@rohanpaul_ai: Quite a massive inferencing rack breakthrough from @TensordyneInc . They just announced an AI-inference rack, claiming …


Summary

Tensordyne announces the Napier AI inference rack, claiming 13x the rack throughput of Nvidia's NVL72 GB300 on DeepSeek-R1, based on internal simulations. The design uses log-space math, which replaces multiplication with addition to cut energy and transistor usage.


Cached at: 06/17/26, 07:59 PM

Quite a massive inferencing rack breakthrough from @TensordyneInc.

They just announced an AI-inference rack, claiming 13x the rack throughput of NVIDIA’s NVL72 GB300 in a DeepSeek-R1 comparison based on internal simulations.

What makes this a big deal is that Tensordyne is attacking inference at the math level.

AI chips spend huge amounts of energy moving and multiplying numbers.

Napier (its AI inference rack) works in log space, where multiplication becomes addition because log(a·b) = log(a) + log(b). Addition is cheaper to build, switch, cool, and repeat billions of times per token.
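To make the log-space idea concrete, here is a minimal Python sketch of multiplying by adding logarithms. It is only an illustration of the general principle, not Tensordyne's design: the helper names (`to_log`, `log_mul`, `from_log`, `log_dot`) are hypothetical, it uses full floating-point logs rather than any hardware number format, and it falls back to ordinary linear-space addition for the accumulation step, since summing values is the part that log representations do not make cheap.

```python
import math

def to_log(x):
    """Encode a nonzero float as (is_negative, log2 of magnitude)."""
    if x == 0:
        raise ValueError("zero has no finite logarithm; a real format needs a special code")
    return (x < 0, math.log2(abs(x)))

def log_mul(a, b):
    """Multiply two encoded values: XOR the signs, add the log magnitudes."""
    return (a[0] != b[0], a[1] + b[1])

def from_log(v):
    """Decode an encoded value back to an ordinary float."""
    is_negative, exponent = v
    magnitude = 2.0 ** exponent
    return -magnitude if is_negative else magnitude

def log_dot(xs, ys):
    """Dot product: each product is an addition in log space.
    Assumption of this sketch: the running sum is kept in linear space."""
    return sum(from_log(log_mul(to_log(x), to_log(y))) for x, y in zip(xs, ys))

product = from_log(log_mul(to_log(-3.0), to_log(8.0)))
dot = log_dot([1.5, -2.0], [4.0, 0.5])
print(product, dot)  # approximately -24 and 5, up to floating-point rounding
```

The multiply path never uses a multiplier, only a sign comparison and an add. Whether that holds up in practice depends on how accurately and cheaply the conversions and the accumulation are done, which is the open question the post itself raises.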

So instead of spending tons of transistor budget on heavy multiply circuits, Napier tries to shrink the math itself.

That means less chip area for compute and more for SRAM, resulting in less power per token and way more inference packed into the same rack.

If they have made log math accurate and fast enough for real inference, then Napier is not just pushing more power into a rack, it is changing the cost of the basic operation behind model serving.

AI inference is no longer just a FLOPS race. It is a rack-level fight over power, memory locality, interconnect latency, and how many paying tokens can be served before the economics break.

They reported that, in internal simulations, their TDN Rack reaches 363,000 tokens per second on DeepSeek-R1 at per-user speeds of 210 tokens per second, compared with 27,400 tokens per second for NVIDIA’s NVL72 GB300.
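As a quick sanity check on the headline number, the reported figures can be plugged into a few lines of Python. The variable names are illustrative, and the "implied streams" figure rests on an assumption: that the rack total is the sum of user streams each running at 210 tokens per second. The post does not say whether the NVL72 figure was measured at the same per-user speed.

```python
# Figures as reported in the post (internal simulations, DeepSeek-R1).
tdn_rack_tps = 363_000      # Tensordyne TDN Rack, tokens per second
nvl72_rack_tps = 27_400     # NVIDIA NVL72 GB300, tokens per second
per_user_tps = 210          # reported per-user speed for the TDN Rack

ratio = tdn_rack_tps / nvl72_rack_tps
# Assumption: rack throughput = concurrent streams x per-user speed.
implied_streams = tdn_rack_tps / per_user_tps

print(f"{ratio:.1f}x")          # 13.2x
print(round(implied_streams))   # 1729
```

So the "13x" claim matches the two throughput figures, and under the stated assumption the rack would be serving roughly 1,700 simultaneous users at that speed.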
