@rohanpaul_ai: Quite a massive inferencing rack breakthrough from @TensordyneInc. They just announced an AI-inference rack, claiming …
Summary
Tensordyne announces the Napier AI inference rack, which uses log-space math to reduce energy and transistor usage. The company claims 13x the throughput of Nvidia's NVL72 GB300 on DeepSeek-R1, based on internal simulations, a result that could disrupt the inference hardware landscape if it holds up.
Cached at: 06/17/26, 07:59 PM
Quite a massive inferencing rack breakthrough from @TensordyneInc.
They just announced an AI-inference rack, claiming 13x the rack throughput of NVIDIA’s NVL72 GB300 in a DeepSeek-R1 comparison based on internal simulations.
What makes this a big deal is that Tensordyne is attacking inference at the math level.
AI chips spend huge amounts of energy moving and multiplying numbers.
Napier, its AI inference rack, works in log space, where multiplication becomes addition, and an adder is cheaper than a multiplier to build, switch, cool, and repeat billions of times per token.
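To make the log-space idea concrete, here is a minimal Python sketch of a toy logarithmic number system (LNS). It is not Tensordyne's number format, which the announcement does not describe: the sign/log split, the 8 fractional bits (`FRAC_BITS`), and the helper names `encode`, `decode`, and `lns_mul` are all hypothetical. Each nonzero value is stored as a sign plus a fixed-point integer code for log2|x|, so a multiply reduces to an integer add of the codes.

```python
import math

FRAC_BITS = 8  # hypothetical: fractional bits in the fixed-point log2 field


def encode(x: float) -> tuple[int, int]:
    """Toy LNS encoding: (sign, round(log2|x| * 2**FRAC_BITS)) as an integer code."""
    if x == 0:
        raise ValueError("zero needs a reserved code in a real LNS format")
    sign = 1 if x > 0 else -1
    return sign, round(math.log2(abs(x)) * (1 << FRAC_BITS))


def decode(v: tuple[int, int]) -> float:
    """Convert a (sign, log code) pair back to an ordinary float."""
    sign, code = v
    return sign * 2.0 ** (code / (1 << FRAC_BITS))


def lns_mul(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Multiply in log space: signs combine (an XOR in hardware), log codes are ADDED."""
    return a[0] * b[0], a[1] + b[1]


x, y = 6.0, -3.5
approx = decode(lns_mul(encode(x), encode(y)))
exact = x * y
rel_err = abs(approx - exact) / abs(exact)
print(f"{approx:.4f} vs {exact} (relative error {rel_err:.2e})")
```

The only error comes from rounding each log to 1/256, which bounds the product's relative error at about 2^(1/256) - 1, roughly 0.27%; the add itself is exact.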
Instead of spending tons of transistor budget on heavy multiply circuits, Napier tries to shrink the math itself.
That leaves less chip area for compute and more for SRAM, which should mean less power per token and far more inference packed into the same rack.
If they have made log math accurate and fast enough for real inference, then Napier is not just pushing more power into a rack; it is changing the cost of the basic operation behind model serving.
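The catch is that addition, which every dot product also needs, gets harder in log space. In a textbook LNS, adding two positive values with logs a ≥ b uses the Gaussian-logarithm identity log2(2^a + 2^b) = a + log2(1 + 2^-(a-b)), and hardware usually approximates that correction term with a lookup table. The sketch below shows that generic technique, not Tensordyne's design, which the announcement does not describe; the table step `STEP`, the cutoff `D_MAX`, and the function names are hypothetical choices. Opposite-sign operands need a second function, log2(1 - 2^-d), which diverges as d approaches 0 and is the harder case; it is omitted here.

```python
import math


def sb(d: float) -> float:
    """Gaussian-logarithm correction log2(1 + 2**-d), used for same-sign LNS addition (d >= 0)."""
    return math.log2(1.0 + 2.0 ** -d)


def lns_add_exact(la: float, lb: float) -> float:
    """Return log2(2**la + 2**lb) for two positive values given as base-2 logs."""
    hi, lo = max(la, lb), min(la, lb)
    return hi + sb(hi - lo)


# Hypothetical table parameters: one entry every 1/16 for d in [0, 16).
STEP = 1 / 16
D_MAX = 16.0
LUT = [sb(i * STEP) for i in range(int(D_MAX / STEP))]


def lns_add_lut(la: float, lb: float) -> float:
    """Same addition, but the correction term comes from the nearest table entry."""
    hi, lo = max(la, lb), min(la, lb)
    d = hi - lo
    if d >= D_MAX:
        return hi  # sb(d) < 3e-5 here, so the smaller operand is dropped
    idx = min(round(d / STEP), len(LUT) - 1)
    return hi + LUT[idx]


# Compare the table-based sum against ordinary addition over a spread of positive operands.
operands_p = (0.01, 0.3, 1.0, 2.5, 7.0, 100.0)
operands_q = (0.02, 0.5, 1.0, 3.0, 40.0, 1e6)
worst = max(
    abs(2.0 ** lns_add_lut(math.log2(p), math.log2(q)) - (p + q)) / (p + q)
    for p in operands_p
    for q in operands_q
)
exact_sum = 2.0 ** lns_add_exact(math.log2(5.0), math.log2(3.0))
print(f"exact-correction 5 + 3 = {exact_sum:.6f}, worst LUT relative error = {worst:.4%}")
```

With nearest-entry lookup at a 1/16 step, the log error is at most 1/64 (the correction's slope never exceeds 1/2 in magnitude), so the sum's relative error stays under about 1.1%. Finer tables or interpolation buy accuracy at the cost of area, which is where the "accurate enough" question lives.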
AI inference is no longer just a FLOPS race. It is a rack-level fight over power, memory locality, interconnect latency, and how many paying tokens can be served before the economics break.
According to Tensordyne's internal simulations, its TDN Rack reaches 363,000 tokens per second on DeepSeek-R1 at a per-user speed of 210 tokens per second, compared with 27,400 tokens per second for Nvidia’s NVL72 GB300.
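The headline ratio can be checked directly from the two reported figures. The second calculation goes one step further and assumes every stream on the TDN Rack runs at exactly the quoted 210 tokens per second, so aggregate throughput equals streams times per-user speed; that gives an implied concurrency, not a reported number. The post gives no per-user speed for the NVL72 GB300 figure, so no concurrency is computed for it. The variable names are illustrative.

```python
# Figures as reported in the post (Tensordyne internal simulations, DeepSeek-R1).
TDN_RACK_TPS = 363_000     # aggregate tokens/s for the TDN Rack
PER_USER_TPS = 210         # per-user tokens/s quoted for the TDN Rack figure
NVL72_GB300_TPS = 27_400   # aggregate tokens/s quoted for the comparison rack

throughput_ratio = TDN_RACK_TPS / NVL72_GB300_TPS
# Assumption: aggregate throughput = number of streams x per-user speed.
implied_streams = TDN_RACK_TPS / PER_USER_TPS

print(f"throughput ratio: {throughput_ratio:.1f}x")          # prints 13.2x
print(f"implied concurrent streams: {implied_streams:.0f}")  # prints 1729
```

So the "13x" headline is 13.2x rounded down, and under the stated assumption the simulated rack would be serving roughly 1,700 users at once.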
Similar Articles
@TensordyneInc: https://x.com/TensordyneInc/status/2066567307984531834
Tensordyne introduces Napier, an inference system using logarithmic math on silicon, claiming massive efficiency gains for MoE and reasoning models, with air-cooled racks.
Tensordyne announces Logarithmic AI compute chips. 17x more tokens per watt and 13x higher throughput than NVIDIA Blackwell.
Tensordyne announced a breakthrough inference system using logarithmic math in hardware, claiming 17x more tokens per watt and 13x higher throughput than NVIDIA Blackwell, achieved by replacing complex multiplication with simple addition in log space.
The Inference Shift
This article analyzes Cerebras' upcoming IPO as a signal of the 'inference shift' in AI hardware, arguing that while Nvidia dominates GPU-based training, the future of AI compute is becoming increasingly heterogeneous to support inference workloads.
@rohanpaul_ai: NVIDIA just posted the first agentic AI benchmark results where GB300 NVL72 runs up to 20x more coding agents per megaw…
NVIDIA published the first agentic AI benchmark results showing the GB300 NVL72 can run up to 20x more coding agents per megawatt than the H200, using the AgentPerf benchmark from Artificial Analysis.
@rohanpaul_ai: I had to test it myself to believe this unreal inference speed. 3,000 tokens/s for 1 user on standard datacenter GPUs. …
Kog AI achieves 3,000 tokens/s inference speed on 8× AMD MI300X GPUs and 2,100 on 8× NVIDIA H200, leveraging a hidden efficiency gap in GPU token generation.