Tensordyne announced a breakthrough inference system using logarithmic math in hardware, claiming 17x more tokens per watt and 13x higher throughput than NVIDIA Blackwell, achieved by replacing costly multiplication with simple addition in log space.
Read their press release here: [Tensordyne Announces Breakthrough Inference System to End AI’s Speed vs. Cost Trade-Off — Tensordyne](https://www.tensordyne.ai/stories/tensordyne-announces-breakthrough-inference-system-to-end-ais-speed-vs-cost-trade-off)

The images were taken from their teaser page: [Tensordyne — Inference System](https://www.tensordyne.ai/inference-system)

The key math breakthrough they claim to have enabled is efficient log math in hardware. Basically, when you work in log space, multiplications become additions, which are vastly easier to implement in hardware than multiplier circuitry and require far fewer transistors, and thus less die area and energy.

I asked Claude to give me a little explainer:

>**The Core Idea: Logarithmic Number System (LNS)**

>The key insight comes from a fundamental property of logarithms:

>***log(A × B) = log(A) + log(B)***

>Instead of storing numbers as regular floating-point values, Tensordyne represents them in the logarithmic domain, often log base 2, because that maps naturally to digital hardware. In that representation, multiplication becomes addition: A × B becomes log(A) + log(B).

>For hardware, this is a huge deal: adder circuits are far smaller and less power-hungry than multiplier circuits, so this directly reduces chip area and power consumption.

>**Why This Matters for AI**

>AI, at its core, is matrix math: multiplications and additions. Every time a model generates a token, it performs an enormous number of operations. Traditionally, those are done with floating-point arithmetic (hence the industry term "FLOPs"). But floating-point math is demanding: it burns energy, takes up significant silicon real estate, and drives up system cost.
>Because AI compute is primarily composed of matrix multiplication, replacing it with log-domain addition radically simplifies the workload, allows the functional units on the chip to be significantly smaller, and frees up more die area for SRAM cache, which improves both performance and core utilization while also reducing power consumption.

>**The Catch: The "Addition Problem"**

>AI math isn't just matrix multiplication. It's actually primarily "MAC" (Multiply-Accumulate) instructions; on current GPUs and CPUs, this manifests as "FMA" (Fused Multiply-Add). In other words, it's both a multiplication and an addition.

>When you're already in log space, doing a plain addition of two numbers (not a multiplication) is actually the hard part: you can't just add the logs to get the log of a sum. The idea of using LNS math isn't novel (people were experimenting with it as far back as the 1970s, and it has won benchmark prizes and efficiency awards), but it never became mainstream because there was no good way to solve this addition conundrum.

>Tensordyne's claim is that they've found a way to handle this efficiently in hardware, which is the key differentiator they don't fully disclose publicly.

>**The Hardware Payoff**

>By replacing every multiply with lightweight log-math adders, Tensordyne frees up chip compute area compared to today's FP8/INT8 GPUs. Fewer transistors means chips run cooler and more energy-efficiently, and the freed-up die space allows them to pack in extra tensor engines, more high-bandwidth SRAM and HBM3e memory, and a high-speed interconnect fabric.

>They also claim that their log math achieves accuracy greater than 99.9% relative to any trained language, vision, or video model, and in some cases even better dynamic range than floating point.

>*In short:* it's a clever application of centuries-old math (logarithms) to a very modern problem.
>The trick is in solving the addition-in-log-space problem efficiently enough to make it practical, which is where their secret sauce lies.
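
To make the "addition problem" from the explainer concrete, here is a minimal Python sketch of textbook LNS arithmetic. It is not Tensordyne's method, which they have not disclosed. All function names (`to_lns`, `lns_mul`, `lns_add`, `lns_dot`) are made up for illustration, and the sketch assumes positive values only, since a real LNS also has to track sign and zero separately. Multiplication is a plain add of the logs, while addition relies on the standard identity log2(a + b) = hi + log2(1 + 2^(lo - hi)), where hi and lo are the larger and smaller of the two logs.

```python
import math

def to_lns(x: float) -> float:
    """Encode a positive real as its base-2 logarithm (sign/zero handling omitted)."""
    if x <= 0:
        raise ValueError("this sketch only handles positive values")
    return math.log2(x)

def from_lns(lx: float) -> float:
    """Decode a log-domain value back to an ordinary real number."""
    return 2.0 ** lx

def lns_mul(la: float, lb: float) -> float:
    """Multiply in log space: log2(a * b) = log2(a) + log2(b)."""
    return la + lb

def lns_add(la: float, lb: float) -> float:
    """Add in log space: log2(a + b) = hi + log2(1 + 2**(lo - hi)).

    The correction term log2(1 + 2**d), with d <= 0, is the expensive part;
    here it is simply computed in floating point.
    """
    hi, lo = (la, lb) if la >= lb else (lb, la)
    return hi + math.log2(1.0 + 2.0 ** (lo - hi))

def lns_dot(xs, ws) -> float:
    """Dot product of positive vectors, accumulated entirely in the log domain."""
    acc = None  # log-domain accumulator; None stands for an empty sum
    for x, w in zip(xs, ws):
        term = lns_mul(to_lns(x), to_lns(w))  # the "multiply" is just an add
        acc = term if acc is None else lns_add(acc, term)  # the "accumulate" is the hard part
    return 0.0 if acc is None else from_lns(acc)

if __name__ == "__main__":
    print(from_lns(lns_mul(to_lns(3.0), to_lns(4.0))))
    print(lns_dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

The multiply path needs only an adder, but every accumulate step has to evaluate the correction term log2(1 + 2^d). LNS designs have generally approximated that term with lookup tables or piecewise-linear fits, and how cheaply and accurately it can be done is the part the explainer describes as the undisclosed differentiator.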
Tensordyne introduces Napier, an inference system using logarithmic math on silicon, claiming massive efficiency gains for MoE and reasoning models, with air-cooled racks.
Tensordyne announces the Napier AI inference rack, claiming 13x the throughput of NVIDIA's GB300 NVL72 by using log-space math to reduce energy and transistor usage, potentially disrupting the inference hardware landscape.
Kog AI launches a tech preview of the Kog Inference Engine, achieving 3,000 tokens/s per request on standard datacenter GPUs by co-designing model architecture, runtime, and low-level GPU code, targeting latency-critical AI agent workflows.
NVIDIA trained a 12-billion parameter LLM in 4-bit precision using the new NVFP4 format with micro-scaling, achieving near-zero intelligence loss while halving memory usage and tripling arithmetic speed, marking a major breakthrough in efficient AI training.
NVIDIA's Blackwell GB300 NVL72 platform leads the first agentic AI infrastructure benchmark, AgentPerf from Artificial Analysis, delivering up to 20x more agents per megawatt than the previous Hopper generation.