@rohanpaul_ai: NVIDIA just posted the first agentic AI benchmark results where GB300 NVL72 runs up to 20x more coding agents per megaw…

X AI KOLs Following News

Summary

NVIDIA published the first agentic AI benchmark results showing the GB300 NVL72 can run up to 20x more coding agents per megawatt than the H200, using the AgentPerf benchmark from Artificial Analysis.

NVIDIA just posted the first agentic AI benchmark results where GB300 NVL72 runs up to 20x more coding agents per megawatt than H200. Older inference benchmarks mostly ask how fast a system can produce tokens after one prompt. AgentPerf from Artificial Analysis, asks a harder question: how many agents can run at the same time while still feeling responsive. It tests a harder workload than normal LLM serving because an agent is not one request and one answer, but a long chain of model calls, code edits, command runs, tool delays, and growing context. The benchmark replays real coding-agent paths from public repos across 12+ programming languages, with request lengths from 5K to 131K tokens and an average near 27K tokens. NVIDIA says GB300 NVL72 reaches 61.4K concurrent agents per megawatt at the lowest service tier, while H200 reaches 2.6K. The gain comes from 72 GPUs acting like one rack-scale machine through NVLink, plus software that spreads MoE expert work, overlaps communication with compute, and keeps batches large. @NVIDIAAIDev
Original Article
View Cached Full Text

Cached at: 06/13/26, 01:04 AM

NVIDIA just posted the first agentic AI benchmark results where GB300 NVL72 runs up to 20x more coding agents per megawatt than H200.

Older inference benchmarks mostly ask how fast a system can produce tokens after one prompt.

AgentPerf from Artificial Analysis, asks a harder question: how many agents can run at the same time while still feeling responsive.

It tests a harder workload than normal LLM serving because an agent is not one request and one answer, but a long chain of model calls, code edits, command runs, tool delays, and growing context.

The benchmark replays real coding-agent paths from public repos across 12+ programming languages, with request lengths from 5K to 131K tokens and an average near 27K tokens.

NVIDIA says GB300 NVL72 reaches 61.4K concurrent agents per megawatt at the lowest service tier, while H200 reaches 2.6K.

The gain comes from 72 GPUs acting like one rack-scale machine through NVLink, plus software that spreads MoE expert work, overlaps communication with compute, and keeps batches large.

@NVIDIAAIDev

Similar Articles