How fast is N tokens per second really?

Hacker News Top 05/18/26, 02:04 AM Tools

llm-throughput token-speed developer-tool visualization local-llm performance

Summary

A web tool that lets users visually experience different LLM token generation rates (e.g., 5–800 tok/s) across code, text, reasoning, and agent modes, helping internalize performance numbers from benchmarks.

Original Article

Cached at: 05/20/26, 05:28 PM

# tokenspeed: feel LLM tokens-per-second

Source: [https://mikeveerman.github.io/tokenspeed/](https://mikeveerman.github.io/tokenspeed/)

Every local-LLM benchmark reports throughput: *"47 tok/s on an M3,"* *"180 tok/s on a 4090,"* *"500 tok/s on Groq."* Unless you've actually watched tokens stream at those rates, the numbers are hard to internalize. This is the rendering.

### Four modes

- **code**: syntax-highlighted pseudo-code, the most common thing you watch stream out of an LLM.
- **text**: lorem ipsum prose, for the chat/answer case.
- **think**: dim-italic reasoning sentences alternating with code, mimicking a reasoning model thinking out loud.
- **agent**: alternating tool calls and code generation with processing pauses, simulating an AI coding agent.

### What to try

Start at the default 30 and read along. Then hit `1` (5 tok/s, Raspberry-Pi-class local model), `5` (60 tok/s, typical hosted Claude or GPT), `7` (200 tok/s, Groq territory), `9` (800 tok/s, Cerebras-class, where the bottleneck is your eyeballs). Now switch between `c` and `t` at the same rate. The difference is striking, and intentional.

### What counts as a token

This approximates BPE-style tokenization, not any vendor-specific encoder (`tiktoken`, Claude's tokenizer, etc.; those disagree in the details anyway). Short words are often one token; longer identifiers split into chunks (`processUserInput` → `process` + `User` + `Input`); punctuation and operators usually count too.

Code is more token-dense than prose, so the same tok/s can feel very different depending on what's streaming. The benchmark number is honest; the perceptual effect varies a lot by content type, which is the gap this tool exists to expose.

English prose averages ~1.3 tokens per word, so 30 tok/s ≈ 23 words/s.
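To make the token-counting heuristic and the words-per-second arithmetic concrete, here is a minimal Python sketch. It is not the tool's actual implementation: the regular expression, the function names (`approx_tokens`, `words_per_second`, `seconds_to_stream`), and the splitting rules are all assumptions chosen to reproduce the `processUserInput` example and the ~1.3 tokens-per-word figure quoted above.

```python
import re

# Assumption: ~1.3 tokens per English word, the average quoted in the text.
TOKENS_PER_WORD = 1.3

# Hypothetical BPE-like splitting rule (not any vendor's tokenizer):
#   - a lowercase run, optionally led by one capital ("process", "User")
#   - a run of capitals not followed by lowercase ("HTTP" in "HTTPServer")
#   - a run of digits
#   - any single punctuation or operator character
_PIECE = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+|[^\sA-Za-z0-9]")


def approx_tokens(text: str) -> list[str]:
    """Split text into rough token-like pieces; whitespace is dropped."""
    return _PIECE.findall(text)


def words_per_second(tok_per_s: float,
                     tokens_per_word: float = TOKENS_PER_WORD) -> float:
    """Convert a throughput in tokens/s to approximate English words/s."""
    return tok_per_s / tokens_per_word


def seconds_to_stream(text: str, tok_per_s: float) -> float:
    """Estimate how long the text takes to stream at the given rate."""
    return len(approx_tokens(text)) / tok_per_s


if __name__ == "__main__":
    print(approx_tokens("processUserInput"))  # ['process', 'User', 'Input']
    print(round(words_per_second(30)))        # 23
```

Under this rule a short statement such as `x += 1;` counts as five pieces for only two "words", which illustrates why code can feel slower than prose at the same tok/s. A real tokenizer would give different counts.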

Similar Articles

Getting a feel for how fast X tokens/second really is.

How fast is 10 tokens per second really?

TokenSpeed: A Speed-of-Light LLM Inference Engine for Agentic Workloads (5 minute read)

80 tok/sec and 128K context on 12GB VRAM with Qwen3.6 35B A3B and llama.cpp MTP

@charles_irl: Added a fun lil widget to the LLM Engineer's Almanac -- a "Token Timing Simulator" so you can get a visceral feel for w…
