How fast is 10 tokens per second really?

Simon Willison's Blog News

Summary

Simon Willison explores what a generation speed of 10 tokens per second means in practice for large language models: how fast it feels and what it implies for usability.
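
To make the number concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from the post. The ratio of 0.75 English words per token is an assumed rule of thumb that varies by tokenizer and language, and the function names are hypothetical.

```python
# Assumption: about 0.75 English words per token. This is a rule of thumb,
# not a figure from the post; the real ratio depends on tokenizer and text.
WORDS_PER_TOKEN = 0.75


def words_per_minute(tokens_per_second: float,
                     words_per_token: float = WORDS_PER_TOKEN) -> float:
    """Convert a generation rate in tokens/second to approximate words/minute."""
    return tokens_per_second * words_per_token * 60


def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Time in seconds to stream num_tokens at a steady tokens/second rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second
```

Under that assumption, `words_per_minute(10)` gives 450.0 words per minute, and `seconds_to_generate(500, 10)` gives 50.0 seconds for a 500-token reply.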

Cached at: 05/20/26, 06:38 PM

# How fast is 10 tokens per second really?

Source: [https://simonwillison.net/2026/May/20/tokens-per-second/](https://simonwillison.net/2026/May/20/tokens-per-second/)

This is a **link post** by Simon Willison, posted on [20th May 2026](https://simonwillison.net/2026/May/20/).

[ai](https://simonwillison.net/tags/ai/) [generative-ai](https://simonwillison.net/tags/generative-ai/) [llms](https://simonwillison.net/tags/llms/)

Similar Articles

How fast is N tokens per second really?

Hacker News Top

A web tool that lets users visually experience different LLM token generation rates (e.g., 5–800 tok/s) across code, text, reasoning, and agent modes, helping internalize performance numbers from benchmarks.

Token maxxing

Reddit r/singularity

Discusses strategies and techniques for maximizing token usage in large language models to improve efficiency and output quality.

Compute Optimal Tokenization (2 minute read)

TLDR AI

This paper systematically derives compression-aware neural scaling laws by training nearly 1,300 models, demonstrating that the widely used heuristic of 20 tokens per parameter is an artifact of specific tokenizers. The authors propose a tokenizer-agnostic scaling law based on bytes, offering a new framework for compute-efficient training across diverse languages and modalities.