How fast is N tokens per second really?
Summary
A web tool that lets users see different LLM token generation rates (e.g., 5–800 tok/s) play out in code, text, reasoning, and agent modes, so that the throughput figures reported in benchmarks become easier to internalize.
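The arithmetic behind such a simulator is simple: at N tokens per second, each token arrives 1/N seconds after the previous one, so a 500-token answer takes 50 s at 10 tok/s but well under a second at 800 tok/s. The sketch below illustrates that idea. It is not the tool's actual code. It assumes a constant rate, ignores time to first token, and treats each whitespace-separated word as one token, which real tokenizers do not do. The function names are hypothetical.

```python
import sys
import time


def inter_token_delay(tokens_per_second: float) -> float:
    """Seconds between consecutive tokens at a steady generation rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return 1.0 / tokens_per_second


def total_generation_time(num_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock seconds to emit num_tokens at a constant rate.

    Assumption: ignores prompt processing / time to first token.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


def simulate_stream(text: str, tokens_per_second: float, out=sys.stdout) -> None:
    """Print text one 'token' at a time, pausing between tokens.

    Rough assumption: each whitespace-separated word counts as one token.
    """
    delay = inter_token_delay(tokens_per_second)
    for word in text.split():
        out.write(word + " ")
        out.flush()          # make each token visible immediately
        time.sleep(delay)    # wait before emitting the next token
    out.write("\n")


if __name__ == "__main__":
    # How long a 500-token response takes at a few rates from the range above.
    for rate in (5, 10, 80, 800):
        print(f"{rate:>4} tok/s: 500 tokens in {total_generation_time(500, rate):.3f} s")
    # Watch a short sentence stream out at 10 tok/s.
    simulate_stream("The quick brown fox jumps over the lazy dog.", 10)
```

Running the script prints the total time for each rate and then streams a short sentence at 10 tok/s, which makes the gap between single-digit and triple-digit rates easy to feel.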
Cached at: 05/20/26, 05:28 PM
Similar Articles
Getting a feel for how fast X tokens/second really is.
The author introduces a web-based script designed to help users intuitively understand tokens-per-second speeds in local LLM setups by simulating text, code, and reasoning generation at different rates.
How fast is 10 tokens per second really?
Simon Willison explores what a generation speed of 10 tokens per second means in practice for large language models: how fast it feels and what it implies for usability.
TokenSpeed: A Speed-of-Light LLM Inference Engine for Agentic Workloads (5 minute read)
Lightseek releases TokenSpeed, a high-performance LLM inference engine optimized for agentic workloads, featuring compiler-backed parallelism and advanced kernel optimizations that have been adopted by vLLM.
80 tok/sec and 128K context on 12GB VRAM with Qwen3.6 35B A3B and llama.cpp MTP
A user shares a configuration for achieving over 80 tokens per second with Qwen3.6 35B A3B on a 12GB VRAM GPU using llama.cpp and Multi-Token Prediction (MTP). The post includes benchmark results and specific command-line parameters to optimize performance.
@charles_irl: Added a fun lil widget to the LLM Engineer's Almanac -- a "Token Timing Simulator" so you can get a visceral feel for w…
A token timing simulator widget was added to the LLM Engineer's Almanac, demonstrating the DFlash technique at roughly 1,000 tokens per second (TPS), to help users viscerally understand benchmark performance numbers.