quicktok: a faster tokenizer (exact and byte-identical with tiktoken) [P]

Reddit r/MachineLearning 06/16/26, 04:24 AM Tools

tokenizer bpe c-plus-plus performance open-source tiktoken speed-up

Summary

quicktok is a fast, exact BPE tokenizer written in C++ whose token ids are byte-identical to tiktoken's. The author reports encoding 2–3.6x faster than bpe-openai and 4–11x faster than tiktoken itself. It supports the cl100k, o200k, GPT-OSS, Llama-3, and Qwen2.5/3 encoders.

Been working on this for a while! It should be useful for anyone trying to speed up their tokenization workflows. **quicktok** is a fast, exact BPE tokenizer written in C++. Token ids are byte-identical to `tiktoken`'s, and encoding runs **2–3.6×** faster than `bpe-openai` (the fastest alternative I know of) and **4–11×** faster than `tiktoken` itself. It ships cl100k, o200k, GPT-OSS, Llama-3, and Qwen2.5/3.

**Approach.** Same algorithm as `bpe-openai` (exact backtracking BPE), but with a lot of data-structure engineering to cut memory accesses:

* A 2-byte trie for the longest-match walk
* Dense, exactly-keyed caches for merge-validity checks
* A hand-compiled pretokenizer instead of a general regex engine

**Benchmarks** (Apple M1, single thread, throughput in MB/s on cl100k\_base; every output was verified token-for-token before timing):

|encoder|The Pile|Code|Common Crawl|
|:-|:-|:-|:-|
|**quicktok (native)**|**121.7**|**139.2**|**71.3**|
|**quicktok (Python)**|**77.9**|**83.6**|**49.7**|
|bpe-openai|36.6|38.7|28.9|
|rs-bpe|30.9|34.7|23.5|
|tiktoken-rs|15.4|13.8|13.3|
|tiktoken (Python)|13.6|12.8|12.3|
|TokenDagger|11.1|11.9|10.7|

Results on o200k\_base show similar ratios. Each encoder is called through its own raw API, and the benchmarks can be reproduced with `make bench-compare` in the repo.

`pip install quicktok-v1`

Repo: [https://github.com/dmatth1/quicktok](https://github.com/dmatth1/quicktok)
