quicktok is a fast, exact BPE tokenizer written in C++ whose output is byte-identical to tiktoken's, reportedly achieving a 2–11x speedup over existing alternatives. It supports the cl100k, o200k, GPT-OSS, Llama-3, and Qwen2.5/3 encoders.
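For readers unfamiliar with what "exact BPE" means here, the sketch below shows the generic rank-based merge loop used by tiktoken-style tokenizers: split a pre-tokenized piece into single bytes, then repeatedly merge the adjacent pair whose concatenation has the lowest rank in the vocabulary until no mergeable pair remains. This is a minimal, quadratic illustration, not quicktok's implementation; the function name `bpe_encode` and the tiny `TOY_RANKS` vocabulary are hypothetical, and the regex pre-tokenization step real encoders apply first is omitted.

```python
from __future__ import annotations

# Hypothetical toy vocabulary: token bytes -> rank (lower rank = merged earlier).
# Assumes every single byte that appears in the input has its own rank.
TOY_RANKS: dict[bytes, int] = {
    b"a": 0, b"b": 1, b"c": 2,
    b"ab": 3, b"abc": 4, b"bc": 5,
}


def bpe_encode(piece: bytes, ranks: dict[bytes, int]) -> list[int]:
    """Greedy rank-based BPE over one pre-tokenized piece."""
    # Start from individual bytes.
    parts = [piece[i:i + 1] for i in range(len(piece))]
    while len(parts) > 1:
        best_rank, best_i = None, None
        # Find the adjacent pair whose merge has the lowest rank
        # (ties go to the leftmost pair).
        for i in range(len(parts) - 1):
            r = ranks.get(parts[i] + parts[i + 1])
            if r is not None and (best_rank is None or r < best_rank):
                best_rank, best_i = r, i
        if best_i is None:  # no adjacent pair is in the vocabulary
            break
        # Replace the two parts with their concatenation.
        parts[best_i:best_i + 2] = [parts[best_i] + parts[best_i + 1]]
    return [ranks[p] for p in parts]


print(bpe_encode(b"abc", TOY_RANKS))   # -> [4]
print(bpe_encode(b"abab", TOY_RANKS))  # -> [3, 3]
```

Because the merge order is fully determined by the ranks, any implementation that follows the same rule must emit the same token IDs; that determinism is what lets a faster tokenizer be checked for byte-identical output against a reference.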
llama.cpp natively supports Multi-Token Prediction (MTP), which requires no separate draft model. Because it uses the model's built-in prediction head, local models such as Qwen3.6-27B see speedups of 1.7x or more, enough to make 27B models run smoothly on consumer GPUs.
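MTP used this way behaves like self-speculative decoding: the built-in head cheaply proposes a few tokens ahead, and the main model checks them all in one forward pass. The sketch below shows only the greedy verification step under that assumption; it is a generic illustration, not llama.cpp's code, and the function name `verify_draft` is hypothetical. Sampling-based acceptance (temperature above zero) uses a different, probabilistic rule.

```python
from __future__ import annotations


def verify_draft(draft: list[int], target_argmax: list[int]) -> list[int]:
    """Greedy speculative verification (hypothetical helper).

    draft:         k tokens proposed by the cheap prediction head.
    target_argmax: k + 1 tokens the main model would pick greedily at each
                   position, taken from ONE batched forward pass over
                   context + draft.
    Returns the tokens committed this step: the longest draft prefix that
    agrees with the main model, plus the main model's own next token.
    """
    assert len(target_argmax) == len(draft) + 1
    accepted: list[int] = []
    for proposed, expected in zip(draft, target_argmax):
        if proposed != expected:
            break  # first disagreement: discard the rest of the draft
        accepted.append(proposed)
    # The main model's prediction at the first mismatch (or after the last
    # accepted token) is always valid, so each step commits at least 1 token.
    accepted.append(target_argmax[len(accepted)])
    return accepted


print(verify_draft([5, 7, 9], [5, 7, 2, 4]))  # -> [5, 7, 2]
```

Under greedy decoding, the committed tokens are exactly what the main model would have produced on its own; the speedup comes from committing several tokens per forward pass of the large model whenever the head's guesses are accepted.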
A tweet recommends --ddtree-budget 36 on an NVIDIA RTX 4090, claiming a 2.5x decoding speedup for Qwen3.6-27B.