byte-pair-encoding

#byte-pair-encoding

Finding Optimal Tokenizers

Hacker News Top · 2026-06-11

This blog post presents an algorithm using integer linear programming to compute optimal tokenizers for language models, drawing parallels to solving the Traveling Salesman Problem. It notes that while the result is theoretically interesting, practical tokenizers are already near-optimal and the method may not generalize well.

#byte-pair-encoding

Incremental BPE Tokenization

arXiv cs.CL · 2026-06-01

This paper introduces an incremental algorithm for Byte Pair Encoding (BPE) tokenization that processes each byte in O(log^2 t) time, enabling efficient partial tokenization in streaming settings and achieving speedups over existing implementations.
