magnitude-encoding

A Triadic Suffix Tokenization Scheme for Numerical Reasoning

arXiv cs.CL · 2026-04-20

This paper introduces Triadic Suffix Tokenization (TST), a deterministic tokenization scheme that partitions the digits of a number into three-digit triads and tags each triad with an explicit magnitude marker, with the goal of improving numerical reasoning in large language models. It targets the inconsistent way standard tokenizers fragment numbers, making order-of-magnitude relationships transparent at the token level. Two implementation variants are described, each offering a scalable way to expand the vocabulary.
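To make the triad-plus-marker idea concrete, here is a minimal sketch for non-negative integers only (no signs or decimals). It is not the paper's implementation: the marker spelling `<M{k}>` (k = power of 1000), the zero-padding of the leading triad, the `fuse` flag, and the function names `triadic_tokenize` and `detokenize` are all hypothetical choices made for illustration.

```python
import re


def triadic_tokenize(number: str, fuse: bool = False) -> list[str]:
    """Split a non-negative integer string into three-digit triads, each
    tagged with a hypothetical magnitude marker <M{k}>, where k is the
    triad's power of 1000 (k = 0 for units, 1 for thousands, ...)."""
    if not re.fullmatch(r"\d+", number):
        raise ValueError(f"expected a digit string, got {number!r}")
    digits = number.lstrip("0") or "0"
    # Assumption: left-pad with zeros so every triad has exactly 3 digits,
    # which keeps the set of triad tokens fixed at 000..999.
    padded = "0" * ((-len(digits)) % 3) + digits
    n_triads = len(padded) // 3
    tokens: list[str] = []
    for i in range(n_triads):
        triad = padded[3 * i : 3 * i + 3]
        marker = f"<M{n_triads - 1 - i}>"  # magnitude of this triad
        if fuse:
            # Hypothetical variant: triad and marker share one token.
            tokens.append(triad + marker)
        else:
            # Hypothetical variant: triad and marker are separate tokens.
            tokens.extend([triad, marker])
    return tokens


def detokenize(tokens: list[str]) -> str:
    """Invert triadic_tokenize by dropping markers and leading zeros."""
    digits = re.sub(r"<M\d+>", "", "".join(tokens))
    return digits.lstrip("0") or "0"


if __name__ == "__main__":
    print(triadic_tokenize("1234567"))
    print(triadic_tokenize("1234567", fuse=True))
```

With these assumptions, `triadic_tokenize("1234567")` yields `["001", "<M2>", "234", "<M1>", "567", "<M0>"]`, so the same digit triad always maps to the same token regardless of its position, and its magnitude is read directly off the marker. The two modes illustrate one possible vocabulary trade-off (we do not claim they match the paper's two variants): separate markers add roughly 1000 triad tokens plus one token per supported magnitude, while fused tokens grow as 1000 times the number of magnitudes.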
