vocabulary-expansion

Tag


Defragmenting Language Models: An Interpretability-based Approach for Vocabulary Expansion

arXiv cs.CL · 2026-04-21

Researchers from the University of Utah and CMU propose FragMend, an interpretability-based approach to vocabulary expansion in LLMs that targets token over-fragmentation in languages written in non-Latin scripts. According to the authors, the method outperforms frequency-based vocabulary selection and baseline embedding initialization by roughly 20 points for several underrepresented languages.
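To make "over-fragmentation" and the frequency-based baseline concrete, here is a minimal sketch under several assumptions. It uses fertility (average tokens per word) as the fragmentation measure, a toy greedy longest-match tokenizer that falls back to single code points in place of a real subword tokenizer, and a hypothetical `frequency_based_selection` helper that adds the k most frequent out-of-vocabulary words. None of this is FragMend's interpretability-based selection, which the card does not describe.

```python
from collections import Counter


def greedy_tokenize(word, vocab):
    """Toy greedy longest-match segmentation; falls back to single code points."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest vocabulary entry starting at position i;
        # a single code point is always accepted as a fallback.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens


def fertility(words, vocab):
    """Average number of tokens per word (higher means more fragmented)."""
    if not words:
        return 0.0
    total = sum(len(greedy_tokenize(w, vocab)) for w in words)
    return total / len(words)


def frequency_based_selection(words, vocab, k):
    """Hypothetical baseline: the k most frequent words missing from vocab."""
    counts = Counter(w for w in words if w not in vocab)
    return [w for w, _ in counts.most_common(k)]


# Tiny Devanagari corpus; with an empty vocabulary every code point
# (including combining vowel signs) becomes its own token.
corpus = ["नमस्ते", "नमस्ते", "दुनिया", "नमस्ते"]
base_vocab = set()

before = fertility(corpus, base_vocab)
added = frequency_based_selection(corpus, base_vocab, k=1)
expanded_vocab = base_vocab | set(added)
after = fertility(corpus, expanded_vocab)

print("added tokens:", added)
print("fertility before:", before)
print("fertility after:", after)
```

Frequency-based selection only rewards strings that are common in the corpus, so rarer words (here "दुनिया") stay fully fragmented; that limitation is the kind of gap an alternative selection criterion would aim to close.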
