compositional-understanding

Emergent retokenization symmetry in large language models: phenomenology and applications

arXiv cs.CL · 4d ago

This paper reports that large language models exhibit a partial, emergent symmetry under retokenization: replacing a prompt's canonical tokenization with an alternative valid segmentation that preserves the underlying bytes exactly. The authors use this phenomenon to probe compositional understanding, and they propose retokenization as a novel inference-time sampling strategy that can recover solutions that conventional temperature sampling does not find.
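To make "alternative valid segmentation" concrete, here is a minimal sketch of the idea on a toy example. It is not the paper's method: the vocabulary `TOY_VOCAB`, the helper names `segmentations` and `greedy_longest_match`, and the use of greedy longest-match as a stand-in for a canonical tokenization are all assumptions made for illustration, and the sketch works on characters rather than bytes.

```python
from typing import List, Set

# Hypothetical toy vocabulary (an assumption for illustration only);
# a real tokenizer's vocabulary is far larger and byte-based.
TOY_VOCAB: Set[str] = {"u", "n", "un", "believ", "able", "unbeliev", "believable"}


def segmentations(text: str, vocab: Set[str]) -> List[List[str]]:
    """Return every way to split `text` into pieces drawn from `vocab`."""
    if text == "":
        return [[]]  # exactly one way to segment the empty string
    results: List[List[str]] = []
    for end in range(1, len(text) + 1):
        piece = text[:end]
        if piece in vocab:
            for rest in segmentations(text[end:], vocab):
                results.append([piece] + rest)
    return results


def greedy_longest_match(text: str, vocab: Set[str]) -> List[str]:
    """Stand-in for a canonical tokenization: always take the longest
    vocabulary piece that matches at the current position."""
    pieces: List[str] = []
    pos = 0
    while pos < len(text):
        for end in range(len(text), pos, -1):
            if text[pos:end] in vocab:
                pieces.append(text[pos:end])
                pos = end
                break
        else:
            raise ValueError(f"no vocabulary piece matches at position {pos}")
    return pieces


def alternative_segmentations(text: str, vocab: Set[str]) -> List[List[str]]:
    """All valid segmentations other than the (stand-in) canonical one."""
    canonical = greedy_longest_match(text, vocab)
    return [seg for seg in segmentations(text, vocab) if seg != canonical]
```

Every segmentation returned here concatenates back to the original string, which is the sense in which the text is preserved exactly while the token sequence changes. Calling `alternative_segmentations("unbelievable", TOY_VOCAB)` yields the non-canonical splits that a retokenization experiment could feed to a model in place of the default one.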
