serbian-digraphia

How Far Do Auto-Interpretation Labels Generalize: A Controlled Study Across Languages, Scripts, and Rewordings

arXiv cs.CL · 3d ago

This paper investigates whether auto-generated labels for sparse autoencoder features generalize across languages and scripts, using Serbian digraphia (the same language written in both Cyrillic and Latin scripts) as a controlled testbed. It finds that although feature sets overlap substantially across languages, the labels often fail to track the same concept in non-English inputs, particularly in less-represented scripts.
