Tag
ICA Lens revives independent component analysis as an efficient method for interpreting language model representations, offering a faster alternative to sparse autoencoder training while maintaining competitive performance.
This paper demonstrates that structure retention in embedding spaces, measured via nearest-neighbor overlap and ICA differences, correlates strongly with benchmark performance across multiple tasks, making it a predictive metric for model effectiveness.
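
To make the nearest-neighbor overlap idea concrete, here is a minimal sketch of one common formulation: for each item, compare its k nearest neighbors in two embedding spaces and average the shared fraction. The summary does not give the paper's exact definition, so the cosine similarity, the choice of k, and the function names (`cosine`, `knn_indices`, `neighbor_overlap`) are all illustrative assumptions rather than the authors' method.

```python
import math

def cosine(u, v):
    # Cosine similarity; assumes both vectors are nonzero.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def knn_indices(vectors, i, k):
    # Indices of the k vectors most similar to vectors[i], excluding i itself.
    others = [j for j in range(len(vectors)) if j != i]
    others.sort(key=lambda j: cosine(vectors[i], vectors[j]), reverse=True)
    return set(others[:k])

def neighbor_overlap(space_a, space_b, k):
    # Mean fraction of shared k-nearest neighbors across all items.
    # Hypothetical helper: row i of both spaces must embed the same item.
    if len(space_a) != len(space_b):
        raise ValueError("both spaces must embed the same items")
    n = len(space_a)
    total = 0.0
    for i in range(n):
        shared = knn_indices(space_a, i, k) & knn_indices(space_b, i, k)
        total += len(shared) / k
    return total / n

# Toy data: four items embedded in 2D (made-up values for illustration).
space_a = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
# Same items, but the neighborhood structure has been scrambled.
space_b = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]

same = neighbor_overlap(space_a, space_a, k=1)
scrambled = neighbor_overlap(space_a, space_b, k=1)
```

A score near 1 means the second space keeps the local neighborhoods of the first, while a score near 0 means they were lost. This brute-force version costs O(n²) similarity computations per space, so a larger vocabulary would need an approximate nearest-neighbor index instead.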