Tag
Introduces a geometric framework for identifying 'AI engrams', the memory traces in deep neural networks, and formalizes neuroscientific criteria into a closed-form estimator, enabling surgical memory manipulation in models from MLPs to LLMs.
Presents a geometric framework for analyzing the instability of feature composition in sparse autoencoders, showing that non-linearities cause a ratchet effect that leads to compositional collapse beyond a critical density.