Tag
This paper introduces generic triple-latent recurrent models that compress token-pair interactions into a latent state, along with a gated associative retrieval variant that improves exact recall. The hybrid model outperforms Transformers on byte-level WikiText-2 and on a tokenized language benchmark, reaching up to 41.9% associative recall versus 25%.
WriteSAE introduces the first sparse autoencoder that decomposes matrix cache writes in state-space and hybrid recurrent language models, enabling token-level interventions that outperform existing methods.
This paper argues that robust state tracking in recurrent models depends on error-control dynamics, not just expressive capacity, and proves that affine recurrent networks accumulate errors that limit their effective horizon.