WriteSAE: Sparse Autoencoders for Recurrent State

Hugging Face Daily Papers 05/12/26, 12:00 AM Papers

Summary

WriteSAE introduces the first sparse autoencoder that decomposes matrix cache writes in state-space and hybrid recurrent language models, enabling superior token-level interventions compared to existing methods.

We introduce WriteSAE, the first sparse autoencoder that decomposes and edits the matrix cache write of state-space and hybrid recurrent language models, where residual SAEs cannot reach. Existing SAEs read residual streams, but Gated DeltaNet, Mamba-2, and RWKV-7 write to a $d_k \times d_v$ cache through rank-1 updates $k_t v_t^\top$ that no vector atom can replace. WriteSAE factors each decoder atom into the native write shape, exposes a closed form for the per-token logit shift, and trains under matched Frobenius norm so atoms swap one cache slot at a time. Atom substitution beats matched-norm ablation on 92.4% of $n = 4{,}851$ firings at Qwen3.5-0.8B L9 H4, the 87-atom population test holds at 89.8%, the closed form predicts measured effects at $R^2 = 0.98$, and Mamba-2-370M substitutes at 88.1% over 2,500 firings. Sustained three-position installs at $3\times$ lift midrank target-in-continuation from 33.3% to 100% under greedy decoding, the first behavioral install at the matrix-recurrent write site.
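To make the rank-1 write and the matched-norm swap concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes a simplified, ungated additive recurrence (state = previous state + k v^T), whereas the real models add gating, decay, or delta-rule terms. The helper names (`outer`, `frobenius`, `read`, `substitute_write`), the dimensions, and the random vectors are all hypothetical. The sketch shows three things: the Frobenius norm of a rank-1 write equals ‖k‖·‖v‖, a rank-1 atom can be rescaled to that norm and swapped in for the write, and under this linear readout the resulting shift is predictable without re-running the recurrence.

```python
import math
import random


def outer(k, v):
    """Rank-1 matrix k v^T as a list of rows (d_k x d_v)."""
    return [[ki * vj for vj in v] for ki in k]


def frobenius(m):
    """Frobenius norm of a matrix stored as a list of rows."""
    return math.sqrt(sum(x * x for row in m for x in row))


def scale(m, c):
    return [[c * x for x in row] for row in m]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def read(q, state):
    """Linear readout q^T S, a vector of length d_v."""
    return [sum(q[i] * state[i][j] for i in range(len(q)))
            for j in range(len(state[0]))]


def substitute_write(state, k, v, atom_k, atom_v):
    """Swap the write k v^T for a rank-1 atom rescaled to the same Frobenius norm."""
    write = outer(k, v)
    atom = outer(atom_k, atom_v)
    matched = scale(atom, frobenius(write) / frobenius(atom))
    return add(sub(state, write), matched), matched


def rand_vec(rng, n):
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


rng = random.Random(0)
d_k, d_v = 4, 3  # toy sizes, chosen only for illustration
k, v, q = rand_vec(rng, d_k), rand_vec(rng, d_v), rand_vec(rng, d_k)
atom_k, atom_v = rand_vec(rng, d_k), rand_vec(rng, d_v)

# ASSUMPTION: ungated additive write; real architectures gate or decay the state.
prev_state = outer(rand_vec(rng, d_k), rand_vec(rng, d_v))
state = add(prev_state, outer(k, v))

new_state, matched = substitute_write(state, k, v, atom_k, atom_v)

# Linearity of the readout gives the shift directly from the two rank-1 terms.
predicted = [a - b for a, b in zip(read(q, matched), read(q, outer(k, v)))]
measured = [a - b for a, b in zip(read(q, new_state), read(q, state))]
```

In this toy setting `predicted` and `measured` agree up to floating-point error because the readout is linear in the state. The paper's closed form concerns per-token logit shifts in the actual models, which this sketch does not attempt to reproduce.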

Original Article

Cached at: 05/14/26, 04:17 AM

Source: https://huggingface.co/papers/2605.12770

Abstract

WriteSAE enables sparse autoencoder decomposition and editing of matrix cache writes in state-space and hybrid recurrent language models, achieving superior performance in token-level interventions compared to existing methods.

Models citing this paper: 1

#### JackYoung27/writesae-ckpts Feature Extraction • Updated about 2 hours ago

Similar Articles

Rational Sparse Autoencoder

Sparse Autoencoders enable Robust and Interpretable Fine-tuning of CLIP models

Turn-Averaged SAEs for Feature Discovery and Long-Context Attribution

Are Single-Token Sparse Autoencoder Features Causally Necessary? Layer-Depth and SAE-Family Effects

Decompose Sparsely Where You Should, Absorb Densely Where You Should No
