WriteSAE: Sparse Autoencoders for Recurrent State

Hugging Face Daily Papers

Summary

WriteSAE introduces the first sparse autoencoder that decomposes matrix cache writes in state-space and hybrid recurrent language models, enabling superior token-level interventions compared to existing methods.

We introduce WriteSAE, the first sparse autoencoder that decomposes and edits the matrix cache write of state-space and hybrid recurrent language models, where residual SAEs cannot reach. Existing SAEs read residual streams, but Gated DeltaNet, Mamba-2, and RWKV-7 write to a $d_k \times d_v$ cache through rank-1 updates $k_t v_t^\top$ that no vector atom can replace. WriteSAE factors each decoder atom into the native write shape, exposes a closed form for the per-token logit shift, and trains under matched Frobenius norm so atoms swap one cache slot at a time. Atom substitution beats matched-norm ablation on 92.4% of $n = 4{,}851$ firings at Qwen3.5-0.8B L9 H4, the 87-atom population test holds at 89.8%, the closed form predicts measured effects at $R^2 = 0.98$, and Mamba-2-370M substitutes at 88.1% over 2,500 firings. Sustained three-position installs at $3\times$ lift midrank target-in-continuation from 33.3% to 100% under greedy decoding, the first behavioral install at the matrix-recurrent write site.
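To make the rank-1 write and the matched-norm substitution concrete, here is a minimal sketch in plain Python. It assumes a toy, purely additive cache $S_t = S_{t-1} + k_t v_t^\top$ with readout $S_t^\top q$, and ignores the gating, decay, and delta-rule terms that Gated DeltaNet, Mamba-2, and RWKV-7 actually use. All function names are hypothetical, and the closed form below is the one that follows from this toy model, not necessarily the one derived in the paper.

```python
import math
import random

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def outer(x, y, scale=1.0):
    """Rank-1 matrix scale * x y^T, stored as a list of rows."""
    return [[scale * xi * yj for yj in y] for xi in x]

def add(A, B):
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]

def frobenius(M):
    return math.sqrt(sum(m * m for row in M for m in row))

def readout(S, q):
    """Toy readout o = S^T q, a vector of length d_v (assumption, not the paper's model)."""
    return [dot(col, q) for col in zip(*S)]

def matched_scale(k, v, a, b):
    """||x y^T||_F = ||x|| * ||y||, so this scale gives a b^T the norm of k v^T."""
    return math.sqrt(dot(k, k) * dot(v, v) / (dot(a, a) * dot(b, b)))

def predicted_shift(q, k, v, a, b, scale):
    """Closed-form readout change for the toy cache: scale * (a.q) b - (k.q) v."""
    return [scale * dot(a, q) * bj - dot(k, q) * vj for bj, vj in zip(b, v)]

def vec(n):
    return [random.gauss(0.0, 1.0) for _ in range(n)]

random.seed(0)
d_k, d_v = 4, 3
S_prev = [vec(d_v) for _ in range(d_k)]   # cache before the write
k, a, q = vec(d_k), vec(d_k), vec(d_k)    # native key, atom key factor, query
v, b = vec(d_v), vec(d_v)                 # native value, atom value factor

scale = matched_scale(k, v, a, b)
S_orig = add(S_prev, outer(k, v))         # native rank-1 write
S_sub = add(S_prev, outer(a, b, scale))   # atom substituted at matched Frobenius norm

measured = [s - o for s, o in zip(readout(S_sub, q), readout(S_orig, q))]
predicted = predicted_shift(q, k, v, a, b, scale)
```

In this toy setting `measured` and `predicted` agree up to floating-point error because the readout is linear in the cache. Multiplying the shift by an unembedding matrix would give a per-token logit shift, again only under the linearity assumed here.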

Cached at: 05/14/26, 04:17 AM

Paper page - WriteSAE: Sparse Autoencoders for Recurrent State

Source: https://huggingface.co/papers/2605.12770




Models citing this paper: 1

#### JackYoung27/writesae-ckpts · Feature Extraction · Updated about 2 hours ago


Similar Articles

Can SAEs Capture Neural Geometry? (6 minute read)

TLDR AI

This article explores how sparse autoencoders (SAEs) can capture curved neural geometry, revealing three distinct ways SAE features represent manifolds, and presents an unsupervised pipeline to uncover geometric structure in neural representations.

Sparse Autoencoders as Plug-and-Play Firewalls for Adversarial Attack Detection in VLMs

Hugging Face Daily Papers

Source: https://huggingface.co/papers/2605.07447

SAEgis detects adversarial attacks on vision-language models using sparse autoencoders trained for reconstruction, achieving strong performance across domains without additional training. Vision-language models (VLMs) have advan