WriteSAE: Sparse Autoencoders for Recurrent State
Summary
WriteSAE introduces the first sparse autoencoder that decomposes and edits the matrix cache write of state-space and hybrid recurrent language models, a site that residual-stream SAEs cannot reach. In token-level interventions, atom substitution beats matched-norm ablation on 92.4% of firings at Qwen3.5-0.8B L9 H4.
Cached at: 05/14/26, 04:17 AM
Source: https://huggingface.co/papers/2605.12770
Abstract
WriteSAE enables sparse autoencoder decomposition and editing of matrix cache writes in state-space and hybrid recurrent language models, achieving superior performance in token-level interventions compared to existing methods.
We introduce WriteSAE, the first sparse autoencoder that decomposes and edits the matrix cache write of state-space and hybrid recurrent language models, where residual SAEs cannot reach. Existing SAEs read residual streams, but Gated DeltaNet, Mamba-2, and RWKV-7 write to a $d_k \times d_v$ cache through rank-1 updates $k_t v_t^\top$ that no vector atom can replace. WriteSAE factors each decoder atom into the native write shape, exposes a closed form for the per-token logit shift, and trains under matched Frobenius norm so atoms swap one cache slot at a time. Atom substitution beats matched-norm ablation on 92.4% of $n=4{,}851$ firings at Qwen3.5-0.8B L9 H4, the 87-atom population test holds at 89.8%, the closed form predicts measured effects at $R^2=0.98$, and Mamba-2-370M substitutes at 88.1% over 2,500 firings. Sustained three-position installs at $3\times$ lift midrank target-in-continuation from 33.3% to 100% under greedy decoding, the first behavioral install at the matrix-recurrent write site.
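The matched-norm substitution described in the abstract can be illustrated with a small NumPy sketch. It is not the paper's implementation. It assumes a simplified purely additive recurrence (state plus $k_t v_t^\top$, with no gating or delta-rule correction) and a linear readout `q @ state`. All function and variable names are hypothetical, and the paper's closed form for the logit shift is not reproduced here.

```python
import numpy as np

def rank1_write(k, v):
    """Rank-1 cache write k v^T with shape (d_k, d_v)."""
    return np.outer(k, v)

def matched_norm_atom(a, b, target_norm):
    """Rank-1 atom a b^T rescaled to a given Frobenius norm (assumed scheme)."""
    atom = np.outer(a, b)
    # np.linalg.norm on a 2-D array defaults to the Frobenius norm.
    return atom * (target_norm / np.linalg.norm(atom))

def substitute_write(state, k, v, a, b):
    """Replace the write k v^T inside `state` with a matched-norm atom.

    Returns the edited state and the change applied to it.
    """
    write = rank1_write(k, v)
    atom = matched_norm_atom(a, b, np.linalg.norm(write))
    delta = atom - write
    return state + delta, delta

# Toy dimensions and random vectors, purely for illustration.
rng = np.random.default_rng(0)
d_k, d_v = 4, 3
k, v = rng.normal(size=d_k), rng.normal(size=d_v)   # original write factors
a, b = rng.normal(size=d_k), rng.normal(size=d_v)   # hypothetical atom factors
q = rng.normal(size=d_k)                            # hypothetical read query

prev_state = rng.normal(size=(d_k, d_v))
state = prev_state + rank1_write(k, v)              # simplified additive update
new_state, delta = substitute_write(state, k, v, a, b)

# With a linear readout, the change in the read vector depends only on delta.
readout_shift = q @ new_state - q @ state
```

In this simplified setting the installed atom has the same Frobenius norm as the write it replaces, and the readout shift equals `q @ delta`, so the effect of one substitution can be computed without rerunning the recurrence for that token. Gated or delta-rule updates in the actual architectures would change the recurrence, so this should be read only as an illustration of the write shape and the norm matching.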
Get this paper in your agent:
hf papers read 2605.12770
Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Models citing this paper: 1
JackYoung27/writesae-ckpts (Feature Extraction), updated about 2 hours ago
Similar Articles
Sparse Autoencoders enable Robust and Interpretable Fine-tuning of CLIP models
SAE-FT introduces a novel fine-tuning method for CLIP models that uses sparse autoencoder constraints to regularize visual representations, improving robustness against distribution shifts while maintaining performance and enabling interpretability.
Feature Starvation as Geometric Instability in Sparse Autoencoders
This paper identifies feature starvation in sparse autoencoders as a geometric instability and proposes adaptive elastic net SAEs (AEN-SAEs) to mitigate it without heuristics.
Can SAEs Capture Neural Geometry? (6 minute read)
This article explores how sparse autoencoders (SAEs) can capture curved neural geometry, revealing three distinct ways SAE features represent manifolds, and presents an unsupervised pipeline to uncover geometric structure in neural representations.
Sparse Autoencoders as Plug-and-Play Firewalls for Adversarial Attack Detection in VLMs
SAEgis detects adversarial attacks on vision-language models (VLMs) using sparse autoencoders trained for reconstruction, achieving strong performance across domains without additional training. Source: https://huggingface.co/papers/2605.07447
Multilingual Steering by Design: Multilingual Sparse Autoencoders and Principled Layer Selection
This paper introduces a principled approach to multilingual language steering using sparse autoencoders (SAEs) trained on multilingual data and a novel layer selection rule based on the intersection of multilingual alignment and language separability, evaluated on LLaMA-3.1-8B and Gemma-2-9B for machine translation and cross-lingual summarization.