The author built a custom llama.cpp server and Mikupad UI to enable local inference and activation steering with Anthropic's open-weight Natural Language Autoencoders. A LoRA version is in development to reduce memory requirements.
Anthropic and Neuronpedia released research and tools on Natural Language Autoencoders (NLAs) that let users view the internal 'thoughts' of Gemma 3 during token generation. The release includes model weights for the Activation Verbalizer and Activation Reconstructor, hosted on Hugging Face and Neuronpedia.
Anthropic introduces Natural Language Autoencoders (NLAs), a method that translates a model's internal activations into human-readable text. The aim is to make model 'thoughts' easier to understand and to improve safety by revealing hidden reasoning processes.
Anthropic and Neuronpedia have partnered to release Natural Language Autoencoders (NLAs) on open models, allowing researchers to gain hands-on experience with this interpretability tool.