You can now read Gemma 3's mind
Summary
Anthropic and Neuronpedia released research and tools on Natural Language Autoencoders (NLAs), enabling users to view the internal 'thoughts' of Gemma 3 during token generation. The release includes model weights for the Auto Verbalizer and Activation Reconstructor, hosted on Hugging Face and Neuronpedia.
Similar Articles
Introducing Gemma 3n: The developer guide
Google DeepMind announces the full release of Gemma 3n, a mobile-first multimodal AI model optimized for on-device efficiency with MatFormer architecture. The release includes E2B and E4B variants designed for low memory usage while delivering strong performance in reasoning, coding, and multilingual tasks.
Natural Language Autoencoders: Turning Claude's Thoughts into Text
Anthropic introduces Natural Language Autoencoders (NLAs), a method to translate internal AI activations into human-readable text, enabling better understanding of model thoughts and improving safety by revealing hidden reasoning processes.
Introducing Gemma 4 12B: a unified, encoder-free multimodal model
Google DeepMind announces Gemma 4 12B, a novel encoder-free multimodal AI model that integrates vision and audio directly into the LLM backbone, delivering advanced reasoning and agentic capabilities on laptops with 16 GB of RAM. It is released under the Apache 2.0 license.
Gemma Scope 2: helping the AI safety community deepen understanding of complex language model behavior
DeepMind releases Gemma Scope 2, an open suite of interpretability tools for the Gemma 3 model family, aiming to help the AI safety community understand and debug complex language model behaviors like hallucinations and jailbreaks.
@AnthropicAI: To support other researchers getting hands-on experience with NLAs, we’ve partnered with Neuronpedia to release NLAs on…
Anthropic and Neuronpedia have partnered to release Natural Language Autoencoders (NLAs) on open models, allowing researchers to gain hands-on experience with this interpretability tool.