I built a tool that shows you what GPT-2 is "thinking" in real time as it generates: a 3D graph of concept activations per token [R]

Reddit r/MachineLearning 05/19/26, 06:46 PM Tools

mechanistic-interpretability gpt-2 sparse-autoencoder visualization real-time 3d-graph open-source

Summary

A developer built AXON, a tool that visualizes GPT-2's internal concept activations as a live 3D force graph using Sparse Autoencoders, allowing users to see interpretable features firing before token generation.

Been going down a mechanistic interpretability rabbit hole for the past few weeks and ended up building this thing called AXON.

The idea: every time GPT-2 generates a token, its residual stream gets passed through a Sparse Autoencoder (Joseph Bloom's pretrained SAE). The SAE decomposes it into human-interpretable features (things like "European geography", "capital cities", "French language"), and the backend streams those to the browser over WebSocket, where they show up as a live 3D force graph. Nodes = SAE features. Edges = features that fired together on the same token. Node brightness = activation strength. The whole graph evolves token by token.

What surprised me most: type "The capital of France is" and you can literally watch geography features, proper noun features, and completion-pattern features light up before the word "Paris" even gets generated. It's not what the model outputs that's interesting; it's what's happening right before it decides.

Stack: TransformerLens + SAELens on the backend, FastAPI WebSocket for streaming, Three.js + 3d-force-graph on the frontend. Runs on CPU (~800 ms/token) or GPU (~35 ms/token on a 4050). Labels come from Neuronpedia's API and get cached locally. You can also swap in other models (GPT-2 medium/large/xl, Pythia variants, Gemma-2-2B) as long as there's a pretrained SAE for it in SAELens.

GitHub: https://github.com/09Catho/axon

Would love feedback and stars, especially from anyone who's worked with SAEs before. I'm curious whether the co-activation edges are actually meaningful or just noise at this layer.
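For anyone who wants a concrete picture of the per-token pipeline described above, here is a rough, stdlib-only sketch. It is not AXON's code. The encoder form (ReLU of the centered residual times an encoder matrix plus a bias) is one common SAE formulation and is assumed here, the weights are made-up toy values rather than a pretrained SAE, and the function names and message fields are hypothetical. The nodes/links shape with source and target is the graph data format 3d-force-graph accepts, but whether AXON sends exactly this is an assumption.

```python
import json
from collections import Counter
from itertools import combinations


def sae_encode(resid, w_enc, b_enc, b_dec):
    """Toy SAE encoder: ReLU((resid - b_dec) @ w_enc + b_enc).

    resid is a list of d_model floats; w_enc has d_model rows of
    n_features floats. This form is an assumption, not AXON's code.
    """
    centered = [x - b for x, b in zip(resid, b_dec)]
    acts = []
    for j in range(len(b_enc)):
        pre = sum(c * row[j] for c, row in zip(centered, w_enc)) + b_enc[j]
        acts.append(max(0.0, pre))
    return acts


def top_features(acts, k=3):
    """Up to k (feature_id, activation) pairs that fired, strongest first."""
    active = [(i, a) for i, a in enumerate(acts) if a > 0.0]
    return sorted(active, key=lambda p: p[1], reverse=True)[:k]


def token_message(token, feats, edge_counts):
    """Build one per-token JSON message (hypothetical schema).

    Every pair of features that fired on this token becomes a link,
    and edge_counts keeps a running co-activation count per pair.
    """
    ids = sorted(fid for fid, _ in feats)
    links = list(combinations(ids, 2))
    edge_counts.update(links)
    return json.dumps({
        "token": token,
        "nodes": [{"id": fid, "activation": round(a, 4)} for fid, a in feats],
        "links": [
            {"source": s, "target": t, "count": edge_counts[(s, t)]}
            for s, t in links
        ],
    })


# Toy setup: 2-dim "residual stream", 3 features (all values made up).
W_ENC = [[1.0, 0.0, -1.0],
         [0.0, 1.0, 1.0]]
B_ENC = [0.0, 0.0, 0.0]
B_DEC = [0.0, 0.0]

edge_counts = Counter()
acts = sae_encode([2.0, 1.0], W_ENC, B_ENC, B_DEC)
first = json.loads(token_message(" Paris", top_features(acts), edge_counts))
second = json.loads(token_message(".", top_features(acts), edge_counts))
print(first)
```

The running count per pair is one cheap way to probe the "meaningful or just noise" question: pairs that co-fire only once across a long generation are weaker evidence of a real relationship than pairs that keep recurring, so a count threshold could be used to prune edges before drawing them.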


Similar Articles

@AlphaSignalAI: This free interactive explainer just exposed how GPT actually works. Most people treat Transformers like magic. You typ…

Extracting Concepts from GPT-4

Transformer Explainer: Interactive Learning of Text-Generative Models

@DamiDefi: A developer just mapped every AI concept powering Claude, ChatGPT, and every agent stack you are building on. 20 concep…

You can now read Gemma 3's mind
