Tag
This paper introduces Latent Visualization by Optimization (LVO), a mechanistic interpretability technique that uses sparse autoencoders to visualize monosemantic features in diffusion models like Stable Diffusion 1.5.
OpenAI introduces Activation Atlases, a technique for visualizing and understanding the internal representations of neural networks, enabling humans to discover spurious correlations and unexpected behaviors such as fooling image classifiers by adding noodles to images.