Deep Dreams Are Made of This: Visualizing Monosemantic Features in Diffusion Models
Summary
This paper introduces Latent Visualization by Optimization (LVO), a mechanistic interpretability technique that uses sparse autoencoders to visualize monosemantic features in diffusion models like Stable Diffusion 1.5.
# Deep Dreams Are Made of This: Visualizing Monosemantic Features in Diffusion Models

Source: [https://arxiv.org/abs/2605.08218](https://arxiv.org/abs/2605.08218) · [View PDF](https://arxiv.org/pdf/2605.08218)

> Abstract: This paper proposes latent visualization by optimization (LVO), a mechanistic interpretability technique that extends feature visualization by optimization, originally developed for convolutional neural networks, to latent diffusion models. LVO employs sparse autoencoders (SAEs) to disentangle polysemantic layer representations into monosemantic features. Key contributions include latent-space optimization, time-step activity analysis, schedule-matched noise injection, prior initialization through feature steering, and suitable regularization strategies. We demonstrate the method on Stable Diffusion 1.5 fine-tuned on the Style50 dataset, showing that SAE features produce clear visualizations of recognizable concepts, including diagonal compositions, human figures, roses, cables, and waterfall foam, that correlate with dataset examples, while the baseline without disentanglement produces less coherent results. We further show that regularization techniques from pixel-space feature visualization transfer to the latent domain, though they require different configurations for the raw-layer and SAE variants. Compared to dataset examples and steering, LVO provides complementary insights by directly revealing what activates a feature rather than its downstream effects.

## Submission history

From: Adam Szokalski [[view email](https://arxiv.org/show-email/9097c59b/2605.08218)]

**[v1]** Wed, 6 May 2026 13:21:24 UTC (26,164 KB)
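The core loop the abstract describes, optimizing a latent to maximize one SAE feature's activation under noise injection and regularization, can be sketched in miniature. This is not the paper's implementation: the feature direction `w`, the L2 regularizer weight `lam`, and the linearly decaying noise schedule are all toy stand-ins for a trained SAE, the paper's "suitable regularization strategies", and its schedule-matched diffusion noise, respectively.

```python
import random

random.seed(0)
d = 16
# Hypothetical SAE feature direction (stand-in for a trained encoder row)
w = [random.gauss(0, 1) for _ in range(d)]

def act(z):
    """Pre-activation of the toy SAE feature on latent z."""
    return sum(wi * zi for wi, zi in zip(w, z))

z = [0.0] * d                    # latent initialized at the prior mean
lr, lam, steps = 0.1, 0.01, 200
start = act(z)
for t in range(steps):
    for i in range(d):
        # gradient ascent on act(z) - (lam/2)*||z||^2
        g = w[i] - lam * z[i]
        # decaying noise injection, a crude stand-in for schedule matching
        z[i] += lr * g + random.gauss(0, 0.01) * (1 - t / steps)

assert act(z) > start            # feature activation has increased
```

In the real method the scalar objective would be an SAE feature computed from an intermediate U-Net activation, the gradient would flow back through the diffusion model into the latent, and the decoded latent would be the visualization; the regularize-and-perturb structure of the loop is what carries over.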
Similar Articles
TextLDM: Language Modeling with Continuous Latent Diffusion
This paper introduces TextLDM, a method that adapts visual latent diffusion transformers for language modeling by mapping discrete tokens to continuous latents. It demonstrates that this approach, enhanced by representation alignment, matches GPT-2 performance and unifies visual and text generation architectures.
Diffusion Model as a Generalist Segmentation Learner
This paper introduces DiGSeg, a framework that repurposes pretrained diffusion models for state-of-the-art semantic and open-vocabulary segmentation by leveraging latent space conditioning and text-guided alignment.
Steering Without Breaking: Mechanistically Informed Interventions for Discrete Diffusion Language Models
This paper introduces a novel adaptive scheduler for steering discrete diffusion language models with sparse autoencoders, demonstrating that timing interventions to when specific attributes commit improves control quality and strength over uniform schedules.
Backbone-Equated Diffusion OOD via Sparse Internal Snapshots
This paper introduces a protocol for fair comparison of diffusion-based OOD detectors and proposes Canonical Feature Snapshots (CFS), which leverage sparse internal activations for efficient detection.
What Matters for Diffusion-Friendly Latent Manifold? Prior-Aligned Autoencoders for Latent Diffusion
This article introduces Prior-Aligned Autoencoders (PAE), a new method for creating diffusion-friendly latent manifolds that achieves state-of-the-art image generation quality while enabling 13x faster training convergence.