ICA Lens: Interpreting Language Models Without Training Another Dictionary
Summary
ICA Lens revives independent component analysis as an efficient method for interpreting language model representations, offering a faster alternative to sparse autoencoder training while maintaining competitive performance.
Cached at: 06/11/26, 01:41 PM
Source: https://huggingface.co/papers/2606.11722
Abstract
Independent component analysis (ICA) is revived as an efficient method for discovering interpretable directions in language model representations, offering a faster alternative to sparse autoencoder training while maintaining competitive performance in probing tasks.
Finding interpretable directions in language-model representations is critical for understanding and controlling model behavior. Sparse autoencoders (SAEs) have become the standard tool for this purpose, but using them as the default first lens often requires training, storing, and evaluating large overcomplete dictionaries. This bottleneck limits rapid exploration and raises a fundamental question: how much interpretable structure is already visible from activation geometry before training another neural dictionary? Our intuition is simple: many interpretable directions are selective on tokens, and these directions should look less Gaussian than random directions. We therefore revisit independent component analysis (ICA), a classical method for finding non-Gaussian directions, as a compact lens for language-model interpretability. We find that ICA has been underestimated for LLM interpretability, because prior uses often relied on off-the-shelf ICA implementations that are brittle on LLM activations and lacked systematic tools for inspecting and evaluating the recovered directions. To bridge these gaps, we introduce ICA Lens, the first practical workflow for stable, efficient, and auditable ICA analysis of LLM representations. It combines an optimized GPU-parallel FastICA pipeline with LLM-specific stability recipes and better fitting diagnostics, enabling efficient and reliable layer-wise analysis. Across GPT-2 Small, Gemma 2 2B, and Qwen 3.5 2B Base, ICA Lens efficiently recovers compact, human-interpretable directions without per-layer gradient-based dictionary training. On SAEBench, ICA is competitive with public SAEs in sparse probing and outperforms them in targeted probe perturbation under small-to-medium budgets. These results suggest that ICA should not be viewed as a weak baseline, but as an efficient and complementary first lens for exploring language-model representations.
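The intuition in the abstract, that token-selective directions look less Gaussian than random ones, can be illustrated with a small synthetic sketch. This is not the ICA Lens pipeline and uses no real model activations: the data generator, the firing probability, the feature strength, and all function names below are assumptions made for illustration. Excess kurtosis stands in as one simple non-Gaussianity score (it is near zero for Gaussian data).

```python
import math
import random


def excess_kurtosis(values):
    """Fourth standardized moment minus 3; close to 0 for Gaussian samples."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    var = sum(c * c for c in centered) / n
    fourth = sum(c ** 4 for c in centered) / n
    return fourth / (var * var) - 3.0


def random_unit_vector(dim, rng):
    """Draw a direction uniformly from the unit sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def project(rows, direction):
    """Project every activation vector onto a single direction."""
    return [sum(a * b for a, b in zip(row, direction)) for row in rows]


def make_activations(n_tokens, dim, feature, rng, fire_prob=0.05, strength=8.0):
    """Hypothetical toy data: Gaussian noise plus one feature direction
    that fires on a small fraction of tokens (token-selective)."""
    rows = []
    for _ in range(n_tokens):
        row = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        if rng.random() < fire_prob:
            row = [r + strength * f for r, f in zip(row, feature)]
        rows.append(row)
    return rows


rng = random.Random(0)
DIM, N_TOKENS = 32, 4000  # illustrative sizes, not taken from the paper

feature = random_unit_vector(DIM, rng)
acts = make_activations(N_TOKENS, DIM, feature, rng)

# Non-Gaussianity along the planted selective direction.
feature_kurt = excess_kurtosis(project(acts, feature))

# Non-Gaussianity along a handful of random directions, averaged.
random_kurts = [
    excess_kurtosis(project(acts, random_unit_vector(DIM, rng)))
    for _ in range(10)
]
mean_random_kurt = sum(random_kurts) / len(random_kurts)

print(f"selective direction: {feature_kurt:.2f}")
print(f"random directions (mean): {mean_random_kurt:.2f}")
```

In this toy setting the planted direction should score far higher than the random ones, which is the kind of signal a non-Gaussianity-seeking method such as ICA is built to pick up. Real LLM activations are messier, which is presumably where the stability recipes mentioned in the abstract come in.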