ICA Lens: Interpreting Language Models Without Training Another Dictionary

Hugging Face Daily Papers 06/10/26, 12:00 AM Papers

Summary

ICA Lens revives independent component analysis as an efficient method for interpreting language model representations, offering a faster alternative to sparse autoencoder training while maintaining competitive performance.

Finding interpretable directions in language-model representations is critical for understanding and controlling model behavior. Sparse autoencoders (SAEs) have become the standard tool for this purpose, but using them as the default first lens often requires training, storing, and evaluating large overcomplete dictionaries. This bottleneck limits rapid exploration and raises a fundamental question: how much interpretable structure is already visible from activation geometry before training another neural dictionary? Our intuition is simple: many interpretable directions are selective on tokens, and these directions should look less Gaussian than random directions. We therefore revisit independent component analysis (ICA), a classical method for finding non-Gaussian directions, as a compact lens for language-model interpretability. We find that ICA has been underestimated for LLM interpretability, because prior uses often relied on off-the-shelf ICA implementations that are brittle on LLM activations and lacked systematic tools for inspecting and evaluating the recovered directions. To bridge these gaps, we introduce ICALens, the first practical workflow for stable, efficient, and auditable ICA analysis of LLM representations. It combines an optimized GPU-parallel FastICA pipeline with LLM-specific stability recipes and better fitting diagnostics, enabling efficient and reliable layer-wise analysis. Across GPT-2 Small, Gemma 2 2B, and Qwen 3.5 2B Base, ICALens efficiently recovers compact, human-interpretable directions without per-layer gradient-based dictionary training. On SAEBench, ICA is competitive with public SAEs in sparse probing and outperforms them in targeted probe perturbation under small-to-medium budgets. These results suggest that ICA should not be viewed as a weak baseline, but as an efficient and complementary first lens for exploring language-model representations.
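The intuition about non-Gaussianity can be illustrated with a small synthetic sketch. This is not the paper's data or code: the "activations" below are an assumed toy model (a dense Gaussian background plus one feature that fires on about 2% of tokens), and `excess_kurtosis`, `feature_dir`, and `random_dir` are hypothetical names chosen for illustration. Excess kurtosis is used as one common non-Gaussianity measure; it is close to zero for Gaussian data.

```python
import numpy as np

def excess_kurtosis(x):
    """Excess kurtosis of a 1-D sample; approximately 0 for Gaussian data."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 4) - 3.0)

rng = np.random.default_rng(0)
n_tokens, d_model = 20_000, 64

# Assumed toy activations (not real LLM data): dense Gaussian background
# plus one unit-norm feature direction that is active on ~2% of tokens.
background = rng.normal(size=(n_tokens, d_model))
feature_dir = rng.normal(size=d_model)
feature_dir /= np.linalg.norm(feature_dir)
fires = rng.random(n_tokens) < 0.02
strength = np.where(fires, rng.uniform(5.0, 10.0, size=n_tokens), 0.0)
acts = background + strength[:, None] * feature_dir

# A random unit direction for comparison.
random_dir = rng.normal(size=d_model)
random_dir /= np.linalg.norm(random_dir)

# Project the activations onto each direction and compare non-Gaussianity.
kurt_feature = excess_kurtosis(acts @ feature_dir)
kurt_random = excess_kurtosis(acts @ random_dir)
print(f"token-selective direction: {kurt_feature:.2f}")
print(f"random direction:          {kurt_random:.2f}")
```

In this toy setting the token-selective direction has a heavy-tailed projection, while the random direction stays close to Gaussian. That gap is the kind of signal an ICA objective searches for.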

Cached at: 06/11/26, 01:41 PM

Source: https://huggingface.co/papers/2606.11722

Abstract

Independent component analysis (ICA) is revived as an efficient method for discovering interpretable directions in language model representations, offering a faster alternative to sparse autoencoder training while maintaining competitive performance in probing tasks.

Finding interpretable directions in language-model representations is critical for understanding and controlling model behavior. Sparse autoencoders (SAEs) have become the standard tool for this purpose, but using them as the default first lens often requires training, storing, and evaluating large overcomplete dictionaries. This bottleneck limits rapid exploration and raises a fundamental question: how much interpretable structure is already visible from activation geometry before training another neural dictionary? Our intuition is simple: many interpretable directions are selective on tokens, and these directions should look less Gaussian than random directions. We therefore revisit independent component analysis (ICA), a classical method for finding non-Gaussian directions, as a compact lens for language-model interpretability. We find that ICA has been underestimated for LLM interpretability, because prior uses often relied on off-the-shelf ICA implementations that are brittle on LLM activations and lacked systematic tools for inspecting and evaluating the recovered directions. To bridge these gaps, we introduce ICALens, the first practical workflow for stable, efficient, and auditable ICA analysis of LLM representations. It combines an optimized GPU-parallel FastICA pipeline with LLM-specific stability recipes and better fitting diagnostics, enabling efficient and reliable layer-wise analysis. Across GPT-2 Small, Gemma 2 2B, and Qwen 3.5 2B Base, ICALens efficiently recovers compact, human-interpretable directions without per-layer gradient-based dictionary training. On SAEBench, ICA is competitive with public SAEs in sparse probing and outperforms them in targeted probe perturbation under small-to-medium budgets. These results suggest that ICA should not be viewed as a weak baseline, but as an efficient and complementary first lens for exploring language-model representations.
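For readers unfamiliar with FastICA, the following is a minimal CPU sketch of the textbook symmetric FastICA iteration (PCA whitening followed by a fixed-point update with a tanh nonlinearity). It is an illustration under stated assumptions, not the ICALens pipeline: the GPU parallelism, LLM-specific stability recipes, and fitting diagnostics described in the abstract are not reproduced, and the function names `whiten`, `sym_decorrelate`, and `fastica` are hypothetical. The usage part mixes synthetic Laplace sources rather than real model activations.

```python
import numpy as np

def whiten(x, n_components):
    """Center x (n_samples, n_features) and PCA-whiten to n_components dims."""
    xc = x - x.mean(axis=0)
    cov = xc.T @ xc / xc.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = np.argsort(eigvals)[::-1][:n_components]
    k = eigvecs[:, top] / np.sqrt(eigvals[top])  # (n_features, n_components)
    return xc @ k, k

def sym_decorrelate(w):
    """Symmetric orthogonalization: W <- (W W^T)^(-1/2) W."""
    s, u = np.linalg.eigh(w @ w.T)
    return (u / np.sqrt(s)) @ u.T @ w

def fastica(x, n_components, max_iter=200, tol=1e-6, seed=0):
    """Textbook symmetric FastICA with g = tanh. Returns (sources, unmixing)."""
    z, k = whiten(x, n_components)
    n = z.shape[0]
    rng = np.random.default_rng(seed)
    w = sym_decorrelate(rng.normal(size=(n_components, n_components)))
    for _ in range(max_iter):
        y = z @ w.T                       # current source estimates
        g = np.tanh(y)
        g_prime = 1.0 - g ** 2
        # Fixed-point update per row: E[z g(w.z)] - E[g'(w.z)] w
        w_new = (g.T @ z) / n - g_prime.mean(axis=0)[:, None] * w
        w_new = sym_decorrelate(w_new)
        # Converged when every row matches its previous value up to sign.
        delta = np.max(np.abs(np.abs(np.sum(w_new * w, axis=1)) - 1.0))
        w = w_new
        if delta < tol:
            break
    return z @ w.T, w @ k.T               # unmixing: (n_components, n_features)

# Usage on synthetic data: mix three independent Laplace sources, then unmix.
rng = np.random.default_rng(1)
true_sources = rng.laplace(size=(5000, 3))
mixing = rng.normal(size=(3, 3))
observed = true_sources @ mixing.T
recovered, unmixing = fastica(observed, n_components=3)

# Absolute correlation between each true source and each recovered component.
corr = np.corrcoef(true_sources.T, recovered.T)[:3, 3:]
print(np.round(np.abs(corr), 2))
```

ICA recovers sources only up to permutation and sign, which is why the check uses absolute correlations and looks for one strong match per true source. Applied to LLM activations, `x` would be a matrix of per-token activations from one layer, and the rows of `unmixing` would be the candidate directions to inspect.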

Get this paper in your agent:

hf papers read 2606.11722

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Datasets citing this paper: 1

#### sida/ica-lens-paper (updated about 12 hours ago)

Similar Articles

Query Lens: Interpreting Sparse Key-Value Features with Indirect Effects

Large language models reorganize representational geometry during in-context learning

Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models

KODA: Contrastive Representation Comparison and Alignment for Vision-Language Foundation Models

ModelLens: Finding the Best for Your Task from Myriads of Models
