feature-basis

Tag

Cards List
#feature-basis

Bag of Dims: Training-Free Mechanistic Interpretability via Dimension-Level Sign Patterns

Hugging Face Daily Papers · 3d ago Cached

Proposes the Bag of Dims framework showing that the standard basis of transformer hidden states provides a training-free, architecture-general feature representation where dimensions encode semantic content via sign patterns; validated across language, vision, and audio models, achieving high accuracy with no learned rotations.

0 favorites 0 likes
← Back to home

Submit Feedback