features

Tag

#features

Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet

arXiv cs.AI · 2026-05-29

This paper demonstrates that sparse autoencoders can extract interpretable features from Claude 3 Sonnet, a production-scale language model, addressing concerns about whether dictionary learning scales to models of that size. The features are multilingual and multimodal, include safety-relevant concepts such as deception and sycophancy, and causally influence model outputs.


@cambridgemike: What other features do you want for group chats?

X AI KOLs Following · 2026-05-24

A user asks for feature suggestions for group chats, referencing XChat's upcoming admin setting that restricts messaging to admins only.


andreasjansson/clip-features

Replicate Explore · 2026-05-08

A model on Replicate that outputs CLIP ViT-L/14 features for text and images, allowing similarity computation between inputs.
