This paper presents a methodology for delineating concepts and training linear probes to detect them in LLM embeddings, using four example concepts across three models. The work aims to enable scalable monitoring of LLM internal representations.
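To make the idea of a linear probe concrete, the following is a minimal sketch, not the paper's implementation. It assumes that each input has already been reduced to a fixed-length embedding vector with a binary concept label. The synthetic data generator, the function names (`make_synthetic_embeddings`, `train_probe`), and all hyperparameters are hypothetical and chosen only for illustration. The probe itself is ordinary logistic regression trained with stochastic gradient descent.

```python
import math
import random


def make_synthetic_embeddings(n, dim, seed=0):
    """Hypothetical stand-in for LLM embeddings.

    Assumption: the concept is linearly encoded, here as a shift
    along the first coordinate. Real embeddings would come from a model.
    """
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2  # alternate concept-absent (0) and concept-present (1)
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        vec[0] += 3.0 if label else -3.0
        data.append((vec, label))
    return data


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(data, dim, lr=0.1, epochs=50):
    """Fit a linear probe (weights w, bias b) by SGD on the logistic loss."""
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for vec, label in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, vec)) + b)
            grad = p - label  # derivative of the loss w.r.t. the logit
            for j in range(dim):
                w[j] -= lr * grad * vec[j]
            b -= lr * grad
    return w, b


def predict(w, b, vec):
    # The probe reports the concept as present when the logit is positive.
    return 1 if sum(wi * xi for wi, xi in zip(w, vec)) + b > 0 else 0


def accuracy(w, b, data):
    return sum(predict(w, b, vec) == label for vec, label in data) / len(data)


train_set = make_synthetic_embeddings(200, 8, seed=0)
test_set = make_synthetic_embeddings(100, 8, seed=1)
w, b = train_probe(train_set, dim=8)
probe_accuracy = accuracy(w, b, test_set)
print(f"held-out probe accuracy: {probe_accuracy:.3f}")
```

Because the probe is a single weight vector and a bias, evaluating it on a new embedding costs one dot product, which is what makes this kind of detector inexpensive to run over many inputs.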