Tag
This paper provides the first empirical evidence for feature-specific error correction in large language models. It shows that residual-stream activations are robust to small perturbations but less robust to perturbations along candidate feature directions, a pattern that supports the theory of computation in superposition.