surjectivity

Steered LLM Activations are Non-Surjective

Hugging Face Daily Papers · 2026-05-07

This paper proves that activation steering in LLMs produces internal states that cannot be replicated by any textual prompt, establishing a formal separation between white-box steerability and black-box prompting.
