Tag
This paper proves that activation steering in LLMs produces internal states that cannot be replicated by any textual prompt, establishing a formal separation between white-box steerability and black-box prompting.