impossibility-theorem

#impossibility-theorem

On the Inseparability of Instructions and Data in Shared-Embedding Sequence Models

arXiv cs.AI · yesterday

This paper formalizes the impossibility of perfect prompt-injection prevention in shared-embedding sequence models. It proves that no in-pipeline mechanism can guarantee Semantic-Faithful Control, because instructions and data are mapped into inseparable representations, a limitation analogous to code-data confusion in von Neumann architectures.

The Impossibility of Eliciting Latent Knowledge

arXiv cs.AI · 2026-06-11

This paper formally defines the problem of eliciting latent knowledge (ELK) from AI systems using Causal Influence Diagrams, and proves an impossibility theorem: no feedback-based training strategy that depends only on agent behavior can guarantee that the resulting agent is honest, even with perfect training feedback.
