impossibility-theorem


On the Inseparability of Instructions and Data in Shared-Embedding Sequence Models

arXiv cs.AI · yesterday

This paper formalizes the impossibility of perfect prompt-injection prevention in shared-embedding sequence models. It proves that no in-pipeline mechanism can guarantee Semantic-Faithful Control, because instructions and data share inseparable representations, a problem analogous to code-data confusion in von Neumann architectures.


The Impossibility of Eliciting Latent Knowledge

arXiv cs.AI · 2026-06-11

This paper uses Causal Influence Diagrams to formally define the problem of eliciting latent knowledge (ELK) from AI systems, and proves an impossibility theorem: no feedback-based training strategy that depends only on agent behavior can guarantee an honest agent, even with perfect training feedback.
