A system-level approach to prompt injection: separating instruction and data channels in LLM agents [P]

Reddit r/MachineLearning 07/01/26, 09:34 AM Papers

prompt-injection llm-agents system-level safety middleware token-based-authorization

Summary

This paper proposes Sentinel Gateway, a middleware layer that enforces strict separation between trusted instruction channels and untrusted data channels to mitigate prompt injection in LLM agents, using signed runtime authorization tokens and offering audit logging capabilities.

Prompt injection has emerged as one of the most persistent failure modes in tool-using LLM systems, particularly in agentic workflows where models interact with external data sources. Most mitigation strategies focus on input filtering or model-side alignment, but these approaches struggle because the core issue is structural: Approach I explored a system-level mitigation strategy by introducing a middleware layer (Sentinel Gateway) that enforces a strict separation between: Instruction channel: trusted, runtime-issued commands Data channel: untrusted external inputs (web, files, APIs) Instead of attempting to classify malicious inputs, the system ensures that: All agent actions require a signed, scoped runtime authorization token, effectively decoupling observation from execution. Implementation FastAPI middleware layer for agent tool calls Token-based authorization for execution requests Streamlit interface for inspection and debugging Audit logging of agent decisions and tool usage Supports multi-agent integration patterns (e.g., Claude-based sessions) Local or Postgres-backed persistence layer Repo https://github.com/cmtopbas/Sentinel-Gateway Discussion question I’m interested in feedback on: whether instruction/data separation is a meaningful abstraction for agent safety failure modes in token-based execution gating how this compares conceptually to other agent safety or sandboxing approaches

Original Article

A system-level approach to prompt injection: separating instruction and data channels in LLM agents [P]

Similar Articles

Prompt injection is still breaking agent systems I built a gateway that enforces instruction/data separation at runtime

I built a gateway to make prompt injection structurally impossible in agent workflows (design approach, not a model fix)

Understanding prompt injections: a frontier security challenge

How are you testing local coding-agent work gates against prompt injection?

How are you all handling prompt injection for agents that read external content?

Submit Feedback

Similar Articles

Prompt injection is still breaking agent systems I built a gateway that enforces instruction/data separation at runtime

I built a gateway to make prompt injection structurally impossible in agent workflows (design approach, not a model fix)

Understanding prompt injections: a frontier security challenge

How are you testing local coding-agent work gates against prompt injection?

How are you all handling prompt injection for agents that read external content?