Send a SCOUT First: Pre-hoc Reasoning for Adaptive Detector Allocation in Prompt-Injection Defense
Summary
Introduces SCOUT, a framework that allocates prompt-injection detectors per request by predicting each detector's reliability and latency, improving safety and efficiency over a fixed single detector. Also presents SCOUT-450, a benchmark of complex agent-facing injections, on which a safety-oriented operating point cuts attack-success rate by 46% and total wall-clock time by 40% relative to an always-on GPT-4o judge, at a 5.1-point drop in benign utility.
Cached at: 06/10/26, 12:08 AM
Source: https://huggingface.co/papers/2605.30837
Abstract
SCOUT framework dynamically allocates prompt-injection detection by predicting detector reliability and latency, improving safety and efficiency over fixed single-detector approaches.
Prompt-injection detectors are heterogeneous: each is strong on a different slice of attacks, and none is always reliable. Yet existing systems still treat detection as a fixed single-detector pipeline, committing every request to one detector’s blind spots. We reframe defense as detector allocation: given a heterogeneous pool, decide per request which detectors to run and whether to escalate to an LLM judge. Our framework SCOUT (Scalable and Controllable Outcome-prediction for Uncertainty-aware Triage) makes this decision dynamic by predicting each detector’s per-sample reliability and latency from how it behaved on similar past inputs, and exposes a single safety-utility threshold to the operator (where utility bundles benign-pass rate and wall-clock). To evaluate this setting, we build SCOUT-450, a benchmark that captures the structurally complex, agent-facing injections that older prompt-injection sets under-represent. On SCOUT-450, a safety-oriented operating point reduces attack-success rate by 46% and total wall-clock by 40% relative to an always-on GPT-4o judge, at a 5.1-point benign-utility drop. SCOUT also transfers to three external benchmarks (BIPIA, IPI, and IHEval), improving the safety-utility frontier.
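The abstract does not say how "similar past inputs" are found or how the threshold picks detectors, so the following is only a rough sketch of the general idea, not the paper's method. It assumes a nearest-neighbour lookup over input embeddings, a rule that runs the fastest detector whose predicted reliability clears the operator threshold, and a fallback to the LLM judge otherwise. All names here (`PastRecord`, `predict`, `allocate`, `"llm_judge"`) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class PastRecord:
    """One logged past input (hypothetical schema, not from the paper)."""
    embedding: list[float]
    correct: dict[str, bool]      # detector name -> was its verdict right?
    latency_ms: dict[str, float]  # detector name -> observed latency


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def predict(history: list[PastRecord], query: list[float], k: int = 3):
    """Estimate (reliability, latency) per detector from the k most similar past inputs."""
    neighbors = sorted(history, key=lambda r: cosine(r.embedding, query), reverse=True)[:k]
    estimates = {}
    for name in neighbors[0].correct:
        reliability = sum(r.correct[name] for r in neighbors) / len(neighbors)
        latency = sum(r.latency_ms[name] for r in neighbors) / len(neighbors)
        estimates[name] = (reliability, latency)
    return estimates


def allocate(history: list[PastRecord], query: list[float], threshold: float, k: int = 3) -> str:
    """Assumed rule: fastest detector predicted reliable enough, else escalate to the judge."""
    if not history:
        return "llm_judge"
    estimates = predict(history, query, k)
    eligible = [(lat, name) for name, (rel, lat) in estimates.items() if rel >= threshold]
    return min(eligible)[1] if eligible else "llm_judge"
```

Under this assumed rule, raising `threshold` sends more requests to the judge (safer, slower), while lowering it favours cheap detectors, which is one way a single operator-facing knob could trade safety against utility.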
Similar Articles
Understanding prompt injections: a frontier security challenge
OpenAI publishes guidance on prompt injection attacks, a social engineering vulnerability where malicious instructions hidden in web content or documents can trick AI models into unintended actions. The company outlines its multi-layered defense strategy including instruction hierarchy research, automated red-teaming, and AI-powered monitoring systems.
Most injection detectors score each prompt in isolation. I built one that tracks the geometric trajectory of the full session. Here is a concrete result.
A developer built Arc Gate, a monitoring proxy for LLMs that uses Fisher information manifold geometry to detect session-level prompt injection attacks, identifying Crescendo-style gradual manipulation by tracking t-values against a phase transition threshold t* = 1.2247 rather than per-turn phrase detection.
Agent enforcement engine with auditing & solves prompt injection
A tool built with pure math and determinism to solve indirect prompt injection and agent drifting, providing a pure audit trace chain. The creator is seeking pilot interest.
Designing AI agents to resist prompt injection
OpenAI publishes guidance on designing AI agents resistant to prompt injection attacks, arguing that modern attacks increasingly use social engineering tactics rather than simple string injections, and advocating for system-level defenses that constrain impact rather than relying solely on input filtering.
trained a prompt injection detector using ml-intern and DeepSeek v4 Flash, runs in the browser
Trained a prompt injection classifier using ml-intern and DeepSeek V4 Flash, achieving 99% F1 with DistilBERT, optimized to ONNX int8 (~65MB) and deployable in the browser via Transformers.js v3.