Send a SCOUT First: Pre-hoc Reasoning for Adaptive Detector Allocation in Prompt-Injection Defense

Hugging Face Daily Papers 05/29/26, 12:00 AM Papers

prompt-injection defense llm-safety adaptive-detector benchmark pre-hoc-reasoning

Summary

Introduces SCOUT, a framework that allocates prompt-injection detectors per request by predicting each detector's reliability and latency, improving both safety and efficiency. Also presents SCOUT-450, a benchmark of complex agent-facing injections, on which a safety-oriented operating point reduces attack-success rate by 46% and total wall-clock time by 40% relative to an always-on GPT-4o judge, at a 5.1-point benign-utility drop.

Prompt-injection detectors are heterogeneous: each is strong on a different slice of attacks, and none is always reliable. Yet existing systems still treat detection as a fixed single-detector pipeline, committing every request to one detector's blind spots. We reframe defense as detector allocation: given a heterogeneous pool, decide per request which detectors to run and whether to escalate to an LLM judge. Our framework SCOUT (Scalable and Controllable Outcome-prediction for Uncertainty-aware Triage) makes this decision dynamic by predicting each detector's per-sample reliability and latency from how it behaved on similar past inputs, and exposes a single safety-utility threshold to the operator (where utility bundles benign-pass rate and wall-clock). To evaluate this setting, we build SCOUT-450, a benchmark that captures the structurally complex, agent-facing injections that older prompt-injection sets under-represent. On SCOUT-450, a safety-oriented operating point reduces attack-success rate by 46% and total wall-clock by 40% relative to an always-on GPT-4o judge, at a 5.1-point benign-utility drop. SCOUT also transfers to three external benchmarks (BIPIA, IPI, and IHEval), improving the safety-utility frontier.

Original Article

Cached at: 06/10/26, 12:08 AM

Paper page - Send a SCOUT First: Pre-hoc Reasoning for Adaptive Detector Allocation in Prompt-Injection Defense

Source: https://huggingface.co/papers/2605.30837

Abstract

SCOUT framework dynamically allocates prompt-injection detection by predicting detector reliability and latency, improving safety and efficiency over fixed single-detector approaches.

Prompt-injection detectors are heterogeneous: each is strong on a different slice of attacks, and none is always reliable. Yet existing systems still treat detection as a fixed single-detector pipeline, committing every request to one detector’s blind spots. We reframe defense as detector allocation: given a heterogeneous pool, decide per request which detectors to run and whether to escalate to an LLM judge. Our framework SCOUT (Scalable and Controllable Outcome-prediction for Uncertainty-aware Triage) makes this decision dynamic by predicting each detector’s per-sample reliability and latency from how it behaved on similar past inputs, and exposes a single safety-utility threshold to the operator (where utility bundles benign-pass rate and wall-clock). To evaluate this setting, we build SCOUT-450, a benchmark that captures the structurally complex, agent-facing injections that older prompt-injection sets under-represent. On SCOUT-450, a safety-oriented operating point reduces attack-success rate by 46% and total wall-clock by 40% relative to an always-on GPT-4o judge, at a 5.1-point benign-utility drop. SCOUT also transfers to three external benchmarks (BIPIA, IPI, and IHEval), improving the safety-utility frontier.
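
The allocation idea in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not say how similarity is measured, how predictions are formed, or how the threshold is applied. The sketch assumes a k-nearest-neighbour lookup over cosine similarity, averages each detector's past correctness and latency over those neighbours, and picks the fastest detector whose predicted reliability clears the operator's threshold, escalating to an LLM judge otherwise. The names `Record`, `predict`, `allocate`, `HISTORY`, the detector names, and all numbers are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import sqrt


@dataclass
class Record:
    """One past input: its embedding and how each detector behaved on it."""
    embedding: tuple[float, ...]
    correct: dict[str, bool]      # detector name -> was its verdict right?
    latency_ms: dict[str, float]  # detector name -> observed latency


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def predict(history, query, k=2):
    """Estimate (reliability, latency_ms) per detector from the k most similar past inputs."""
    neighbors = sorted(history, key=lambda r: cosine(r.embedding, query), reverse=True)[:k]
    if not neighbors:
        return {}
    estimates = {}
    for name in neighbors[0].correct:
        reliability = sum(r.correct[name] for r in neighbors) / len(neighbors)
        latency = sum(r.latency_ms[name] for r in neighbors) / len(neighbors)
        estimates[name] = (reliability, latency)
    return estimates


def allocate(history, query, threshold, k=2):
    """Pick the fastest detector predicted to be reliable enough, else escalate."""
    estimates = predict(history, query, k)
    eligible = [(lat, name) for name, (rel, lat) in estimates.items() if rel >= threshold]
    return min(eligible)[1] if eligible else "llm_judge"


# Hypothetical history: two detectors, two regions of a toy 2-D embedding space.
HISTORY = [
    Record((1.0, 0.0), {"regex": True, "classifier": True}, {"regex": 1.0, "classifier": 20.0}),
    Record((0.9, 0.1), {"regex": True, "classifier": True}, {"regex": 1.0, "classifier": 22.0}),
    Record((0.0, 1.0), {"regex": False, "classifier": True}, {"regex": 1.0, "classifier": 25.0}),
    Record((0.1, 0.9), {"regex": False, "classifier": False}, {"regex": 1.0, "classifier": 30.0}),
]

print(allocate(HISTORY, (1.0, 0.05), threshold=0.8))  # cheap detector suffices here
print(allocate(HISTORY, (0.05, 1.0), threshold=0.8))  # no detector is trusted: escalate
```

In this toy setup, raising `threshold` pushes more requests to the judge (safer, slower), while lowering it lets cheaper detectors handle more traffic. That mirrors the single safety-utility knob the abstract describes, under the assumptions stated above.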

Similar Articles

Understanding prompt injections: a frontier security challenge

Most injection detectors score each prompt in isolation. I built one that tracks the geometric trajectory of the full session. Here is a concrete result.

Agent enforcement engine with auditing & solves prompt injection

Designing AI agents to resist prompt injection

trained a prompt injection detector using ml-intern and DeepSeek v4 Flash, runs in the browser
