Arc Sentry outperformed LLM Guard, 92% vs. 70% detection, on a head-to-head benchmark. Here is how it works.
Summary
Arc Sentry is a new pre-generation prompt-injection detector that reads a model’s internal residual stream. On a 130-prompt benchmark it achieved 92% detection with 0% false positives, versus 70% detection and 3.3% false positives for LLM Guard.
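To make the reported figures concrete, the sketch below computes detection rate and false-positive rate from confusion counts. The split of 100 attack prompts and 30 benign prompts, and the per-detector counts, are assumptions chosen only because they reproduce the reported percentages on 130 prompts; the source does not state the actual split. The `Counts` class and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion counts for one detector on a labeled prompt set."""
    true_positives: int   # attack prompts that were flagged
    false_negatives: int  # attack prompts that were missed
    false_positives: int  # benign prompts that were flagged
    true_negatives: int   # benign prompts that were passed


def detection_rate(c: Counts) -> float:
    """Share of attack prompts that the detector flagged."""
    return c.true_positives / (c.true_positives + c.false_negatives)


def false_positive_rate(c: Counts) -> float:
    """Share of benign prompts that the detector wrongly flagged."""
    return c.false_positives / (c.false_positives + c.true_negatives)


# ASSUMPTION: 100 attack + 30 benign prompts (130 total). These counts are
# illustrative and merely consistent with the reported percentages.
arc_sentry = Counts(true_positives=92, false_negatives=8,
                    false_positives=0, true_negatives=30)
llm_guard = Counts(true_positives=70, false_negatives=30,
                   false_positives=1, true_negatives=29)

for name, counts in [("Arc Sentry", arc_sentry), ("LLM Guard", llm_guard)]:
    print(f"{name}: detection={detection_rate(counts):.1%}, "
          f"false positives={false_positive_rate(counts):.1%}")
```

Reporting both rates together matters because a detector can raise its detection rate simply by flagging more prompts, which shows up as a higher false-positive rate.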
Similar Articles
LLM Guard scored 0/8 on a USENIX 2025 multi-turn jailbreak. Here’s what caught it instead.
Arc Sentry detects multi-turn jailbreaks like Crescendo by reading model internal state rather than text output, catching attacks that text-based monitors miss entirely.
Gate AI: LLM Security Benchmark Evaluation Methodology and Results
This paper presents an evaluation methodology for LLM security detectors that addresses systematic weaknesses like per-dataset threshold tuning and undisclosed operating points. The framework uses cross-validation across 16 benchmarks, selects a single global operating point, and includes multiple diagnostics for generalization.
Send a SCOUT First: Pre-hoc Reasoning for Adaptive Detector Allocation in Prompt-Injection Defense
Introduces SCOUT, a framework that dynamically allocates prompt-injection detectors per request by predicting reliability and latency, improving safety and efficiency. Also presents SCOUT-450, a benchmark for complex agent-facing injections, and reports a 46% reduction in attack-success rate and a 40% reduction in latency relative to a fixed GPT-4o judge.
Most injection detectors score each prompt in isolation. I built one that tracks the geometric trajectory of the full session. Here is a concrete result.
A developer built Arc Gate, a monitoring proxy for LLMs that uses Fisher information manifold geometry to detect session-level prompt-injection attacks. It identifies Crescendo-style gradual manipulation by tracking t-values against a phase-transition threshold t* = 1.2247 instead of relying on per-turn phrase detection.
We built a public red team environment for our AI agent security proxy — submit attacks and get a full security trace back
Arc Gate is a runtime governance layer for LLM agents that enforces instruction-authority boundaries. The project has launched a public red team environment where users can submit attacks and receive full security traces, with a benchmark showing 100% unsafe action prevention.