Arc Sentry outperformed LLM Guard 92% vs 70% detection on a head to head benchmark. Here is how it works.
Summary
Arc Sentry is a new pre-generation prompt-injection detector that reads a model’s internal residual stream, achieving 92% detection with 0% false positives versus LLM Guard’s 70%/3.3% on a 130-prompt benchmark.
Similar Articles
Most injection detectors score each prompt in isolation. I built one that tracks the geometric trajectory of the full session. Here is a concrete result.
A developer built Arc Gate, a monitoring proxy for LLMs that uses Fisher information manifold geometry to detect session-level prompt injection attacks, identifying Crescendo-style gradual manipulation by tracking t-values against a phase transition threshold t* = 1.2247 rather than per-turn phrase detection.
We built a public red team environment for our AI agent security proxy — submit attacks and get a full security trace back
Arc Gate is a runtime governance layer for LLM agents that enforces instruction-authority boundaries. The project has launched a public red team environment where users can submit attacks and receive full security traces, with a benchmark showing 100% unsafe action prevention.
PrefixGuard: From LLM-Agent Traces to Online Failure-Warning Monitors
# Paper page - PrefixGuard: From LLM-Agent Traces to Online Failure-Warning Monitors Source: [https://huggingface.co/papers/2605.06455](https://huggingface.co/papers/2605.06455) ## Abstract PrefixGuard enables effective online monitoring of LLM agents through trace analysis and prefix\-based risk scoring, demonstrating strong performance across multiple benchmark tasks while providing diagnostic insights for alert reliability\. Large language model \(LLM\) agents now execute long, tool\-using ta
Built a tool that stops AI agents from being hijacked by malicious content in webpages and emails
Arc Gate is a proxy that protects AI agents from prompt injection attacks by treating web and email content as untrusted, requiring no code changes from developers.
LLMSniffer: Detecting LLM-Generated Code via GraphCodeBERT and Supervised Contrastive Learning
LLMSniffer is a detection framework that fine-tunes GraphCodeBERT with supervised contrastive learning to distinguish AI-generated code from human-written code, achieving 78% accuracy on GPTSniffer and 94.65% on Whodunit benchmarks. The approach addresses critical challenges in academic integrity and code quality assurance by combining code-structure-aware embeddings with contrastive learning and comment removal preprocessing.