Arc Sentry outperformed LLM Guard, 92% vs. 70% detection, in a head-to-head benchmark. Here is how it works.

Reddit r/artificial Tools

Summary

Arc Sentry is a new pre-generation prompt-injection detector that reads a model’s internal residual stream, achieving 92% detection with 0% false positives versus LLM Guard’s 70%/3.3% on a 130-prompt benchmark.

I built Arc Sentry, a pre-generation prompt-injection detector for open-weight LLMs. Instead of scanning text for patterns after the fact, it reads the model's internal residual stream before generate() is called and blocks requests that destabilize the model's information geometry.

Head-to-head benchmark on a 130-prompt SaaS deployment dataset:

- Arc Sentry: 92% detection, 0% false positives
- LLM Guard: 70% detection, 3.3% false positives

The difference is architectural. LLM Guard classifies input text; Arc Sentry measures whether the model itself is being pushed into an unstable regime. Those are different problems, and the geometry catches attacks that text classifiers miss. It also catches Crescendo multi-turn manipulation attacks, which look innocent one turn at a time; LLM Guard caught 0 of 8 in that test. A minimal sketch of the residual-stream idea follows below.

Install: pip install arc-sentry
GitHub: https://github.com/9hannahnine-jpg/arc-sentry

If you are self-hosting Mistral, Llama, or Qwen and want to try it, let me know.
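To make the approach concrete, here is a minimal sketch, not Arc Sentry's actual code. It assumes a Hugging Face transformers model, and the instability metric (relative jumps in per-layer activation norms) is a deliberately simple hypothetical stand-in for the information-geometry measure the post describes but does not document:

```python
# Hypothetical sketch of a pre-generation residual-stream check.
# The metric below (spikes in per-layer hidden-state norms) is a
# stand-in, not Arc Sentry's actual geometry measure.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "mistralai/Mistral-7B-Instruct-v0.2"  # any open-weight model
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

def residual_instability(prompt: str) -> float:
    """Score how sharply activation norms jump between layers."""
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # One mean-pooled norm per layer of the residual stream.
    layer_norms = torch.stack(
        [h.mean(dim=1).norm() for h in out.hidden_states]
    ).float()
    # Relative jump between consecutive layers; large spikes suggest
    # the prompt is pushing the model into an unusual regime.
    jumps = (layer_norms[1:] - layer_norms[:-1]).abs() / layer_norms[:-1]
    return jumps.max().item()

THRESHOLD = 0.5  # hypothetical; would be calibrated on benign traffic

def guarded_generate(prompt: str) -> str:
    """Block before generate() if the prompt looks destabilizing."""
    if residual_instability(prompt) > THRESHOLD:
        return "[blocked: prompt destabilizes the model]"
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=128)
    return tok.decode(out[0], skip_special_tokens=True)
```

In practice the threshold and the statistic itself would have to be calibrated against benign traffic for the specific model being served, which is presumably where the 0% false-positive figure comes from.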

Similar Articles

PrefixGuard: From LLM-Agent Traces to Online Failure-Warning Monitors

Hugging Face Daily Papers

Source: https://huggingface.co/papers/2605.06455

PrefixGuard enables effective online monitoring of LLM agents through trace analysis and prefix-based risk scoring, demonstrating strong performance across multiple benchmark tasks while providing diagnostic insights for alert reliability. Large language model (LLM) agents now execute long, tool-using tasks…
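As a rough illustration of what prefix-based risk scoring over agent traces means in practice (the risk scorer here is a hypothetical placeholder; the paper's actual features and model are not reproduced):

```python
# Minimal sketch of online failure warning via prefix scoring.
# risk_score is a placeholder for whatever model the paper trains.
from typing import Callable, Iterable, Optional

TraceStep = str  # e.g., one serialized tool call or agent action

def monitor_trace(
    steps: Iterable[TraceStep],
    risk_score: Callable[[list[TraceStep]], float],
    threshold: float = 0.8,
) -> Optional[int]:
    """Score every growing prefix of a running trace; return the index
    of the first step that triggers an alert, or None if none does."""
    prefix: list[TraceStep] = []
    for i, step in enumerate(steps):
        prefix.append(step)
        if risk_score(prefix) >= threshold:
            return i  # early warning, before the agent fails downstream
    return None
```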

LLMSniffer: Detecting LLM-Generated Code via GraphCodeBERT and Supervised Contrastive Learning

arXiv cs.CL

LLMSniffer is a detection framework that fine-tunes GraphCodeBERT with supervised contrastive learning to distinguish AI-generated code from human-written code, achieving 78% accuracy on GPTSniffer and 94.65% on Whodunit benchmarks. The approach addresses critical challenges in academic integrity and code quality assurance by combining code-structure-aware embeddings with contrastive learning and comment removal preprocessing.
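For context, below is a minimal sketch of the supervised contrastive objective such a framework fine-tunes with. The GraphCodeBERT encoder, batching, and comment-removal preprocessing are omitted, and this is not the authors' training code:

```python
# Supervised contrastive loss (Khosla et al., 2020) as commonly used
# for fine-tuning encoders; a sketch, not LLMSniffer's implementation.
import torch
import torch.nn.functional as F

def supcon_loss(embeddings: torch.Tensor, labels: torch.Tensor,
                temperature: float = 0.07) -> torch.Tensor:
    """Loss over a batch of embeddings with integer class labels
    (e.g., 0 = human-written code, 1 = LLM-generated code)."""
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.T / temperature                        # pairwise similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))    # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    # Mean log-probability of same-class pairs, per anchor with positives.
    pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1)
    pos_counts = pos_mask.sum(dim=1)
    has_pos = pos_counts > 0
    return -(pos_log_prob[has_pos] / pos_counts[has_pos]).mean()
```

The effect is to pull embeddings of same-class code snippets together and push the two classes apart, which is what makes the downstream human-vs.-LLM decision boundary easier to learn.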