Arc Sentry outperformed LLM Guard, 92% vs. 70% detection, in a head-to-head benchmark. Here is how it works.

Reddit r/artificial Tools

Summary

Arc Sentry is a new pre-generation prompt-injection detector that reads a model’s internal residual stream, achieving 92% detection with 0% false positives versus LLM Guard’s 70%/3.3% on a 130-prompt benchmark.

I built Arc Sentry, a pre-generation prompt-injection detector for open-weight LLMs. Instead of scanning text for patterns after the fact, it reads the model's internal residual stream before generate() is called and blocks requests that destabilize the model's information geometry.

Head-to-head benchmark on a 130-prompt SaaS deployment dataset:

- Arc Sentry: 92% detection, 0% false positives
- LLM Guard: 70% detection, 3.3% false positives

The difference is architectural. LLM Guard classifies input text; Arc Sentry measures whether the model itself is being pushed into an unstable regime. Those are different problems, and the geometry catches attacks that text classifiers miss. It also catches Crescendo multi-turn manipulation attacks, which look innocent one turn at a time; LLM Guard caught 0 of 8 in that test. A minimal sketch of the residual-stream idea follows below.

Install: pip install arc-sentry
GitHub: https://github.com/9hannahnine-jpg/arc-sentry

If you are self-hosting Mistral, Llama, or Qwen and want to try it, let me know.
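To make the approach concrete, here is a minimal sketch, not Arc Sentry's actual code. It assumes a Hugging Face transformers model, and the instability metric (relative jumps in per-layer activation norms) is a deliberately simple hypothetical stand-in for the information-geometry measure the post describes but does not document:

```python
# Hypothetical sketch of a pre-generation residual-stream check.
# The metric below (spikes in per-layer hidden-state norms) is a
# stand-in, not Arc Sentry's actual geometry measure.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "mistralai/Mistral-7B-Instruct-v0.2"  # any open-weight model
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

def residual_instability(prompt: str) -> float:
    """Score how sharply activation norms jump between layers."""
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # One mean-pooled norm per layer of the residual stream.
    layer_norms = torch.stack(
        [h.mean(dim=1).norm() for h in out.hidden_states]
    ).float()
    # Relative jump between consecutive layers; large spikes suggest
    # the prompt is pushing the model into an unusual regime.
    jumps = (layer_norms[1:] - layer_norms[:-1]).abs() / layer_norms[:-1]
    return jumps.max().item()

THRESHOLD = 0.5  # hypothetical; would be calibrated on benign traffic

def guarded_generate(prompt: str) -> str:
    """Block before generate() if the prompt looks destabilizing."""
    if residual_instability(prompt) > THRESHOLD:
        return "[blocked: prompt destabilizes the model]"
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=128)
    return tok.decode(out[0], skip_special_tokens=True)
```

In practice the threshold and the statistic itself would have to be calibrated against benign traffic for the specific model being served, which is presumably where the 0% false-positive figure comes from.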

Similar Articles

PrefixGuard: From LLM-Agent Traces to Online Failure-Warning Monitors

Hugging Face Daily Papers

Source: https://huggingface.co/papers/2605.06455

PrefixGuard enables effective online monitoring of LLM agents through trace analysis and prefix-based risk scoring, demonstrating strong performance across multiple benchmark tasks while providing diagnostic insights for alert reliability. Large language model (LLM) agents now execute long, tool-using tasks…
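As a rough illustration of what prefix-based risk scoring over agent traces means in practice (the risk scorer here is a hypothetical placeholder; the paper's actual features and model are not reproduced):

```python
# Minimal sketch of online failure warning via prefix scoring.
# risk_score is a placeholder for whatever model the paper trains.
from typing import Callable, Iterable, Optional

TraceStep = str  # e.g., one serialized tool call or agent action

def monitor_trace(
    steps: Iterable[TraceStep],
    risk_score: Callable[[list[TraceStep]], float],
    threshold: float = 0.8,
) -> Optional[int]:
    """Score every growing prefix of a running trace; return the index
    of the first step that triggers an alert, or None if none does."""
    prefix: list[TraceStep] = []
    for i, step in enumerate(steps):
        prefix.append(step)
        if risk_score(prefix) >= threshold:
            return i  # early warning, before the agent fails downstream
    return None
```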

LLMSniffer: Detecting LLM-Generated Code via GraphCodeBERT and Supervised Contrastive Learning

arXiv cs.CL

LLMSniffer is a detection framework that fine-tunes GraphCodeBERT with supervised contrastive learning to distinguish AI-generated code from human-written code, achieving 78% accuracy on GPTSniffer and 94.65% on Whodunit benchmarks. The approach addresses critical challenges in academic integrity and code quality assurance by combining code-structure-aware embeddings with contrastive learning and comment removal preprocessing.
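For context, below is a minimal sketch of the supervised contrastive objective such a framework fine-tunes with. The GraphCodeBERT encoder, batching, and comment-removal preprocessing are omitted, and this is not the authors' training code:

```python
# Supervised contrastive loss (Khosla et al., 2020) as commonly used
# for fine-tuning encoders; a sketch, not LLMSniffer's implementation.
import torch
import torch.nn.functional as F

def supcon_loss(embeddings: torch.Tensor, labels: torch.Tensor,
                temperature: float = 0.07) -> torch.Tensor:
    """Loss over a batch of embeddings with integer class labels
    (e.g., 0 = human-written code, 1 = LLM-generated code)."""
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.T / temperature                        # pairwise similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))    # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    # Mean log-probability of same-class pairs, per anchor with positives.
    pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1)
    pos_counts = pos_mask.sum(dim=1)
    has_pos = pos_counts > 0
    return -(pos_log_prob[has_pos] / pos_counts[has_pos]).mean()
```

The effect is to pull embeddings of same-class code snippets together and push the two classes apart, which is what makes the downstream human-vs.-LLM decision boundary easier to learn.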