We built a public red team environment for our AI agent security proxy — submit attacks and get a full security trace back

Reddit r/artificial 05/14/26, 04:20 PM Tools

security ai-agent proxy red-team llm open-source benchmark

Summary

Arc Gate is a runtime governance layer for LLM agents that enforces instruction-authority boundaries. The project has launched a public red team environment where users can submit attacks and receive full security traces, with a benchmark showing 100% unsafe action prevention.

Live adversarial evaluation: https://web-production-6e47f.up.railway.app/break-arc-gate Arc Gate is a runtime governance layer for LLM agents. It sits between your app and the OpenAI API and enforces instruction-authority boundaries — tracking who is allowed to instruct the agent and from what source. Webpages, emails, tool outputs, and retrieved documents have zero instruction authority. Submit any attack. Every submission runs against the real proxy and returns a full decision trace, risk score, capability policy, and downloadable JSON report. Confirmed bypasses get documented publicly and patched in the next release. GitHub: https://github.com/9hannahnine-jpg/arc-gate Reproducible benchmark: pip install arc-sentry && arc-sentry-agent-bench Current results: 100% unsafe action prevention across 22 agentic scenarios, 0% false positive rate on benign developer traffic.

Original Article

Similar Articles

Built a tool that stops AI agents from being hijacked by malicious content in webpages and emails

Reddit r/artificial

Arc Gate is a proxy that protects AI agents from prompt injection attacks by treating web and email content as untrusted, requiring no code changes from developers.

Security on the path to AGI

OpenAI Blog

OpenAI outlines comprehensive security measures on the path to AGI, including AI-powered cyber defense, continuous adversarial red teaming with SpecterOps, and security frameworks for emerging AI agents like Operator. The company emphasizes proactive threat detection, industry collaboration, and security integration into infrastructure and models.

DecodingTrust-Agent Platform (DTap): A Controllable and Interactive Red-Teaming Platform for AI Agents

Hugging Face Daily Papers

This paper introduces the DecodingTrust-Agent Platform (DTap), a controllable and interactive red-teaming platform for evaluating AI agent security across multiple domains. It also presents DTap-Red, an autonomous agent for discovering attack strategies, and DTap-Bench, a large-scale dataset for risk assessment.

Advancing red teaming with people and AI

OpenAI Blog

OpenAI publishes a white paper detailing their approach to external red teaming for AI models, outlining methods for selecting diverse red team members, determining model access levels, providing testing infrastructure, and synthesizing feedback to improve AI safety and policy coverage.

CrabTrap: An LLM-as-a-judge HTTP proxy to secure agents in production

Hacker News Top

Brex open-sources CrabTrap, an LLM-as-a-judge HTTP proxy that filters and secures AI agent traffic before it reaches production services.