We built a public red team environment for our AI agent security proxy — submit attacks and get a full security trace back

Reddit r/artificial Tools

Summary

Arc Gate is a runtime governance layer for LLM agents that enforces instruction-authority boundaries. The project has launched a public red team environment where users can submit attacks and receive full security traces, with a benchmark showing 100% unsafe action prevention.

Live adversarial evaluation: https://web-production-6e47f.up.railway.app/break-arc-gate Arc Gate is a runtime governance layer for LLM agents. It sits between your app and the OpenAI API and enforces instruction-authority boundaries — tracking who is allowed to instruct the agent and from what source. Webpages, emails, tool outputs, and retrieved documents have zero instruction authority. Submit any attack. Every submission runs against the real proxy and returns a full decision trace, risk score, capability policy, and downloadable JSON report. Confirmed bypasses get documented publicly and patched in the next release. GitHub: https://github.com/9hannahnine-jpg/arc-gate Reproducible benchmark: pip install arc-sentry && arc-sentry-agent-bench Current results: 100% unsafe action prevention across 22 agentic scenarios, 0% false positive rate on benign developer traffic.​​​​​​​​​​​​​​​​
Original Article

Similar Articles

Security on the path to AGI

OpenAI Blog

OpenAI outlines comprehensive security measures on the path to AGI, including AI-powered cyber defense, continuous adversarial red teaming with SpecterOps, and security frameworks for emerging AI agents like Operator. The company emphasizes proactive threat detection, industry collaboration, and security integration into infrastructure and models.

Advancing red teaming with people and AI

OpenAI Blog

OpenAI publishes a white paper detailing their approach to external red teaming for AI models, outlining methods for selecting diverse red team members, determining model access levels, providing testing infrastructure, and synthesizing feedback to improve AI safety and policy coverage.