We built a public red team environment for our AI agent security proxy — submit attacks and get a full security trace back
Summary
Arc Gate is a runtime governance layer for LLM agents that enforces instruction-authority boundaries. The project has launched a public red team environment where users can submit attacks and receive full security traces, with a benchmark showing 100% unsafe action prevention.
Similar Articles
Built a tool that stops AI agents from being hijacked by malicious content in webpages and emails
Arc Gate is a proxy that protects AI agents from prompt injection attacks by treating web and email content as untrusted, requiring no code changes from developers.
Security on the path to AGI
OpenAI outlines comprehensive security measures on the path to AGI, including AI-powered cyber defense, continuous adversarial red teaming with SpecterOps, and security frameworks for emerging AI agents like Operator. The company emphasizes proactive threat detection, industry collaboration, and security integration into infrastructure and models.
DecodingTrust-Agent Platform (DTap): A Controllable and Interactive Red-Teaming Platform for AI Agents
This paper introduces the DecodingTrust-Agent Platform (DTap), a controllable and interactive red-teaming platform for evaluating AI agent security across multiple domains. It also presents DTap-Red, an autonomous agent for discovering attack strategies, and DTap-Bench, a large-scale dataset for risk assessment.
Advancing red teaming with people and AI
OpenAI publishes a white paper detailing their approach to external red teaming for AI models, outlining methods for selecting diverse red team members, determining model access levels, providing testing infrastructure, and synthesizing feedback to improve AI safety and policy coverage.
CrabTrap: An LLM-as-a-judge HTTP proxy to secure agents in production
Brex open-sources CrabTrap, an LLM-as-a-judge HTTP proxy that filters and secures AI agent traffic before it reaches production services.