Tag
This paper introduces the Agent-Native Immune System (ANIS), a biologically inspired, endogenous defense architecture embedded directly within the agent's cognitive loop. It proposes a six-layer Immune Tower, a unified taxonomy of Agent Viruses and Vaccines, and the Harness Triad for continual immune learning to address runtime hijacking vulnerabilities in autonomous agents.
The article discusses the challenge of maintaining audit trails when AI agents operate under human credentials, highlighting security and accountability concerns.
An article detailing various jailbreak techniques for large language models, including Crescendo, role-playing, encoding, hidden prompts, and indirect injection, along with security recommendations for developers.
A developer discusses three common patterns for how coding agents obtain API keys, highlighting that agents can circumvent restrictions by being resourceful, and asks the community about their real-world setups and experiences.
A security expert shares a cheatsheet on advanced agent security hardening, covering tool sandboxing, output validation, data loss prevention, adversarial testing, and runtime policy enforcement, emphasizing continuous security practices for production AI agents.
AI agent security has moved from an academic topic to an industry reality, as seen in FFmpeg zero-day vulnerabilities, the Chrome 429 patch, OpenAI Lockdown Mode, and the OWASP framework. Meanwhile, agent payment standards are becoming an infrastructure battleground, with Visa stablecoin settlement competing against traditional card networks.
PixieBrix launches Agent Browser Shield, a free source-available browser extension that protects AI agents from prompt injection, dark patterns, and context pollution during web browsing.
SkillHarm is a benchmark for evaluating skill-based attacks across the skill-use lifecycle, revealing high vulnerability (up to 86.3% attack success) in current AI agents and introducing automated attack construction via AutoSkillHarm.
An analysis argues that most enterprise AI agent security investment goes to model-layer guardrails and observability, leaving critical gaps at the access and protocol layers. It cites a 2026 report finding that 75% of enterprise AI agents remain unsecured because coverage at these layers is near zero.
The article introduces the Agent Vulnerability Enumeration (AVE) record as a new standard designed to address the inadequacies of CVE for AI agent vulnerabilities, covering scoring, detection, and standardization challenges specific to agentic AI.
HOL Guard is an open-source security tool that identifies, intercepts, and audits dangerous commands issued by development agents such as Codex and Claude Code. It supports multiple protection levels and a local approval center to prevent risks like accidental deletion or modification.
LangSmith introduces an Auth Proxy to secure network access for agent sandboxes, keeping credentials out of the runtime and enforcing explicit network access policies.
The author open-sources a shell-level control layer that blocks dangerous commands, exposes fake secrets, and enforces runtime policies to make AI agents safer and more deterministic in developer environments.
Google I/O announced Gemini Spark, a personal AI agent powered by Gemini 3.5 Flash and Antigravity, and the transition of Gemini CLI to the closed-source Antigravity CLI. The article highlights security concerns regarding prompt injection and data handling for agent products.
A guest lecture at MIT 6.566 on AI agent security, covering system-level threats, prompt injection, and tool-use vulnerabilities, with demonstrations using LLMs such as GPT-5.4 and Qwen 3.5.
The article warns that the MCP ecosystem is repeating the supply chain security pattern seen in npm, Docker, and PyPI, with minimal vetting and growing risks. It reports that a scan of 500 Smithery servers found security issues in 18.8% of them, argues that existing security tooling cannot handle malicious agent instructions, and introduces a new static scanner called bawbel.
The author conducted an experiment on Gmail with AI agents connected via OAuth, sending obfuscated prompt injection emails. Frontier models sometimes caught the attacks, while cheap models silently executed them, revealing that agent security largely depends on model cost and token budget rather than architectural safeguards.
The article argues that AI subagents should not automatically inherit their parent agent's full permissions, advocating instead for attenuated delegation with explicit scope, tool limits, and audit trails to improve security in multi-agent systems.
This paper introduces symbolic guardrails that enforce concrete policies to provide provable safety and security guarantees for domain-specific AI agents without reducing utility, showing 74% of specified policies can be enforced via simple mechanisms.
OpenAI describes safeguards against URL-based data exfiltration when AI agents retrieve web content. An independent web index is used to verify that a URL is publicly known before it is retrieved automatically, which is meant to keep prompt injection attacks from leaking sensitive user data.
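
The URL check described in the last item can be illustrated with a minimal sketch. This is not OpenAI's implementation: the in-memory set `KNOWN_PUBLIC_URLS` is a hypothetical stand-in for an independent web index, and the names `normalize` and `may_fetch_automatically` are assumptions made for illustration. The sketch only shows the general idea that a URL must already be publicly known, exactly as written, before an agent fetches it without asking.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical stand-in for an independent index of publicly known URLs.
# A real system would query a separately built web index instead.
KNOWN_PUBLIC_URLS = {
    "https://example.com/docs/intro",
    "https://example.org/",
}


def normalize(url: str) -> str:
    """Lowercase scheme and host, drop the fragment, keep path and query."""
    parts = urlsplit(url)
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
    )


def may_fetch_automatically(url: str, index=KNOWN_PUBLIC_URLS) -> bool:
    """Allow automatic retrieval only for http(s) URLs found in the index."""
    if urlsplit(url).scheme.lower() not in ("http", "https"):
        return False
    # The query string is part of the lookup key, so a known page with
    # attacker-appended parameters is treated as an unknown URL.
    return normalize(url) in index
```

In this sketch, `may_fetch_automatically("https://example.com/docs/intro")` is allowed, while the same page with an appended query such as `?secret=abc123` is not, because that exact URL is absent from the index. A URL that fails the check would need some other path, for example explicit user confirmation, which is outside the scope of this sketch.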