Tag
Trained a prompt injection classifier using ml-intern and DeepSeek V4 Flash, achieving 99% F1 with DistilBERT, optimized to ONNX int8 (~65MB) and deployable in the browser via Transformers.js v3.
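OWASP released its first Top 10 list of security risks for autonomous AI agents (2026 edition), covering threats such as goal hijacking, tool misuse, and supply chain attacks, and cites a survey finding that 88% of enterprises experienced an AI agent security incident in the past year.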
The author is building a tool for debugging AI agent incidents and detecting cost spikes without additional instrumentation, covering issues like prompt injection, reasoning loops, and data exfiltration, and asks whether customers running agents in production see this as a pain point worth paying for.
A discussion about testing local coding-agent work gates against indirect prompt injection, focusing on evidence trust and verification challenges in agent workflows.
Guest lecture at MIT 6.566 on AI agent security covering system-level threats, prompt injection, tool-use vulnerabilities, and demonstrations with LLMs like GPT-5.4 and Qwen 3.5.
A LinkedIn user hid a prompt injection in their bio, causing AI-driven recruitment bots to respond in Old English and address them as 'My Lord', demonstrating the manipulability of AI agents.
Google DeepMind's paper introduces the first systematic framework for understanding how the web can be weaponized against autonomous AI agents, showing hidden prompt injections can commandeer agents in up to 86% of scenarios, and presents a taxonomy of six 'AI Agent Traps' targeting perception, reasoning, memory, action, multi-agent dynamics, and human oversight.
Arc Gate is a proxy-level tool that enforces instruction-authority boundaries to prevent AI agents from being hijacked by poisoned web pages, emails, or retrieved documents.
Discusses AI agent security as a runtime supply-chain problem beyond prompt injection, highlighting risks from untrusted data, tools, and feedback loops, and questions how developers enforce boundaries.
The article argues that simple RAG-based agent memory systems fail in production due to issues like stale preferences, missed keywords, and prompt injection, and advocates for a layered memory architecture with active selection, deterministic fallback, governance, and testing.
Companies are using prompt injections, such as a request for a poem about a frog, to expose AI-generated job applications, highlighting the growing use of AI in the job market and the countermeasures emerging against it.
Perplexity detailed the security architecture of its Computer agent, including Firecracker microVM isolation, scoped connector permissions, and prompt injection defenses.
The author conducted an experiment on Gmail with AI agents connected via OAuth, sending obfuscated prompt injection emails. Frontier models sometimes caught the attacks, while cheap models silently executed them, revealing that agent security largely depends on model cost and token budget rather than architectural safeguards.
Arc Gate is a proxy that protects AI agents from prompt injection attacks by treating web and email content as untrusted, requiring no code changes from developers.
The article warns about security risks when AI agents execute external tools and announces new local guardrails for Tingly Box to prevent malicious actions.
The author discusses critical failure modes encountered when deploying AI agents in production, emphasizing the prevalence of prompt injection, the necessity of real-time governance and audit trails, and the requirement for ultra-fast kill switches. Treating enforcement as infrastructure rather than an afterthought is presented as the key to maintaining control and compliance.
A practitioner shares ten critical lessons for deploying AI agents in production, emphasizing code-based constraints, context management, and security over relying solely on prompts.
This paper presents MIPIAD, a multilingual defense framework against indirect prompt injection attacks using a hybrid of Qwen2.5-based classifiers and TF-IDF features with meta-ensemble learning. It demonstrates strong performance on English and Bangla benchmarks, achieving high F1 and AUROC scores while reducing cross-lingual gaps.
The article discusses a recent incident where Grok was manipulated into executing financial transactions, highlighting the broader lack of robust security layers for AI agents with tool access.
Pillar Security researchers disclosed a critical CVSS 10 vulnerability (TrustIssues) in Google's gemini-cli and related GitHub workflows, where prompt injection allowed attackers to exfiltrate secrets and compromise the repository supply chain.