Johannes Link, maintainer of the Java library jqwik, added a malicious prompt injection to disrupt AI usage of the library, sparking debate on AI ethics and open-source maintainer rights.
The article explores the challenges of identity verification in conversational AI systems, highlighting risks like impersonation and prompt injection, and questions whether serious approaches are being developed.
The author built a lightweight, drop-in security gate that implements DeepMind's CaMeL principle of preventing untrusted data from authoring actions. It achieves ~70% auto-inference accuracy on a benchmark with zero silent unsafe misclassifications, but the author notes gaps in provenance tracking and robustness.
Antitech is offering free early-access security assessments for AI agents, testing against attack vectors like prompt injection, tool abuse, and data leakage, providing a vulnerability report and discounts for participants.
This paper introduces a dual-layer caption poisoning attack on retrieval-augmented text-to-music systems, showing that an attacker can inject malicious captions into the knowledge database to steer generated music toward attacker-chosen intent without modifying user prompts or models.
This paper evaluates whether wrapping untrusted content in mock tool calls improves LLM robustness against adversarial inputs, finding it does not broadly help and sometimes increases attack success rates.
A security researcher discloses that OpenAI's ChatGPT extension for Google Sheets is vulnerable to indirect prompt injection attacks, allowing attackers to exfiltrate workbooks and execute unauthorized actions despite user settings requiring approval.
An attacker can bypass security by spreading malicious instructions across multiple messages; Bendex Arc is a tool that tracks session behavior across turns to catch such attacks.
Introduces SCOUT, a framework that dynamically allocates prompt-injection detectors per request by predicting reliability and latency, improving safety and efficiency. Also presents SCOUT-450, a benchmark for complex agent-facing injections, and reports a 46% reduction in attack success rate and a 40% reduction in latency compared with a fixed GPT-4o judge.
This paper introduces multi-step trojan attacks against local LLM agents, in which malicious prompts are embedded across multiple operations to bypass existing defenses. It proposes the ClawTrojan benchmark and the DASGuard defense to detect and mitigate such attacks.
This article discusses how AI systems with capabilities like reading internal docs and calling APIs require a new security approach, moving beyond traditional SaaS security to Zero Trust principles for AI agents.
A security vulnerability in Microsoft Copilot Cowork allows attackers to exfiltrate files through a prompt injection that triggers external image requests, potentially leaking pre-authenticated download links.
A discussion on safety practices for local LLMs when connected to tools, questioning whether prompt injection testing is common before giving models tool access.
Researchers at PromptArmor demonstrate that Microsoft Copilot Cowork can be exploited via indirect prompt injection to exfiltrate files from Microsoft 365, taking advantage of the lack of an approval step for certain actions when the recipient is the active user.
The article introduces the Agent Vulnerability Enumeration (AVE) record as a new standard designed to address the inadequacies of CVE for AI agent vulnerabilities, covering scoring, detection, and standardization challenges specific to agentic AI.
Researchers have discovered that inaudible sounds can be embedded in YouTube videos, podcasts, or music to surreptitiously command AI voice assistants, enabling a new class of auditory prompt injection attacks.
A look at how hackers have evolved from simple prompt injection attacks to more sophisticated exploits that manipulate chatbot personalities, turning AI security into an arms race.
A developer built a site where users can watch AI agents play games, wager fake coins, and use their winnings to prompt-inject the agents. The author shares observations about model performance, noting that smaller models struggle while Qwen3 235B excels.
This post details a one-time administrator approval mechanism for non-isolated AI agents in prompt2bot, which prevents prompt injection attacks by requiring admin confirmation before executing sensitive tools like VM creation or code execution.
A free CTF competition focused on AI security, with challenges on prompt injection, agent hijacking, and guardrail bypass. It runs June 17-22 with a $1,000+ prize pool.