prompt-injection

#prompt-injection

Anti-AI maintainer Johannes Link adds malicious prompt injection to popular Java library 'jqwik'

Reddit r/singularity · 2026-06-02

Johannes Link, maintainer of the Java library jqwik, added malicious prompt injection to disrupt AI usage of the library, sparking debate on AI ethics and open-source maintainer rights.

How close are we to AI systems that can reliably verify identity in conversations?

Reddit r/ArtificialInteligence · 2026-06-02

The article explores the challenges of identity verification in conversational AI systems, highlighting risks like impersonation and prompt injection, and questions whether serious approaches are being developed.

Tried to make a drop-in version of DeepMind's CaMeL — honest progress and what's still broken

Reddit r/AI_Agents · 2026-06-01

The author built a lightweight, drop-in security gate that implements DeepMind's CaMeL principle of preventing untrusted data from authoring actions. It achieves ~70% auto-inference accuracy on a benchmark with zero silent unsafe misclassifications, but the author notes remaining gaps in provenance tracking and robustness.

Free AI Agent Security Assessment

Reddit r/AI_Agents · 2026-06-01

Antitech is offering free early-access security assessments for AI agents, testing against attack vectors like prompt injection, tool abuse, and data leakage, providing a vulnerability report and discounts for participants.

Mental Damage: Caption Poisoning Attacks on Retrieval-Augmented Text-to-Music Generation

arXiv cs.AI · 2026-06-01

This paper introduces a dual-layer caption poisoning attack on retrieval-augmented text-to-music systems, showing that an attacker can inject malicious captions into the knowledge database to steer generated music toward attacker-chosen intent without modifying user prompts or models.

Evaluating using Mock Tool Calls to Quarantine Untrusted Prompt Inputs

arXiv cs.CL · 2026-06-01

This paper evaluates whether wrapping untrusted content in mock tool calls improves LLM robustness against adversarial inputs, finding it does not broadly help and sometimes increases attack success rates.

ChatGPT for Google Sheets Exfiltrates Workbooks

Hacker News Top · 2026-05-31

A security researcher discloses that OpenAI's ChatGPT extension for Google Sheets is vulnerable to indirect prompt injection attacks, allowing attackers to exfiltrate workbooks and execute unauthorized actions despite user settings requiring approval.

The attack on AI agents that no security tool catches

Reddit r/artificial · 2026-05-31

An attacker can evade security tools by spreading malicious instructions across multiple messages. Bendex Arc is a tool that tracks session behavior across turns to catch such attacks.

Send a SCOUT First: Pre-hoc Reasoning for Adaptive Detector Allocation in Prompt-Injection Defense

Hugging Face Daily Papers · 2026-05-29

Introduces SCOUT, a framework that dynamically allocates prompt-injection detectors per request by predicting their reliability and latency, improving safety and efficiency. The paper also presents SCOUT-450, a benchmark for complex agent-facing injections, and reports a 46% reduction in attack success rate and a 40% reduction in latency relative to a fixed GPT-4o judge.

From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors

Hugging Face Daily Papers · 2026-05-29

This paper introduces multi-step trojan attacks against local LLM agents, in which malicious prompts are embedded across multiple operations to bypass existing defenses. It proposes the ClawTrojan benchmark and the DASGuard defense to detect and mitigate such attacks.

Most AI security discussions are still focused on “protecting the model.”

Reddit r/AI_Agents · 2026-05-26

This article discusses how AI systems with capabilities like reading internal docs and calling APIs require a new security approach, moving beyond traditional SaaS security to Zero Trust principles for AI agents.

Microsoft Copilot Cowork Exfiltrates Files

Simon Willison's Blog · 2026-05-26

A security vulnerability in Microsoft Copilot Cowork allows attackers to exfiltrate files by exploiting prompt injection that triggers external image requests, potentially leaking pre-authenticated download links.

Are local LLM users testing prompt injection before connecting models to tools?

Reddit r/LocalLLaMA · 2026-05-26

A discussion on safety practices for local LLMs when connected to tools, questioning whether prompt injection testing is common before giving models tool access.

Microsoft Copilot Cowork Exfiltrates Files

Hacker News Top · 2026-05-25

Researchers at PromptArmor demonstrate that Microsoft Copilot Cowork can be exploited via indirect prompt injection to exfiltrate files from Microsoft 365. The attack relies on the lack of an approval step for certain actions when the recipient is the active user.

What Is an AVE Record and Why CVE Does Not Work for AI Agents?

Reddit r/AI_Agents · 2026-05-25

The article introduces the Agent Vulnerability Enumeration (AVE) record as a new standard designed to address the inadequacies of CVE for AI agent vulnerabilities, covering scoring, detection, and standardization challenges specific to agentic AI.

Inaudible sounds to humans can be hidden in YouTube videos, podcasts, or music and used to secretly trigger AI voice assistants into carrying out unauthorized commands without the user noticing, exposing a new class of “auditory prompt injection” attacks against popular tools

Reddit r/singularity · 2026-05-24

Researchers have discovered that inaudible sounds can be embedded in YouTube videos, podcasts, or music to surreptitiously command AI voice assistants, enabling a new class of auditory prompt injection attacks.

Hackers are learning to exploit chatbot ‘personalities’

The Verge · 2026-05-24

A look at how hackers have evolved from simple prompt injection attacks to more sophisticated exploits that manipulate chatbot personalities, turning AI security into an arms race.

I built a site that lets you watch, wager, and prompt inject agents playing games

Reddit r/AI_Agents · 2026-05-23

A developer built a site where users can watch AI agents play games, wager fake coins, and use winnings to prompt inject agents. The author shares observations about model performance, noting that smaller models struggle while Qwen3 235B excels.

Solved the "useful but insecure" tension: One-time administrator approvals for non-isolated agents

Reddit r/AI_Agents · 2026-05-22

This post details a one-time administrator approval mechanism for non-isolated AI agents in prompt2bot, which prevents prompt injection attacks by requiring admin confirmation before executing sensitive tools like VM creation or code execution.

CTF focused on AI security - prompt injection, agent hijacking, safety bypass (June 17-22)

Reddit r/ArtificialInteligence · 2026-05-22

A free CTF competition focused on AI security, with challenges on prompt injection, agent hijacking, and guardrail bypass. It runs June 17-22 and has a prize pool of $1,000+.
