prompt-injection

#prompt-injection

@jsrailton: NEW: malware developers added nuclear & biological weapons text to to their spyware. Goal? To trigger LLM safety refusa…

X AI KOLs Following ↗ · 2026-06-10 Cached

Malware developers are adding text about nuclear and biological weapons to their spyware to trigger LLM safety refusals, preventing AI security scanners from analyzing the malware. This demonstrates a practical exploit of aggressive safety alignment, highlighting second-order blindspots that attackers can leverage.

0 favorites 0 likes

#prompt-injection

AI support bots and account recovery: where should the line be?

Reddit r/ArtificialInteligence ↗ · 2026-06-10

Attackers bypassed Instagram 2FA by using Meta's AI support assistant to change recovery email via prompt injection, raising questions about AI agent privileges in account recovery.

0 favorites 0 likes

#prompt-injection

Your AI agent just got hijacked. You have no idea it happened.

Reddit r/artificial ↗ · 2026-06-10

This article warns about the Crescendo attack, a multi-turn prompt injection that evades single-message defenses by poisoning an AI agent's context over several turns. It introduces Bendex Arc, a tool that tracks behavioral trajectory across sessions to catch such attacks before they execute.

0 favorites 0 likes

#prompt-injection

VATS: Exploiting Implicit Authority in Error-Path Injection via Systematic Mutation

arXiv cs.AI ↗ · 2026-06-09 Cached

This paper introduces VATS, a mutation-driven framework that systematically evolves adversarial payloads to exploit error-path injection in MCP-based tool-calling agents. It demonstrates that error messages with implicit authority triple the success rate of standard indirect prompt injection across frontier models.

0 favorites 0 likes

#prompt-injection

Been watching real adversarial input hit my detection API for six months. Here's what's actually landing.

Reddit r/LocalLLaMA ↗ · 2026-06-08

A six-month analysis of real adversarial inputs reveals that simple multi-turn setups, forward-momentum exploitation, and role redefinition attacks consistently bypass single-message classifiers. The post argues that stateful monitoring of conversational context is more effective than improving one-shot detection.

0 favorites 0 likes

#prompt-injection

Zero-Shot Embedding Drift Detection: A Lightweight Defense Against Prompt Injections in LLMs

arXiv cs.AI ↗ · 2026-06-08 Cached

This paper introduces Zero-Shot Embedding Drift Detection (ZEDD), a lightweight framework that detects prompt injection attacks in LLMs by measuring semantic shifts in embedding space, achieving over 93% accuracy with less than 3% false positive rate across multiple architectures.

0 favorites 0 likes

#prompt-injection

@seclink: OpenAI launches Lockdown Mode to defend against prompt injection attacks; ChatGPT fully rolls out Lockdown Mode to prevent network attacks and prompt injection. The Chinese security community has not yet widely discussed it.

X AI KOLs Timeline ↗ · 2026-06-08

OpenAI launches Lockdown Mode, ChatGPT fully rolls out this mode to defend against prompt injection attacks and enhance security.

0 favorites 0 likes

#prompt-injection

OpenAI Adds Lockdown Mode (3 minute read)

TLDR AI ↗ · 2026-06-08 Cached

OpenAI introduces Lockdown Mode, an optional security setting that limits web browsing and external service access in ChatGPT to reduce data exfiltration risks from prompt injection attacks. It is rolling out to eligible personal and business accounts.

0 favorites 0 likes

#prompt-injection

OpenAI unveils Lockdown Mode to protect sensitive data from prompt injection attacks

TechCrunch AI ↗ · 2026-06-06 Cached

OpenAI announced Lockdown Mode, a new feature for ChatGPT that provides additional protection against prompt injection attacks by disabling live web browsing, image retrieval, deep research, and agent mode. The feature is designed for users handling sensitive data and is rolling out to Business and eligible personal accounts.

0 favorites 0 likes

#prompt-injection

OpenAI Help: Lockdown Mode

Simon Willison's Blog ↗ · 2026-06-05 Cached

OpenAI has launched Lockdown Mode for ChatGPT to prevent data exfiltration from prompt injection attacks by limiting outbound network requests. The feature is rolling out to eligible accounts including Free, Plus, Pro, and self-serve Business.

0 favorites 0 likes

#prompt-injection

Prompt injection took down a production agent last week — here's what our post-mortem found

Reddit r/AI_Agents ↗ · 2026-06-05

A production AI support agent was compromised via prompt injection, exposing other customers' data. The post-mortem revealed lack of enforcement layers, useless audit trails, and no kill switch, highlighting systemic security gaps in deploying AI agents.

0 favorites 0 likes

#prompt-injection

The Meta hack shows there’s more to AI security than Mythos

MIT Technology Review ↗ · 2026-06-05 Cached

Attackers exploited Meta's AI customer support agent to hijack Instagram accounts by simply asking it to change linked email addresses, highlighting that AI agent vulnerabilities can be as dangerous as advanced AI hacking threats.

0 favorites 0 likes

#prompt-injection

Agent enforcement engine with auditing & solves prompt injection

Reddit r/AI_Agents ↗ · 2026-06-05

A tool built with pure math and determinism to solve indirect prompt injection and agent drifting, providing a pure audit trace chain. The creator is seeking pilot interest.

0 favorites 0 likes

#prompt-injection

Agent Browser Shield

Product Hunt ↗ · 2026-06-04

Agent Browser Shield is a product that blocks prompt injection attacks and reduces token costs for AI browser agents.

0 favorites 0 likes

#prompt-injection

I don’t think you can break Bendex Arc. Prove me wrong.

Reddit r/AI_Agents ↗ · 2026-06-03

Bendex Arc is a tool that resists prompt injection attacks by tracking full sessions, independently verified to be 100% effective against attacks that defeat other tools.

0 favorites 0 likes

#prompt-injection

AI agents are one prompt injection away from doing something you'd never ask them to do. We built a fix.

Reddit r/openclaw ↗ · 2026-06-03

PixieBrix launches Agent Browser Shield, a free source-available browser extension that protects AI agents from prompt injection, dark patterns, and context pollution during web browsing.

0 favorites 0 likes

#prompt-injection

Agent Threat Rules: Open detection rule format for AI agent security threats

Reddit r/AI_Agents ↗ · 2026-06-03

An open detection rule format for AI agent security threats, inspired by Sigma/YARA, aims to standardize detection of prompt injection, tool abuse, and other agent attacks, though it notes limitations against semantic attacks.

0 favorites 0 likes

#prompt-injection

Gate AI: LLM Security Benchmark Evaluation Methodology and Results

arXiv cs.LG ↗ · 2026-06-03 Cached

This paper presents an evaluation methodology for LLM security detectors that addresses systematic weaknesses like per-dataset threshold tuning and undisclosed operating points. The framework uses cross-validation across 16 benchmarks, selects a single global operating point, and includes multiple diagnostics for generalization.

0 favorites 0 likes

#prompt-injection

Anti-AI maintainer Johannes Link adds malicious prompt injection to popular Java library 'jqwik'

Reddit r/singularity ↗ · 2026-06-02

Johannes Link, maintainer of the Java library jqwik, added malicious prompt injection to disrupt AI usage of the library, sparking debate on AI ethics and open-source maintainer rights.

0 favorites 0 likes

#prompt-injection

How close are we to AI systems that can reliably verify identity in conversations?

Reddit r/ArtificialInteligence ↗ · 2026-06-02

The article explores the challenges of identity verification in conversational AI systems, highlighting risks like impersonation and prompt injection, and questions whether serious approaches are being developed.

0 favorites 0 likes

prompt-injection

Submit Feedback