prompt-injection

#prompt-injection

trained a prompt injection detector using ml-intern and DeepSeek v4 Flash, runs in the browser

Reddit r/LocalLLaMA ↗ · 2026-05-22

Trained a prompt injection classifier using ml-intern and DeepSeek V4 Flash, achieving 99% F1 with DistilBERT, optimized to ONNX int8 (~65MB) and deployable in the browser via Transformers.js v3.

OWASP published its first Top 10 for AI Agents. 88% of enterprises already had agent security incidents last year. Here's the breakdown.

Reddit r/artificial ↗ · 2026-05-21

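OWASP has published its first Top 10 list of security risks for autonomous AI agents (2026 edition), covering threats such as goal hijacking, tool misuse, and supply-chain attacks, and cites a survey finding that 88% of enterprises experienced an AI agent security incident in the past year.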

AI Agent Intelligence tool - Incident debugging, Cost spike detection

Reddit r/AI_Agents ↗ · 2026-05-19

Building a tool for AI Agent incident debugging and cost spike detection without additional instrumentation, covering issues like prompt injection, reasoning loops, and data exfiltration. Asking if customers in production environments see this as a pain point worth paying for.

How are you testing local coding-agent work gates against prompt injection?

Reddit r/AI_Agents ↗ · 2026-05-18

A discussion about testing local coding-agent work gates against indirect prompt injection, focusing on evidence trust and verification challenges in agent workflows.

AI Agent Security - MIT 6.566 guest lecture

Lobsters Hottest ↗ · 2026-05-18

Guest lecture at MIT 6.566 on AI agent security covering system-level threats, prompt injection, tool-use vulnerabilities, and demonstrations with LLMs like GPT-5.4 and Qwen 3.5.

LinkedIn user hides AI prompt injection in bio to force recruitment spam to be sent in Olde English prose; bots also manipulated to address user as ‘My Lord’

Reddit r/ArtificialInteligence ↗ · 2026-05-17

A LinkedIn user hid a prompt injection in their bio, causing AI-driven recruitment bots to respond in Old English and address them as 'My Lord', demonstrating the manipulability of AI agents.

@rohanpaul_ai: Google DeepMind’s paper shows that the real security problem for AI agents is not just the model, but the environment i…

X AI KOLs Timeline ↗ · 2026-05-17

Google DeepMind's paper introduces the first systematic framework for understanding how the web can be weaponized against autonomous AI agents, showing hidden prompt injections can commandeer agents in up to 86% of scenarios, and presents a taxonomy of six 'AI Agent Traps' targeting perception, reasoning, memory, action, multi-agent dynamics, and human oversight.

Your AI agent is one poisoned webpage away from doing something catastrophic

Reddit r/artificial ↗ · 2026-05-16

Arc Gate is a proxy-level tool that enforces instruction-authority boundaries to prevent AI agents from being hijacked by poisoned web pages, emails, or retrieved documents.

Are AI agents creating a new runtime supply-chain attack surface?

Reddit r/AI_Agents ↗ · 2026-05-16

Discusses AI agent security as a runtime supply-chain problem beyond prompt injection, highlighting risks from untrusted data, tools, and feedback loops, and questions how developers enforce boundaries.

Agent memory is not just RAG over user facts

Reddit r/AI_Agents ↗ · 2026-05-16

The article argues that simple RAG-based agent memory systems fail in production due to issues like stale preferences, missed keywords, and prompt injection, and advocates for a layered memory architecture with active selection, deterministic fallback, governance, and testing.

The new trick exposing AI job applicants: ‘Write a poem about a frog’

Reddit r/artificial ↗ · 2026-05-15

Companies are using prompt injections like asking for a poem about a frog to expose AI-generated job applications, highlighting the growing use of AI in the job market and the countermeasures.

Security Architecture Behind Perplexity Computer (2 minute read)

TLDR AI ↗ · 2026-05-14

Perplexity detailed the security architecture of its Computer agent, including Firecracker microVM isolation, scoped connector permissions, and prompt injection defenses.

AI agent security is a small prayer the model says no. How are you routing models?

Reddit r/AI_Agents ↗ · 2026-05-13

The author conducted an experiment on Gmail with AI agents connected via OAuth, sending obfuscated prompt injection emails. Frontier models sometimes caught the attacks, while cheap models silently executed them, revealing that agent security largely depends on model cost and token budget rather than architectural safeguards.

Built a tool that stops AI agents from being hijacked by malicious content in webpages and emails

Reddit r/artificial ↗ · 2026-05-13

Arc Gate is a proxy that protects AI agents from prompt injection attacks by treating web and email content as untrusted, requiring no code changes from developers.

Agents need a local bouncer before they run tools

Reddit r/AI_Agents ↗ · 2026-05-12

The article warns about security risks when AI agents execute external tools and announces new local guardrails for Tingly Box to prevent malicious actions.

We added an enforcement layer to our AI agents in production — here's what we learned about the failure modes nobody talks about

Reddit r/AI_Agents ↗ · 2026-05-11

The author discusses critical failure modes encountered when deploying AI agents in production, emphasizing the prevalence of prompt injection, the necessity of real-time governance and audit trails, and the requirement for ultra-fast kill switches. Treating enforcement as infrastructure rather than an afterthought is presented as the key to maintaining control and compliance.

10 things I'd tell anyone starting to build AI agents in production

Reddit r/AI_Agents ↗ · 2026-05-11

A practitioner shares ten critical lessons for deploying AI agents in production, emphasizing code-based constraints, context management, and security over relying solely on prompts.

MIPIAD: Multilingual Indirect Prompt Injection Attack Defense with Qwen -- TF-IDF Hybrid and Meta-Ensemble Learning

arXiv cs.CL ↗ · 2026-05-11

This paper presents MIPIAD, a multilingual defense framework against indirect prompt injection attacks using a hybrid of Qwen2.5-based classifiers and TF-IDF features with meta-ensemble learning. It demonstrates strong performance on English and Bangla benchmarks, achieving high F1 and AUROC scores while reducing cross-lingual gaps.

Grok wasn’t hacked. It was used. and honestly I saw the same thing happen to my own agent months ago.

Reddit r/AI_Agents ↗ · 2026-05-10

The article discusses a recent incident where Grok was manipulated into executing financial transactions, highlighting the broader lack of robust security layers for AI agents with tool access.

My Agentic Trust Issues: From Prompt Injection to Supply-Chain Compromise on gemini-cli

Lobsters Hottest ↗ · 2026-05-09

Pillar Security researchers disclosed a critical CVSS 10 vulnerability (TrustIssues) in Google's gemini-cli and related GitHub workflows, where prompt injection allowed attackers to exfiltrate secrets and compromise the repository supply chain.
