prompt-injection

#prompt-injection

AI agent security is a small prayer the model says no. How are you routing models?

Reddit r/AI_Agents ↗ · 9h ago

The author conducted an experiment on Gmail with AI agents connected via OAuth, sending obfuscated prompt injection emails. Frontier models sometimes caught the attacks, while cheap models silently executed them, revealing that agent security largely depends on model cost and token budget rather than architectural safeguards.

0 favorites 0 likes

#prompt-injection

Built a tool that stops AI agents from being hijacked by malicious content in webpages and emails

Reddit r/artificial ↗ · 11h ago

Arc Gate is a proxy that protects AI agents from prompt injection attacks by treating web and email content as untrusted, requiring no code changes from developers.

0 favorites 0 likes

#prompt-injection

Agents need a local bouncer before they run tools

Reddit r/AI_Agents ↗ · yesterday

The article warns about security risks when AI agents execute external tools and announces new local guardrails for Tingly Box to prevent malicious actions.

0 favorites 0 likes

#prompt-injection

We added an enforcement layer to our AI agents in production — here's what we learned about the failure modes nobody talks about

Reddit r/AI_Agents ↗ · 2d ago

The author discusses critical failure modes encountered when deploying AI agents in production, emphasizing the prevalence of prompt injection, the necessity of real-time governance and audit trails, and the requirement for ultra-fast kill switches. Treating enforcement as infrastructure rather than an afterthought is presented as the key to maintaining control and compliance.

0 favorites 0 likes

#prompt-injection

10 things I'd tell anyone starting to build AI agents in production

Reddit r/AI_Agents ↗ · 2d ago

A practitioner shares ten critical lessons for deploying AI agents in production, emphasizing code-based constraints, context management, and security over relying solely on prompts.

0 favorites 0 likes

#prompt-injection

MIPIAD: Multilingual Indirect Prompt Injection Attack Defense with Qwen -- TF-IDF Hybrid and Meta-Ensemble Learning

arXiv cs.CL ↗ · 2d ago Cached

This paper presents MIPIAD, a multilingual defense framework against indirect prompt injection attacks using a hybrid of Qwen2.5-based classifiers and TF-IDF features with meta-ensemble learning. It demonstrates strong performance on English and Bangla benchmarks, achieving high F1 and AUROC scores while reducing cross-lingual gaps.

0 favorites 0 likes

#prompt-injection

Grok wasn’t hacked. It was used. and honestly I saw the same thing happen to my own agent months ago.

Reddit r/AI_Agents ↗ · 3d ago

The article discusses a recent incident where Grok was manipulated into executing financial transactions, highlighting the broader lack of robust security layers for AI agents with tool access.

0 favorites 0 likes

#prompt-injection

My Agentic Trust Issues: From Prompt Injection to Supply-Chain Compromise on gemini-cli

Lobsters Hottest ↗ · 4d ago Cached

Pillar Security researchers disclosed a critical CVSS 10 vulnerability (TrustIssues) in Google's gemini-cli and related GitHub workflows, where prompt injection allowed attackers to exfiltrate secrets and compromise the repository supply chain.

0 favorites 0 likes

#prompt-injection

Arc Sentry outperformed LLM Guard 92% vs 70% detection on a head to head benchmark. Here is how it works.

Reddit r/artificial ↗ · 2026-04-23

Arc Sentry is a new pre-generation prompt-injection detector that reads a model’s internal residual stream, achieving 92% detection with 0% false positives versus LLM Guard’s 70%/3.3% on a 130-prompt benchmark.

0 favorites 0 likes

#prompt-injection

Designing AI agents to resist prompt injection

OpenAI Blog ↗ · 2026-03-11 Cached

OpenAI publishes guidance on designing AI agents resistant to prompt injection attacks, arguing that modern attacks increasingly use social engineering tactics rather than simple string injections, and advocating for system-level defenses that constrain impact rather than relying solely on input filtering.

0 favorites 0 likes

#prompt-injection

Improving instruction hierarchy in frontier LLMs

OpenAI Blog ↗ · 2026-03-10 Cached

OpenAI presents a training approach using instruction-hierarchy tasks to improve LLM safety and reliability by teaching models to properly prioritize instructions based on trust levels (system > developer > user > tool). The method addresses prompt-injection attacks and safety steerability through reinforcement learning with a new dataset called IH-Challenge.

0 favorites 0 likes

#prompt-injection

Introducing Lockdown Mode and Elevated Risk labels in ChatGPT

OpenAI Blog ↗ · 2026-02-13 Cached

OpenAI introduces Lockdown Mode and Elevated Risk labels in ChatGPT to mitigate prompt injection attacks and protect sensitive data. Lockdown Mode is an advanced security setting for high-risk users that constrains ChatGPT's interaction with external systems and is available for enterprise plans with planned consumer rollout.

0 favorites 0 likes

#prompt-injection

Keeping your data safe when an AI agent clicks a link

OpenAI Blog ↗ · 2026-01-28 Cached

OpenAI describes security safeguards against URL-based data exfiltration attacks when AI agents retrieve web content, using an independent web index to verify that URLs are publicly known before automatic retrieval to prevent prompt injection attacks from leaking sensitive user data.

0 favorites 0 likes

#prompt-injection

Continuously hardening ChatGPT Atlas against prompt injection

OpenAI Blog ↗ · 2025-12-22 Cached

OpenAI announces security hardening of ChatGPT Atlas against prompt injection attacks through adversarial training and strengthened safeguards, including a rapid response loop for discovering and mitigating novel attack strategies before they appear in the wild.

0 favorites 0 likes

#prompt-injection

Understanding prompt injections: a frontier security challenge

OpenAI Blog ↗ · 2025-11-07 Cached

OpenAI publishes guidance on prompt injection attacks, a social engineering vulnerability where malicious instructions hidden in web content or documents can trick AI models into unintended actions. The company outlines its multi-layered defense strategy including instruction hierarchy research, automated red-teaming, and AI-powered monitoring systems.

0 favorites 0 likes

#prompt-injection

Advancing Gemini's security safeguards

Google DeepMind Blog ↗ · 2025-05-20 Cached

Google DeepMind announces advanced security improvements for Gemini to defend against indirect prompt injection attacks through model hardening, adaptive evaluation, and layered defense mechanisms. The approach combines fine-tuning on adversarial scenarios with system-level guardrails to build inherent resilience while maintaining model performance.

0 favorites 0 likes

#prompt-injection

Empowering defenders through our Cybersecurity Grant Program

OpenAI Blog ↗ · 2024-06-20 Cached

OpenAI highlights grantees from its Cybersecurity Grant Program, supporting projects ranging from defending LLMs against prompt-injection attacks to autonomous cyber defense agents and secure AI inference infrastructure.

0 favorites 0 likes

#prompt-injection

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

OpenAI Blog ↗ · 2024-04-19 Cached

OpenAI proposes an instruction hierarchy approach to defend LLMs against prompt injection and jailbreak attacks by training models to prioritize system instructions over user inputs. The method significantly improves robustness without degrading standard capabilities.

0 favorites 0 likes

#prompt-injection

Don't Switch to an AI Browser (Until You Watch This)

YouTube AI Channels ↗ · 5d ago Cached

AI browsers like OpenAI's Atlas and Perplexity's Comet embed AI assistants directly into browsing with memory and agentic capabilities, but significant security risks from prompt injection attacks make them unsuitable for sensitive use.

0 favorites 0 likes

prompt-injection

Submit Feedback