prompt-injection

#prompt-injection

Arc Sentry outperformed LLM Guard 92% vs 70% detection on a head to head benchmark. Here is how it works.

Reddit r/artificial ↗ · 2026-04-23

Arc Sentry is a new pre-generation prompt-injection detector that reads a model’s internal residual stream, achieving 92% detection with 0% false positives versus LLM Guard’s 70%/3.3% on a 130-prompt benchmark.

0 favorites 0 likes

#prompt-injection

Designing AI agents to resist prompt injection

OpenAI Blog ↗ · 2026-03-11 Cached

OpenAI publishes guidance on designing AI agents resistant to prompt injection attacks, arguing that modern attacks increasingly use social engineering tactics rather than simple string injections, and advocating for system-level defenses that constrain impact rather than relying solely on input filtering.

0 favorites 0 likes

#prompt-injection

Improving instruction hierarchy in frontier LLMs

OpenAI Blog ↗ · 2026-03-10 Cached

OpenAI presents a training approach using instruction-hierarchy tasks to improve LLM safety and reliability by teaching models to properly prioritize instructions based on trust levels (system > developer > user > tool). The method addresses prompt-injection attacks and safety steerability through reinforcement learning with a new dataset called IH-Challenge.

0 favorites 0 likes

#prompt-injection

Introducing Lockdown Mode and Elevated Risk labels in ChatGPT

OpenAI Blog ↗ · 2026-02-13 Cached

OpenAI introduces Lockdown Mode and Elevated Risk labels in ChatGPT to mitigate prompt injection attacks and protect sensitive data. Lockdown Mode is an advanced security setting for high-risk users that constrains ChatGPT's interaction with external systems and is available for enterprise plans with planned consumer rollout.

0 favorites 0 likes

#prompt-injection

Keeping your data safe when an AI agent clicks a link

OpenAI Blog ↗ · 2026-01-28 Cached

OpenAI describes security safeguards against URL-based data exfiltration attacks when AI agents retrieve web content, using an independent web index to verify that URLs are publicly known before automatic retrieval to prevent prompt injection attacks from leaking sensitive user data.

0 favorites 0 likes

#prompt-injection

Continuously hardening ChatGPT Atlas against prompt injection

OpenAI Blog ↗ · 2025-12-22 Cached

OpenAI announces security hardening of ChatGPT Atlas against prompt injection attacks through adversarial training and strengthened safeguards, including a rapid response loop for discovering and mitigating novel attack strategies before they appear in the wild.

0 favorites 0 likes

#prompt-injection

Understanding prompt injections: a frontier security challenge

OpenAI Blog ↗ · 2025-11-07 Cached

OpenAI publishes guidance on prompt injection attacks, a social engineering vulnerability where malicious instructions hidden in web content or documents can trick AI models into unintended actions. The company outlines its multi-layered defense strategy including instruction hierarchy research, automated red-teaming, and AI-powered monitoring systems.

0 favorites 0 likes

#prompt-injection

OpenGuardrails: An Open-Source Context-Aware AI Guardrails Platform

Papers with Code Trending ↗ · 2025-10-22 Cached

OpenGuardrails is an open-source platform for AI safety, offering context-aware content-safety and manipulation detection (e.g., prompt injection, jailbreaking) via a unified model, plus a separate NER pipeline for data-leakage identification. It achieves state-of-the-art performance on safety benchmarks and supports private, enterprise-grade deployment.

0 favorites 0 likes

#prompt-injection

Advancing Gemini's security safeguards

Google DeepMind Blog ↗ · 2025-05-20 Cached

Google DeepMind announces advanced security improvements for Gemini to defend against indirect prompt injection attacks through model hardening, adaptive evaluation, and layered defense mechanisms. The approach combines fine-tuning on adversarial scenarios with system-level guardrails to build inherent resilience while maintaining model performance.

0 favorites 0 likes

#prompt-injection

Empowering defenders through our Cybersecurity Grant Program

OpenAI Blog ↗ · 2024-06-20 Cached

OpenAI highlights grantees from its Cybersecurity Grant Program, supporting projects ranging from defending LLMs against prompt-injection attacks to autonomous cyber defense agents and secure AI inference infrastructure.

0 favorites 0 likes

#prompt-injection

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

OpenAI Blog ↗ · 2024-04-19 Cached

OpenAI proposes an instruction hierarchy approach to defend LLMs against prompt injection and jailbreak attacks by training models to prioritize system instructions over user inputs. The method significantly improves robustness without degrading standard capabilities.

0 favorites 0 likes

#prompt-injection

AI Agent Security - MIT 6.566 Computer Systems Security, Spring 2026

YouTube AI Channels ↗ · 2026-05-21 Cached

MIT 6.566 course lecture introduces security challenges for AI agents, including non-adversarial errors (e.g., accidental database deletion) and adversarial attacks (e.g., prompt injection, data leakage), and explains the basics of building systems from language models to conversational agents.

0 favorites 1 likes

#prompt-injection

Don't Switch to an AI Browser (Until You Watch This)

YouTube AI Channels ↗ · 2026-05-08 Cached

AI browsers like OpenAI's Atlas and Perplexity's Comet embed AI assistants directly into browsing with memory and agentic capabilities, but significant security risks from prompt injection attacks make them unsuitable for sensitive use.

0 favorites 0 likes

prompt-injection

Submit Feedback