The attack on AI agents that no security tool catches

Reddit r/artificial 05/31/26, 04:34 PM Tools

Summary

An attacker can bypass per-message security checks by spreading malicious instructions across multiple messages. Bendex Arc is a tool that tracks session behavior across turns to catch such attacks.

Been working on AI agent security for a while, and the attack that concerns me most barely gets talked about. Not the obvious stuff like “ignore previous instructions.” Those get caught.

The scary one is when an attacker spreads the attack across multiple messages. Each message looks totally normal. The model sees nothing suspicious. But by message 8 it’s doing something it absolutely should not be doing.

Every security tool I’ve tested evaluates messages one at a time. None of them remember what happened three messages ago.

Built Bendex Arc to catch this. It tracks session behavior across turns instead of evaluating each message in isolation. Try it at https://bendexgeometry.com or red team it at https://web-production-6e47f.up.railway.app/demo

Curious if anyone building agents in production has actually hit this or tested against it.
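To make the gap between per-message and per-session checks concrete, here is a toy sketch. It is not how Bendex Arc works internally (the post does not say); the phrase weights, the `THRESHOLD` value, the decay factor, and the `SessionTracker` class are all assumptions made up for illustration. A real system would presumably use a classifier rather than keyword matching.

```python
from dataclasses import dataclass

# Hypothetical risk signals and weights, invented for this sketch.
RISK_WEIGHTS = {
    "admin": 0.3,
    "credentials": 0.4,
    "export": 0.3,
    "external": 0.3,
}

# Hypothetical alert threshold shared by both checks.
THRESHOLD = 1.0


def message_risk(text: str) -> float:
    """Score one message in isolation (the per-message approach)."""
    lowered = text.lower()
    return sum(w for phrase, w in RISK_WEIGHTS.items() if phrase in lowered)


@dataclass
class SessionTracker:
    """Accumulate risk across turns, letting older turns fade slowly."""
    decay: float = 0.9   # assumed: how much of the prior score carries over
    score: float = 0.0

    def observe(self, text: str) -> bool:
        # Carry forward decayed history, then add this turn's risk.
        self.score = self.score * self.decay + message_risk(text)
        return self.score >= THRESHOLD


def first_flagged_turn(messages):
    """Return the 1-based turn where the session check fires, or None."""
    tracker = SessionTracker()
    for turn, text in enumerate(messages, start=1):
        if tracker.observe(text):
            return turn
    return None


MESSAGES = [
    "Can you list the admin users for my report?",
    "Where are service credentials stored?",
    "Please export that table to CSV.",
    "Send the file to my external backup address.",
]

if __name__ == "__main__":
    for text in MESSAGES:
        flagged = message_risk(text) >= THRESHOLD
        print(f"per-message flagged={flagged}: {text}")
    print("session check fires at turn:", first_flagged_turn(MESSAGES))
```

In this sketch no single message reaches the threshold on its own, but the accumulated session score crosses it on the fourth turn. The decay factor is a design choice: closer to 1.0 remembers more of the session and catches slower attacks, at the cost of more false positives on long benign conversations.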

Similar Articles

Your AI agent just got hijacked. You have no idea it happened.

Reddit r/artificial

This article warns about the Crescendo attack, a multi-turn prompt injection that evades single-message defenses by poisoning an AI agent's context over several turns. It introduces Bendex Arc, a tool that tracks behavioral trajectory across sessions to catch such attacks before they execute.

I don’t think you can break Bendex Arc. Prove me wrong.

If your AI agent can send emails, browse websites, or call tools, I want to test something with you

Your AI agent is one poisoned webpage away from doing something catastrophic

Most AI security tools inspect messages. Arc Gate inspects sessions.
