Agent Threat Rules: Open detection rule format for AI agent security threats

Reddit r/AI_Agents Tools

Summary

An open detection rule format for AI agent security threats, inspired by Sigma/YARA, aims to standardize detection of prompt injection, tool abuse, and other agent attacks, though it notes limitations against semantic attacks.

Came across this open detection rule format for AI agent security threats. The interesting part is that it treats agent threats more like Sigma/YARA style detection: YAML rules for things like prompt injection, tool-call arguments, SKILL.md content, agent manipulation, skill compromise, and context exfiltration. That feels like a useful direction because agent security is still very scattered right now. Everyone talks about prompt injection and tool abuse, but there is not much shared language for detection rules, test cases, and repeatable coverage. The article also notes a big limitation: regex/rule-based detection can catch structured patterns, but paraphrased or semantic attacks still slip through. So this is not a full answer by itself. Curious what people here think. Do agent systems need an open rule format like this, or will detection need to be mostly runtime/context based instead of signature based?
Original Article

Similar Articles

Free AI Agent Security Assessment

Reddit r/AI_Agents

Antitech is offering free early-access security assessments for AI agents, testing against attack vectors like prompt injection, tool abuse, and data leakage, providing a vulnerability report and discounts for participants.

Agent Trace RFC

Lobsters Hottest

Agent Trace is an open specification for tracking AI-generated code in version-controlled codebases, defining a vendor-neutral format to record AI contributions alongside human authorship.

Agent rules need to exist where the action happens

Reddit r/AI_Agents

The article argues that AI agent safety rules should be implemented as hard workflow constraints and permissions rather than relying solely on prompt instructions. It emphasizes the need for explicit checks, approvals, and logs for sensitive or irreversible actions.