agent-safety

#agent-safety

What If?

Reddit r/AI_Agents ↗ · 11h ago

This post introduces Sentinel Gateway, a security middleware that enforces strict scope and safety constraints on AI agents, preventing unauthorized actions such as data deletion or leakage while keeping every action traceable.

ActionFence: A drop-in middleware for MCP servers to enforce spend caps and policy limits

Reddit r/AI_Agents ↗ · yesterday

ActionFence is an open-source middleware tool for enforcing security policies, such as spend caps and identity tiers, on MCP servers and Express APIs to protect against agent misuse.

MedSkillAudit: A Domain-Specific Audit Framework for Medical Research Agent Skills

Hugging Face Daily Papers ↗ · 2026-04-22

This paper introduces MedSkillAudit, a domain-specific framework for auditing the safety and quality of medical research AI agent skills before deployment. The study demonstrates that the system achieves reliable assessment consistency comparable to or better than human expert review.

CrabTrap: An LLM-as-a-judge HTTP proxy to secure agents in production

Hacker News Top ↗ · 2026-04-21

Brex open-sources CrabTrap, an LLM-as-a-judge HTTP proxy that filters and secures AI agent traffic before it reaches production services.

Designing AI agents to resist prompt injection

OpenAI Blog ↗ · 2026-03-11

OpenAI publishes guidance on designing AI agents that resist prompt injection. It argues that modern attacks increasingly rely on social engineering rather than simple string injection, and it advocates system-level defenses that constrain an attack's impact instead of relying solely on input filtering.
