
Safety Testing LLM Agents at Scale: From Risk Discovery to Evidence-Grounded Verification

arXiv cs.AI · 18h ago

This paper presents Vera, an end-to-end automated safety testing framework for LLM agents. Vera combines literature-driven risk discovery, combinatorial composition of safety cases, and evidence-grounded verification. Evaluations on four agent frameworks reveal substantial safety weaknesses, with average attack success rates reaching 93.9% under multi-channel attacks. The authors also release Vera-Bench, a benchmark of 1,600 executable safety cases.
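The summary mentions two quantitative ideas: composing safety cases combinatorially and scoring agents by attack success rate (ASR). The following Python sketch shows one way these could be computed. It is a minimal illustration under stated assumptions and is not Vera's implementation. The three composition dimensions (risk, channel, tool), the names `SafetyCase`, `compose_cases`, `attack_success_rate`, and `average_asr`, and the use of an unweighted average across frameworks are all hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class SafetyCase:
    # Hypothetical dimensions; the paper's actual case schema may differ.
    risk: str     # risk category, e.g. one surfaced by a literature survey
    channel: str  # attack channel, e.g. user prompt or tool output
    tool: str     # tool the agent under test may call


def compose_cases(risks, channels, tools):
    """Compose safety cases as the Cartesian product of the three dimensions."""
    return [SafetyCase(r, c, t) for r, c, t in product(risks, channels, tools)]


def attack_success_rate(outcomes):
    """Fraction of executed cases where the attack succeeded (True = success)."""
    if not outcomes:
        raise ValueError("no outcomes to score")
    return sum(outcomes) / len(outcomes)


def average_asr(per_framework):
    """Unweighted mean of per-framework ASRs (an assumed aggregation)."""
    rates = [attack_success_rate(o) for o in per_framework.values()]
    return sum(rates) / len(rates)


if __name__ == "__main__":
    cases = compose_cases(
        ["data_exfiltration", "destructive_action"],
        ["user_prompt", "tool_output", "retrieved_document"],
        ["shell", "email"],
    )
    print(len(cases))                                         # 12
    print(attack_success_rate([True, True, False, True]))     # 0.75
```

A Cartesian product grows multiplicatively with each added dimension. This is why a small set of risks, channels, and tools can yield a large pool of test cases. A real framework would presumably filter out infeasible combinations before executing them.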

