Tag
This paper presents Vera, an end-to-end automated safety testing framework for LLM agents that combines literature-driven risk discovery, combinatorial composition of safety cases, and evidence-grounded verification. Evaluations on four agent frameworks reveal substantial safety weaknesses, with average attack success rates reaching 93.9% under multi-channel attacks. The paper also releases Vera-Bench, a set of 1,600 executable safety cases.