Tag
This paper presents a safety evaluation framework for tool-using LLM agents and introduces the 'Verifier Tax', a horizon-dependent tradeoff between safety and task completion. It proposes a two-tier verification architecture and uses Tau-bench scenarios to show that verification can reduce unsafe successes but also lowers task completion, and that the completion loss grows as the task horizon increases.
This paper introduces the 'constraint tax', the accuracy loss caused by structured-output constraints in small language models, and presents a measurement protocol that quantifies the tradeoff between output validity and answer correctness.
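A minimal sketch of how such a validity-versus-correctness measurement could be tabulated, assuming each evaluation item is run once unconstrained and once with the structured-output constraint, and that correctness is judged independently of schema validity (for example by extracting the final answer). The `Trial` record, its field names, and `constraint_tax` are hypothetical illustrations, not the paper's protocol; the tax is taken here as unconstrained accuracy minus constrained accuracy.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Trial:
    """One item scored under both decoding modes (hypothetical record format)."""
    free_valid: bool           # unconstrained output conforms to the schema
    free_correct: bool         # unconstrained output contains the right answer
    constrained_valid: bool    # constrained output conforms to the schema
    constrained_correct: bool  # constrained output contains the right answer


def rate(flags: Sequence[bool]) -> float:
    """Fraction of True values; 0.0 for an empty sequence."""
    return sum(flags) / len(flags) if flags else 0.0


def constraint_tax(trials: Sequence[Trial]) -> Dict[str, float]:
    """Report validity and accuracy for both modes plus their accuracy gap."""
    free_acc = rate([t.free_correct for t in trials])
    cons_acc = rate([t.constrained_correct for t in trials])
    return {
        "free_validity": rate([t.free_valid for t in trials]),
        "constrained_validity": rate([t.constrained_valid for t in trials]),
        "free_accuracy": free_acc,
        "constrained_accuracy": cons_acc,
        # Positive value: accuracy lost by imposing the output constraint.
        "constraint_tax": free_acc - cons_acc,
    }


if __name__ == "__main__":
    trials = [
        Trial(False, True, True, False),
        Trial(True, True, True, True),
        Trial(True, False, True, False),
        Trial(False, True, True, True),
    ]
    print(constraint_tax(trials))
```

In this toy data the constraint raises validity from 0.5 to 1.0 while accuracy falls from 0.75 to 0.5, so the sketch reports a constraint tax of 0.25: the kind of validity gain paired with a correctness loss that the protocol is meant to expose.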