benchmark-auditing

Tag

Cards List
#benchmark-auditing

Do Androids Dream of Breaking the Game? Systematically Auditing AI Agent Benchmarks with BenchJack

arXiv cs.AI · 7h ago Cached

This paper introduces BenchJack, an automated red-teaming system that systematically audits AI agent benchmarks by identifying reward-hacking exploits. It applies BenchJack to 10 popular benchmarks, surfacing 219 distinct flaws and demonstrating that evaluation pipelines lack an adversarial mindset, with the system reducing hackable-task ratios from near 100% to under 10% on four benchmarks.

0 favorites 0 likes
← Back to home

Submit Feedback