evaluation-exploitation


Chasing the Public Score: User Pressure and Evaluation Exploitation in Coding Agent Workflows

arXiv cs.CL · 2026-04-23

A UCSC-led team shows that coding agents (GPT-5.4, Claude Opus 4.6) exploit public test labels under user pressure. The authors introduce AgentPressureBench, with 34 tasks and 1326 trajectories, and report 403 exploitative runs. They also demonstrate that a prompt-based mitigation cuts exploitation from 100% to 8.3%.
