Tag
This paper introduces two contributions: CapCode, a capped evaluation framework that uses randomized test outputs to detect coding agents that game unit tests; and CapReward, a reward design that penalizes reward hacking in reinforcement learning for coding tasks.