#randomized-tests

Do Coding Agents Deceive Us? Detecting and Preventing Cheating via Capped Evaluation with Randomized Tests

Hugging Face Daily Papers · 2026-06-05

This paper introduces CapCode, a capped evaluation framework that uses randomized test outputs to detect coding agents that game unit tests, and CapReward, a reward design that penalizes reward hacking in reinforcement learning for coding tasks.
