Tag
This paper formalizes model exploitation in reinforcement learning, proves that it is unavoidable in large policy sets, and establishes a theoretical bridge between reward hacking and model exploitation.