goodharts-law

#goodharts-law

Signed Compression Progress on a Sealed Audit is Goodhart-Resistant

arXiv cs.LG · yesterday

This paper formalizes the concept of signed compression progress on a sealed audit as a reward that is Goodhart-resistant, proving that cumulative reward telescopes to genuine audit improvement and providing bounds for finite audit panels. It identifies failure modes and validates results with experiments.

Scaling laws for reward model overoptimization

OpenAI Blog · 2022-10-19

OpenAI researchers empirically study how overoptimizing a reward model affects performance, establishing scaling laws which show that the relationship between proxy-reward optimization and ground-truth performance varies by optimization method and scales predictably with model size.

Measuring Goodhart’s law

OpenAI Blog · 2022-04-13

OpenAI research formally analyzes Goodhart's law through best-of-n sampling, providing efficient estimators for measuring how well proxy objectives track true objectives and quantifying optimization effort via KL divergence.
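
As a rough illustration of quantifying optimization effort via KL divergence, the sketch below evaluates the closed-form KL between a best-of-n policy and its base sampling distribution, log n - (n - 1)/n. The function name `best_of_n_kl` is hypothetical, and the formula is assumed to apply in the idealized setting where samples are independent and ties in the proxy score have probability zero.

```python
import math


def best_of_n_kl(n: int) -> float:
    """KL divergence in nats between best-of-n sampling and the base policy.

    Uses the closed form log(n) - (n - 1) / n, assuming i.i.d. samples
    and no ties in the proxy score (an idealized setting).
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return math.log(n) - (n - 1) / n


if __name__ == "__main__":
    # Optimization effort grows only logarithmically with the number of samples.
    for n in (1, 2, 4, 16, 64):
        print(f"n={n:>3}  KL={best_of_n_kl(n):.4f} nats")
```

Because the KL grows roughly like log n, doubling the number of samples adds only a small, nearly constant amount of optimization pressure against the proxy.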
