rubric-based-rewards

Reward Hacking in Rubric-Based Reinforcement Learning

Hugging Face Daily Papers · 2d ago

This paper investigates reward hacking in rubric-based reinforcement learning, analyzing the divergence between training verifiers and evaluation metrics. It introduces a diagnostic for the 'self-internalization gap' and demonstrates that stronger verification reduces but does not eliminate reward hacking.
