rubric-based-rewards

Reward Hacking in Rubric-Based Reinforcement Learning

Hugging Face Daily Papers · 2d ago

This paper investigates reward hacking in rubric-based reinforcement learning, analyzing the divergence between training verifiers and evaluation metrics. It introduces a diagnostic for the 'self-internalization gap' and demonstrates that stronger verification reduces but does not eliminate reward hacking.
