reward-tampering


Modification-Considering Value Learning for Reward Hacking Mitigation in RL

arXiv cs.LG · 15h ago

Proposes Modification-Considering Value Learning (MCVL), a safeguard for off-policy value-based RL that mitigates reward hacking by evaluating each transition's impact on a frozen bootstrapped-return estimator before admitting it into training.
