retrying


Emergence of Exploration in Policy Gradient Reinforcement Learning via Retrying

arXiv cs.LG · 2026-06-02

This paper introduces ReMax, a reinforcement learning objective that evaluates a policy by its expected maximum return over multiple sampled attempts. Under this objective, exploration emerges as a property of the policy itself, with no explicit exploration bonus. The authors derive a policy gradient formulation for ReMax and propose RePPO, a PPO variant that they report achieves efficient exploration on the MinAtar and Craftax benchmarks.
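
To make the "expected maximum return over multiple samples" idea concrete, here is a minimal Monte Carlo sketch. It is not the authors' implementation and it does not show the ReMax policy gradient or RePPO. It only compares the ordinary expected return (k = 1) with the expected best-of-k return on a toy two-option problem. The names `expected_max_return`, `safe_arm`, and `risky_arm`, and the payoff values, are assumptions chosen for illustration.

```python
import random


def expected_max_return(sample_return, k, n_trials=20000, seed=0):
    """Monte Carlo estimate of E[max(R_1, ..., R_k)] for i.i.d. returns.

    `sample_return` is a hypothetical callable that takes a random.Random
    instance and returns the return of one attempt.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        # Draw k independent attempts and keep only the best one.
        total += max(sample_return(rng) for _ in range(k))
    return total / n_trials


def safe_arm(rng):
    # Illustrative option: always yields a return of 0.5.
    return 0.5


def risky_arm(rng):
    # Illustrative option: yields 1.0 with probability 0.3, otherwise 0.0.
    return 1.0 if rng.random() < 0.3 else 0.0


# k = 1 is the usual expected-return objective.
safe_k1 = expected_max_return(safe_arm, k=1)
risky_k1 = expected_max_return(risky_arm, k=1)

# k = 4 scores each option by the best of four attempts.
safe_k4 = expected_max_return(safe_arm, k=4)
risky_k4 = expected_max_return(risky_arm, k=4)
```

In this toy setup the safe option wins under the single-attempt objective (0.5 versus 0.3 in expectation), while the risky option wins under best-of-four, whose exact value is 1 - 0.7^4 ≈ 0.76. That preference flip toward higher-variance behavior is one way to read how retrying can induce exploration without a separate bonus term.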
