Tag
This paper identifies limitations of conventional uncertainty estimates for deep reinforcement learning and proposes percentile-based statistics and visualization to better assess run-to-run performance variation. Case studies demonstrate the method on PPO, SAC, TD-MPC, DQN, and Rainbow algorithms.