Tag
This paper presents counterexamples showing that Monte Carlo Exploring Starts can converge to suboptimal solutions in tabular reinforcement learning, and provides a modification that guarantees convergence to optimality by scaling learning rates inversely to update frequencies.