Tag
This paper presents counterexamples showing that Monte Carlo Exploring Starts can converge to suboptimal solutions in tabular reinforcement learning, and provides a modification that guarantees convergence to optimality by scaling learning rates inversely with update frequencies.
This paper derives exact closed-form expressions for gradients and test loss after one and two steps of gradient descent in two-layer and three-layer linear neural networks, characterizing optimal learning rate selection and revealing a distinct early-training regime where unequal layer-wise learning rates are initially optimal.
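As a rough illustration of the setting in the second summary, the sketch below takes a single gradient descent step on a scalar two-layer linear model f(x) = b * a * x with a separate learning rate per layer. It is a toy under stated assumptions, not the paper's closed-form expressions: the function names, the data, the initial weights, and the learning-rate values are all hypothetical and chosen only for illustration.

```python
# Toy sketch (assumption: scalar weights, mean squared error loss).
# Model: f(x) = b * a * x, where a is the first-layer weight and b the second.

def loss(a, b, data):
    """Mean of 0.5 * (f(x) - y)^2 over the data set."""
    return sum(0.5 * (b * a * x - y) ** 2 for x, y in data) / len(data)

def gradients(a, b, data):
    """Exact gradients of the loss with respect to a and b (chain rule)."""
    n = len(data)
    grad_a = sum((b * a * x - y) * b * x for x, y in data) / n
    grad_b = sum((b * a * x - y) * a * x for x, y in data) / n
    return grad_a, grad_b

def gd_step(a, b, data, lr_a, lr_b):
    """One gradient descent step with a separate learning rate per layer."""
    grad_a, grad_b = gradients(a, b, data)
    return a - lr_a * grad_a, b - lr_b * grad_b

# Hypothetical data generated by y = 2x, and arbitrary initial weights.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
a0, b0 = 0.5, 1.0

# Unequal layer-wise learning rates; the values are illustrative, not optimal.
a1, b1 = gd_step(a0, b0, data, lr_a=0.05, lr_b=0.02)

print("loss before:", loss(a0, b0, data))
print("loss after: ", loss(a1, b1, data))
```

Sweeping `lr_a` and `lr_b` over a grid and comparing the loss after one step is a simple way to see numerically whether equal or unequal layer-wise rates do better for a given initialization.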