Tag
This paper shows that Mirror Descent with non-quadratic regularizers can be exponentially more sensitive to initialization than Gradient Descent, even in well-conditioned settings, with implications for reproducibility in RL and LLM post-training.