Tag
This paper provides high-probability guarantees for an unprojected linear TD(0) algorithm with Polyak–Ruppert averaging under Markovian sampling, using a single stepsize schedule that achieves both robust curvature-free and fast curvature-dependent convergence rates.
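For readers unfamiliar with the setting, the algorithm named above can be illustrated with a minimal sketch. It runs linear TD(0) along a single trajectory of a Markov chain (Markovian sampling), applies no projection to the iterates, and returns the running Polyak–Ruppert average. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function name `td0_polyak_ruppert`, the stepsize schedule `step_scale / sqrt(t + 1)`, the zero initialization, and the two-state example chain. In particular, the paper's actual stepsize schedule is not stated in this summary, so the one below is only a placeholder.

```python
import random


def td0_polyak_ruppert(P, rewards, features, gamma, n_steps,
                       step_scale=0.5, seed=0):
    """Unprojected linear TD(0) on one Markov trajectory.

    Returns the Polyak-Ruppert average of the iterates.
    P        : row-stochastic transition matrix (list of lists)
    rewards  : reward received in each state
    features : feature vector phi(s) for each state
    """
    rng = random.Random(seed)
    states = list(range(len(P)))
    d = len(features[0])
    theta = [0.0] * d      # current TD iterate (never projected)
    theta_bar = [0.0] * d  # running average of the iterates
    s = 0                  # hypothetical choice: start in state 0
    for t in range(n_steps):
        # Markovian sampling: the next state depends on the current one.
        s_next = rng.choices(states, weights=P[s])[0]
        phi, phi_next = features[s], features[s_next]
        v = sum(a * b for a, b in zip(theta, phi))
        v_next = sum(a * b for a, b in zip(theta, phi_next))
        # Temporal-difference error for the observed transition.
        delta = rewards[s] + gamma * v_next - v
        # Assumed placeholder schedule, NOT the paper's schedule.
        alpha = step_scale / (t + 1) ** 0.5
        theta = [th + alpha * delta * p for th, p in zip(theta, phi)]
        # Incremental mean of theta_1, ..., theta_{t+1}.
        theta_bar = [tb + (th - tb) / (t + 1)
                     for tb, th in zip(theta_bar, theta)]
        s = s_next
    return theta_bar


if __name__ == "__main__":
    # Hypothetical two-state chain with one-hot (tabular) features.
    P = [[0.9, 0.1], [0.2, 0.8]]
    theta_bar = td0_polyak_ruppert(
        P, rewards=[1.0, 0.0], features=[[1.0, 0.0], [0.0, 1.0]],
        gamma=0.9, n_steps=100_000)
    print(theta_bar)
```

With one-hot features the TD fixed point coincides with the true value function, which for this example chain solves (I - 0.9 P) V = r, so the averaged iterate can be checked against a closed-form answer. The sketch says nothing about the high-probability rates themselves; it only shows the shape of the update being analyzed.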