Tag
This paper addresses an open problem in reinforcement learning by providing a counterexample showing that differential temporal difference learning can diverge when using a global clock, despite converging with a local clock, in average-reward settings.