Tag
This paper introduces the Markov decision contest, a new problem model for reinforcement learning with pairwise preferences. It proves optimality guarantees for stationary policies, exact solvability in P, and presents a learning-efficient approximate algorithm.