pairwise-preferences

Reinforcement Learning with Pairwise Preferences in Long-Term Decision Problems

arXiv cs.LG · 6d ago

This paper introduces the Markov decision contest, a new problem model for reinforcement learning with pairwise preferences. It proves optimality guarantees for stationary policies, shows that the model can be solved exactly in polynomial time (it is in P), and presents a learning-efficient approximate algorithm.
