pairwise-preferences

Reinforcement Learning with Pairwise Preferences in Long-Term Decision Problems

arXiv cs.LG · 6d ago

This paper introduces the Markov decision contest, a new problem model for reinforcement learning with pairwise preferences. It proves optimality guarantees for stationary policies, shows that the problem can be solved exactly in polynomial time, and presents a learning-efficient approximate algorithm.
