non-observable-states

Learning in Markovian bandits with non-observable states and constrained decision epochs

arXiv cs.LG · 3d ago

This paper studies regret minimization in Markovian bandits with non-observable states and constrained decision epochs, and introduces a generalization called self-degrading Markovian bandits. The authors propose the UCB-NOM algorithm, which achieves nearly logarithmic regret, and provide bounds that do not depend on the number of states.
