non-observable-states

Learning in Markovian bandits with non-observable states and constrained decision epochs

arXiv cs.LG · 3d ago

This paper studies regret minimization in Markovian bandits with non-observable states and constrained decision epochs, introducing a generalization called self-degrading Markovian bandits. The authors propose the UCB-NOM algorithm, which achieves nearly logarithmic regret, and provide regret bounds that do not depend on the number of states.
