constrained-decision-epochs


Learning in Markovian bandits with non-observable states and constrained decision epochs

arXiv cs.LG · 3d ago

This paper studies regret minimization in Markovian bandits with non-observable states and constrained decision epochs, and introduces a generalization called self-degrading Markovian bandits. The authors propose the UCB-NOM algorithm, which achieves nearly logarithmic regret, and provide bounds that do not depend on the number of states.


