Tag
This paper studies regret minimization in Markovian bandits with non-observable states and constrained decision epochs, introducing a generalization called self-degrading Markovian bandits. The authors propose the UCB-NOM algorithm that achieves nearly logarithmic regret and provide bounds that do not depend on the number of states.
This paper studies piecewise-stationary low-rank linear contextual bandits, proposes the SPSC algorithm that achieves dynamic regret scaling with the intrinsic rank instead of the ambient dimension, and characterizes the identification boundary for subspace recovery under scalar feedback.
This paper demonstrates that volatility and stochasticity, both sources of uncertainty, drive optimal exploration in opposite directions: volatility increases exploration while stochasticity suppresses it. The authors extend the Gittins index framework to Gaussian state-space bandits and introduce CAUSE, a closed-form exploration bonus that outperforms standard strategies.
This paper introduces a hybrid Track-and-Stop algorithm for best arm identification in generalized linear bandits that unifies absolute and relative feedback. The authors propose a likelihood-ratio-based confidence sequence to adaptively allocate queries, demonstrating improved sample efficiency over baseline methods.