bandits

Learning in Markovian bandits with non-observable states and constrained decision epochs

arXiv cs.LG · 3d ago

This paper studies regret minimization in Markovian bandits with non-observable states and constrained decision epochs, and introduces a generalization called self-degrading Markovian bandits. The authors propose the UCB-NOM algorithm, which achieves nearly logarithmic regret, and provide bounds that do not depend on the number of states.


Catching a Moving Subspace: Low-Rank Bandits Beyond Stationarity

arXiv cs.LG · 2026-05-21

This paper studies piecewise-stationary low-rank linear contextual bandits. It proposes the SPSC algorithm, whose dynamic regret scales with the intrinsic rank rather than the ambient dimension, and characterizes the identification boundary for subspace recovery under scalar feedback.


Not all uncertainty is alike: volatility, stochasticity, and exploration

arXiv cs.AI · 2026-05-20

This paper demonstrates that two sources of uncertainty, volatility and stochasticity, drive optimal exploration in opposite directions: volatility increases exploration, while stochasticity suppresses it. The authors extend the Gittins index framework to Gaussian state-space bandits and introduce CAUSE, a closed-form exploration bonus that outperforms standard strategies.


Best Arm Identification in Generalized Linear Bandits via Hybrid Feedback

arXiv cs.AI · 2026-05-08

This paper introduces a hybrid Track-and-Stop algorithm for best arm identification in generalized linear bandits that unifies absolute and relative feedback. The authors propose a likelihood-ratio-based confidence sequence to adaptively allocate queries, demonstrating improved sample efficiency over baseline methods.
