Tag
This paper studies regret minimization in Markovian bandits with non-observable states and constrained decision epochs, introducing a generalization called self-degrading Markovian bandits. The authors propose the UCB-NOM algorithm, which achieves near-logarithmic regret, and provide bounds that do not depend on the number of states.
This paper introduces Repeated Policy Regret (RP-Regret), a game-theoretic metric for regret minimization in repeated games with adaptive opponents. It proposes three algorithms to minimize this metric and shows that doing so can lead to cooperative equilibria, as in the Stag Hunt game.
Jeff Bezos recounts how he used a regret-minimization framework to decide to quit his job at D.E. Shaw and start Amazon, giving more weight to avoiding future regret than to fear of failure.