Tag
This paper introduces Bellman-Taylor Score Decoding, a method to handle state-dependent feasible action sets in Markov decision processes, addressing a key challenge in applying deep reinforcement learning to operations research problems.
This paper formalizes exact unlearning in reinforcement learning and proposes a ρ-TV-stable RL algorithm for tabular MDPs that removes the influence of a user's data at a fraction of the cost of retraining while achieving near-minimax-optimal regret bounds. The work was accepted at ICML and establishes both upper and lower bounds for ρ-TV-stable RL algorithms.
This paper presents an implementation of the CARCASS framework for constructing abstractions in reinforcement learning, based on Answer Set Programming (ASP), and demonstrates its effectiveness on the Blocks World and Minigrid domains.
This paper proposes a quantile Bayesian risk-aware MDP framework for online RL that adaptively balances robustness and exploration over time, providing theoretical regret bounds and demonstrating strong empirical performance.
The article promotes a Stanford lecture on Markov Decision Processes as a valuable resource for understanding the mathematical foundations of systematic trading, claiming it offers more insight than a short-term internship at a major financial firm.