Tag
This paper proposes a performance-driven state abstraction method for reinforcement learning that directly optimizes decision quality, using a multi-timescale framework to jointly adapt the policy and a tree-structured abstraction. The algorithm refines or aggregates the state space based on Q-value discrepancies, achieving better sample efficiency and faster replanning than baselines.
This paper studies the sample complexity of learning in average-reward weakly-coupled MDPs and restless bandits, establishing finite-sample PAC guarantees with polynomial complexity using a novel Lyapunov-based analysis framework.
This paper introduces Bellman-Taylor Score Decoding, a method to handle state-dependent feasible action sets in Markov decision processes, addressing a key challenge in applying deep reinforcement learning to operations research problems.
This paper formalizes exact unlearning in reinforcement learning, proposing a ρ-TV-stable RL algorithm for tabular MDPs that removes the influence of a user's data at a fraction of the cost of retraining while achieving near-minimax-optimal regret bounds. The work, accepted at ICML, establishes both upper and lower bounds for ρ-TV-stable RL algorithms.
This paper presents an implementation of the CARCASS framework, based on Answer Set Programming (ASP), for constructing abstractions in reinforcement learning, demonstrating its effectiveness on Blocks World and Minigrid domains.
This paper proposes a quantile Bayesian risk-aware MDP framework for online RL that adaptively balances robustness and exploration over time, providing theoretical regret bounds and demonstrating strong empirical performance.
The article promotes a Stanford lecture on Markov Decision Processes as a valuable resource for understanding the mathematical foundations of systematic trading, claiming it offers more insight than a short-term internship at major financial firms.