Tag
Proposes algorithms for contextual slate bandits with generalized linear rewards under limited adaptivity, achieving regret bounds independent of the non-linearity parameter. The batched and rarely-switching algorithms are computationally efficient and empirically outperform baselines, including in a language model example selection task.
This updated paper presents a universal sequence preconditioning method that achieves dimension-free regret bounds for marginally stable linear dynamical systems, using a second-order VAW algorithm and Faber polynomials.
Proposes GraphDR-LinUCB, a method for contextual bandits with graph-structured arms that projects features onto the graph's low-frequency spectral subspace. Achieves the first regret bound for spectral-projection-based contextual bandits and demonstrates 15x regret reduction on real datasets over full-dimensional LinUCB.
This paper formalizes exact unlearning in reinforcement learning, proposing a ρ-TV-stable RL algorithm for tabular MDPs that efficiently removes a user's data influence at a fraction of retraining cost, achieving near-minimax-optimal regret bounds. The work, accepted at ICML, establishes both upper and lower bounds for ρ-TV-stable RL algorithms.
This paper introduces a curvature-adaptive Follow-the-Perturbed-Leader (FTPL) algorithm for online optimization that achieves optimal regret bounds for both non-convex Lipschitz losses and strongly convex losses, using a time-varying perturbation scale.
Introduces AdaWeather, an adaptive framework that combines multiple probabilistic weather forecasts using machine learning and a mixture of experts, achieving logarithmic regret relative to the best static mixture of experts and showing empirical improvements in temperature forecasting.
This paper solves a COLT open problem by providing an optimal gap-dependent regret algorithm for private stochastic decision-theoretic online learning, achieving the lower bound of order (log K)/Δ_min + (log K)/ε.
This paper proves that online gradient descent achieves optimal √T regret for hidden-convex losses under a Hessian compatibility condition, resolving open questions in adversarial online learning. It also extends results to one-point bandit feedback with a T^{3/4} expected regret bound.
This paper proposes a quantile Bayesian risk-aware MDP framework for online RL that adaptively balances robustness and exploration over time, providing theoretical regret bounds and demonstrating strong empirical performance.
This paper studies learning to make optimal decisions with AI assistance under human alignment, showing that alignment can reduce the complexity of learning, and provides regret bounds.