regret-bounds

Contextual Slate GLM Bandits with Limited Adaptivity

arXiv cs.LG · 10h ago

Proposes algorithms for contextual slate bandits with generalized linear rewards under limited adaptivity, achieving regret bounds independent of the non-linearity parameter. The batched and rarely-switching algorithms are computationally efficient and empirically outperform baselines, including in a language model example selection task.

@HazanPrinceton: Just in time for our tutorial at ICML next week, Annie posted an update to our universal sequence preconditioning paper…

X AI KOLs Timeline · yesterday

The updated paper presents a universal sequence preconditioning method that achieves dimension-free regret bounds for marginally stable linear dynamical systems, using a second-order Vovk-Azoury-Warmuth (VAW) algorithm and Faber polynomials.

Graph Dimensionality Reduction for Contextual Bandits: Structure-Specific Regret Bounds under Approximate Smoothness and Noisy Eigenspaces

arXiv cs.LG · 2d ago

Proposes GraphDR-LinUCB, a method for contextual bandits with graph-structured arms that projects features onto the graph's low-frequency spectral subspace. Achieves the first regret bound for spectral-projection-based contextual bandits and demonstrates a 15x regret reduction over full-dimensional LinUCB on real datasets.

Exact Unlearning in Reinforcement Learning

arXiv cs.LG · 2026-06-04

This paper formalizes exact unlearning in reinforcement learning and proposes a ρ-TV-stable RL algorithm for tabular MDPs that removes a user's data influence at a fraction of the cost of retraining while achieving near-minimax-optimal regret. The work has been accepted at ICML and establishes both upper and lower bounds for ρ-TV-stable RL algorithms.

From Non-Convex to Strongly Convex: Curvature-Adaptive FTPL for Online Optimization

arXiv cs.LG · 2026-06-03

This paper introduces a curvature-adaptive Follow-the-Perturbed-Leader (FTPL) algorithm for online optimization that achieves optimal regret bounds for both non-convex Lipschitz losses and strongly convex losses, using a time-varying perturbation scale.

AdaWeather: Adaptively Mixing Probabilistic Weather Forecasts with Logarithmic Regret

arXiv cs.LG · 2026-06-03

Introduces AdaWeather, an adaptive framework that combines multiple probabilistic weather forecasts using machine learning and a mixture of experts. It achieves logarithmic regret relative to the best static mixture of experts and shows empirical improvements in temperature forecasting.

Optimal Gap-Dependent Regret for Private Stochastic Decision-Theoretic Online Learning

arXiv cs.LG · 2026-05-29

This paper solves a COLT open problem by giving an algorithm for private stochastic decision-theoretic online learning whose gap-dependent regret matches the lower bound of order (log K)/Δ_min + (log K)/ε.

Online Learning on Hidden-Convex Losses via Algorithmic Equivalence: Optimal Regret, Geometric Barrier, and Bandit Feedback

arXiv cs.LG · 2026-05-27

This paper proves that online gradient descent achieves optimal √T regret for hidden-convex losses under a Hessian compatibility condition, resolving open questions in adversarial online learning. It also extends results to one-point bandit feedback with a T^{3/4} expected regret bound.

Evolving Robustness–Exploration Trade-off in Online Reinforcement Learning via Quantile Bayesian Risk MDPs

arXiv cs.LG · 2026-05-26

This paper proposes a quantile Bayesian risk-aware MDP framework for online RL that adaptively balances robustness and exploration over time, providing theoretical regret bounds and demonstrating strong empirical performance.

Learning to Decide with AI Assistance under Human-Alignment

arXiv cs.LG · 2026-05-14

This paper studies how to learn optimal decisions with AI assistance under human-alignment. It shows that alignment can reduce the complexity of learning and provides regret bounds.
