Tag
This paper introduces an online contextual Pandora's Box model for adaptively querying and selecting LLM APIs, proposing a learning approach that combines GMM estimation with UCB-style confidence bounds and proving dimension-dependent regret bounds.
The paper introduces Human-in-the-Loop Gated Bandit (HITL-GB) for short-term rental dynamic pricing, showing that historical pricing data under a prior policy is structurally equivalent to on-policy warm-up data, reducing cold-start from ~150 to ~30 episodes.
This paper studies piecewise-stationary low-rank linear contextual bandits, proposes the SPSC algorithm that achieves dynamic regret scaling with the intrinsic rank instead of the ambient dimension, and characterizes the identification boundary for subspace recovery under scalar feedback.