Tag
This paper proposes Absolute Thompson Sampling (ATS), a modification of Thompson Sampling (TS) that ensures optimism in expectation by using absolute exploration noise, enabling a simpler UCB-style regret analysis while maintaining computational efficiency. Its regret bound matches existing TS bounds, and the paper also introduces an ensemble variant that converges to UCB behavior.
This paper proposes a novel structured neuron pruning framework for deep neural networks using multi-armed bandit algorithms, demonstrating effectiveness on various tasks.
Purdue researchers introduce Mango, a multi-agent web navigation system that uses global site structure and Thompson Sampling to pick optimal starting URLs, boosting success rates on WebVoyager and WebWalkerQA benchmarks.
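To make the "absolute exploration noise" idea from the ATS summary above concrete, here is a minimal sketch for a Gaussian multi-armed bandit. It is an illustration under stated assumptions, not the paper's algorithm: the unit-variance rewards, the `1/sqrt(n)` noise scale, the initial round-robin pulls, and the function name `absolute_ts_bandit` are all hypothetical choices made for this example. The only point it shows is that taking the absolute value of the sampled noise keeps each arm's index at or above its empirical mean, which is the optimism a UCB-style argument relies on.

```python
import math
import random


def absolute_ts_bandit(true_means, horizon, seed=0):
    """Hypothetical sketch: Thompson-style sampling with absolute Gaussian noise.

    Assumes unit-variance Gaussian rewards; returns the pull count per arm.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k   # number of pulls per arm
    sums = [0.0] * k   # cumulative reward per arm

    for t in range(horizon):
        if t < k:
            # Pull every arm once so each empirical mean is defined.
            arm = t
        else:
            indices = []
            for a in range(k):
                mean = sums[a] / counts[a]
                scale = 1.0 / math.sqrt(counts[a])  # assumed noise scale
                # Standard TS would add scale * z with z ~ N(0, 1).
                # Here the noise is |z|, so the index never drops below the mean.
                noise = abs(rng.gauss(0.0, 1.0))
                indices.append(mean + scale * noise)
            arm = max(range(k), key=indices.__getitem__)

        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        sums[arm] += reward

    return counts


if __name__ == "__main__":
    print(absolute_ts_bandit([0.1, 0.5, 0.9], horizon=2000))
```

Replacing `abs(rng.gauss(0.0, 1.0))` with `rng.gauss(0.0, 1.0)` gives an ordinary Gaussian Thompson-style sampler under the same assumptions, which makes the two easy to compare side by side.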