Tag
This paper presents a unified algorithmic framework for distributed online submodular maximization under partition matroid constraints, achieving sublinear (1-1/e)-regret guarantees for both full-information and bandit feedback. It also introduces a bounded stochastic pipage rounding scheme to ensure cumulative sampling violations remain sublinear.
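For readers unfamiliar with the term, the (1-1/e)-regret mentioned above is usually defined as follows. This is a sketch of the standard definition, not one taken from the paper: the symbols $f_t$ (the monotone submodular reward at round $t$), $\mathcal{I}$ (the feasible sets of the partition matroid), $S_t$ (the set played at round $t$), and $T$ (the horizon) are notation assumed here, and the paper's exact formulation may differ.

```latex
\[
  \mathcal{R}_{\alpha}(T)
  \;=\;
  \alpha \max_{S \in \mathcal{I}} \sum_{t=1}^{T} f_t(S)
  \;-\;
  \sum_{t=1}^{T} \mathbb{E}\bigl[f_t(S_t)\bigr],
  \qquad
  \alpha = 1 - \frac{1}{e}.
\]
% A guarantee is "sublinear" when the regret grows slower than the horizon:
\[
  \mathcal{R}_{\alpha}(T) = o(T)
  \quad\Longleftrightarrow\quad
  \lim_{T \to \infty} \frac{\mathcal{R}_{\alpha}(T)}{T} = 0 .
\]
```

The benchmark is scaled by $1 - 1/e$ because, in general, no polynomial-time algorithm can approximate monotone submodular maximization under a matroid constraint beyond that factor, even offline.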
It also introduces a framework for online strategy optimization in multi-agent social simulation that formulates multi-turn interaction as an adversarial bandit problem and uses a neural surrogate for reward prediction. Experiments on the Sotopia benchmark show that this framework outperforms static baselines and existing optimization methods.
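To make the adversarial bandit formulation concrete, the sketch below runs EXP3, a standard textbook algorithm for that setting. It is not the paper's method: it has no neural surrogate, and the function `exp3`, the toy `reward_fn`, and the idea of treating each arm as one candidate strategy are illustrative assumptions.

```python
import math
import random


def exp3(num_arms, reward_fn, horizon, gamma=0.1, seed=0):
    """Run EXP3 for `horizon` rounds; rewards must lie in [0, 1].

    Hypothetical helper for illustration only. Each arm stands for one
    candidate strategy; `reward_fn(t, arm)` returns the observed reward.
    """
    rng = random.Random(seed)
    weights = [1.0] * num_arms
    pulls = [0] * num_arms
    total_reward = 0.0
    for t in range(horizon):
        total_w = sum(weights)
        # Mix the weight distribution with uniform exploration.
        probs = [(1 - gamma) * w / total_w + gamma / num_arms for w in weights]
        arm = rng.choices(range(num_arms), weights=probs, k=1)[0]
        reward = reward_fn(t, arm)
        # Importance-weighted estimate: only the pulled arm is observed.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / num_arms)
        # Rescale so the weights never overflow; probabilities are unchanged.
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        pulls[arm] += 1
        total_reward += reward
    return pulls, total_reward


# Toy environment (an assumption): strategy 2 always pays 1, the others 0.
pulls, total = exp3(num_arms=3, reward_fn=lambda t, a: 1.0 if a == 2 else 0.0,
                    horizon=2000)
print(pulls, total)
```

Only the chosen arm's reward is observed each round, which is why the update divides by the sampling probability. A learned reward predictor, such as the surrogate the summary mentions, would be one way to supply information about the arms that were not played.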