finite-sample-guarantees

Learning to Hand Off: Provably Convergent Workflow Learning under Interface Constraints

arXiv cs.AI · 2026-05-20

This paper formalizes workflow learning in multi-agent LLM pipelines as an interface-constrained semi-Markov decision process (IC-SMDP). It proposes IC-ICQQ, an asynchronous decentralized Q-learning algorithm whose finite-sample bound decomposes the total error into its individual sources. The authors present this as the first finite-sample guarantee for neural Q-learning under decentralized partial observability.
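For readers unfamiliar with the semi-Markov setting, the sketch below shows a textbook tabular SMDP Q-learning update, in which the bootstrap term is discounted by the number of steps the action took. It is not the paper's IC-ICQQ algorithm, which is neural, asynchronous, and decentralized. The function name `smdp_q_update`, the state and action labels, and the step size and discount values are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def smdp_q_update(q, state, action, reward, duration, next_state,
                  actions, alpha=0.1, gamma=0.9):
    """Apply one tabular SMDP Q-learning step and return the new Q-value.

    Illustrative only: a generic SMDP update, not the paper's IC-ICQQ.
    `reward` is assumed to be the (already discounted) reward accumulated
    while the action ran, and `duration` is how many steps it took.
    """
    # Greedy value of the state reached when the action finishes.
    best_next = max(q[(next_state, a)] for a in actions)
    # Discount the bootstrap term by gamma ** duration, not a single gamma.
    target = reward + (gamma ** duration) * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-stage workflow: a "draft" agent hands off to "review".
q = defaultdict(float)
actions = ["handoff", "continue"]
q[("review", "continue")] = 1.0

new_value = smdp_q_update(
    q, "draft", "handoff",
    reward=0.5, duration=2, next_state="review",
    actions=actions, alpha=0.5, gamma=0.9,
)
print(round(new_value, 3))  # 0.655
```

The only difference from ordinary Q-learning is the exponent on the discount factor: an action that occupies two steps discounts the successor value by `gamma ** 2`.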
