This paper formalizes workflow learning in multi-agent LLM pipelines as an interface-constrained semi-Markov decision process (IC-SMDP) and proposes IC-ICQQ, an asynchronous decentralized Q-learning algorithm. The algorithm comes with a finite-sample bound that decomposes the error by source, which the paper presents as the first finite-sample guarantee for neural Q-learning under decentralized partial observability.