This paper studies when end-to-end reinforcement learning (RL) training improves multi-agent LLM workflows. It compares shared-policy and isolated-policy training across different workflows, tasks, and model scales, and finds that the tradeoffs between the two approaches are conditional, depending on the setting.