Tag
This paper studies when end-to-end reinforcement learning improves multi-agent LLM workflows. It compares shared-policy and isolated-policy training across workflows, tasks, and model scales, and finds tradeoffs that depend on those conditions.
The author describes building an autonomous AI research agent for pre-meeting paraplanning tasks using Claude Opus 4, but has struggled to extend it to post-meeting document generation because of compliance and template issues. They ask whether the two phases should remain separate and how to bridge them in regulated environments.
FlowCompile is a compiler for structured LLM workflows that performs compile-time exploration of configurations to balance accuracy and latency, achieving up to 6.4x speedup without retraining.