FlowCompile: An Optimizing Compiler for Structured LLM Workflows
Summary
FlowCompile is a compiler for structured LLM workflows that performs compile-time exploration of configurations to balance accuracy and latency, achieving up to 6.4x speedup without retraining.
Source: https://huggingface.co/papers/2605.13647
Abstract
FlowCompile is a structured LLM workflow compiler that optimizes complex multi-agent tasks by performing compile-time exploration of workflow configurations to balance accuracy and latency without retraining.
Structured LLM workflows, where specialized LLM sub-agents execute according to a predefined graph, have become a powerful abstraction for solving complex tasks. Optimizing such workflows, i.e., selecting configurations for each sub-agent to balance accuracy and latency, is challenging due to the combinatorial design space over model choices, reasoning budgets, and workflow structures. Existing cost-aware methods largely treat workflow optimization as a routing problem, selecting a configuration at inference time for each query according to the accuracy-latency objective used during training. We argue that structured LLM workflows can also be optimized from a compilation perspective: before deployment, the system can globally explore the workflow design space and construct a reusable set of workflow-level configurations spanning diverse accuracy-latency trade-offs. Drawing inspiration from machine learning compilers, we introduce FlowCompile, a structured LLM workflow compiler that performs compile-time design space exploration to identify a high-quality, reusable trade-off set. FlowCompile decomposes a workflow into sub-agents, profiles each sub-agent under diverse configurations, and composes these measurements through a structure-aware proxy to estimate workflow-level accuracy and latency. It then identifies diverse high-quality configurations in a single compile-time pass, without retraining or online adaptation. Experiments across diverse workflows and challenging benchmarks show that FlowCompile consistently outperforms heuristically optimized workflow configurations and routing-based baselines, delivering up to 6.4x speedup. The compiled configuration set further serves as a reusable optimization artifact, enabling flexible deployment under varying runtime preferences and supporting downstream selection or routing.
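The pipeline the abstract describes (profile each sub-agent, compose the measurements through a proxy, keep a reusable trade-off set) can be illustrated with a small Python sketch. This is not FlowCompile's implementation, and the abstract does not specify the proxy. The sketch assumes a purely sequential three-agent workflow in which accuracies multiply and latencies add. The agent names, configuration names, profile numbers, and the functions estimate_chain, pareto_front, compile_workflow, and select are all hypothetical.

```python
from itertools import product

# Hypothetical per-sub-agent profiles: config name -> (accuracy, latency in seconds).
# All numbers are invented for illustration; none come from the paper.
PROFILES = {
    "planner": {"small": (0.80, 1.0), "large": (0.95, 4.0)},
    "solver": {"small": (0.70, 2.0), "large": (0.90, 6.0)},
    "verifier": {"small": (0.90, 0.5), "large": (0.92, 3.0)},
}


def estimate_chain(assignment):
    """Assumed proxy for a sequential workflow: accuracies multiply, latencies add."""
    accuracy, latency = 1.0, 0.0
    for agent, config in assignment.items():
        acc, lat = PROFILES[agent][config]
        accuracy *= acc
        latency += lat
    return accuracy, latency


def pareto_front(candidates):
    """Keep candidates that no other candidate beats on both accuracy and latency."""
    front = []
    # Sort by latency ascending, breaking ties by higher accuracy first.
    for cand in sorted(candidates, key=lambda c: (c[2], -c[1])):
        # A candidate survives only if it is more accurate than every faster one.
        if not front or cand[1] > front[-1][1]:
            front.append(cand)
    return front


def compile_workflow():
    """Enumerate every workflow-level configuration once and keep the trade-off set."""
    agents = list(PROFILES)
    candidates = []
    for choice in product(*(PROFILES[a] for a in agents)):
        assignment = dict(zip(agents, choice))
        acc, lat = estimate_chain(assignment)
        candidates.append((assignment, acc, lat))
    return pareto_front(candidates)


def select(front, latency_budget):
    """Pick the most accurate compiled configuration that fits a latency budget."""
    feasible = [c for c in front if c[2] <= latency_budget]
    return max(feasible, key=lambda c: c[1]) if feasible else None


if __name__ == "__main__":
    compiled = compile_workflow()
    for assignment, acc, lat in compiled:
        print(f"{lat:5.1f}s  acc={acc:.3f}  {assignment}")
    print("under 8s:", select(compiled, latency_budget=8.0))
```

The exhaustive enumeration is only practical because the toy space has eight configurations. A real design space over model choices, reasoning budgets, and workflow structures would need a smarter search. The select function shows the reuse idea: the compiled set is computed once, and different runtime preferences only change which entry is picked.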