FlowCompile: An Optimizing Compiler for Structured LLM Workflows

Summary

FlowCompile is a compiler for structured LLM workflows that explores configurations at compile time to balance accuracy and latency, achieving up to a 6.4x speedup over heuristic and routing-based baselines without retraining.

Structured LLM workflows, where specialized LLM sub-agents execute according to a predefined graph, have become a powerful abstraction for solving complex tasks. Optimizing such workflows, i.e., selecting configurations for each sub-agent to balance accuracy and latency, is challenging due to the combinatorial design space over model choices, reasoning budgets, and workflow structures. Existing cost-aware methods largely treat workflow optimization as a routing problem, selecting a configuration at inference time for each query according to the accuracy-latency objective used during training. We argue that structured LLM workflows can also be optimized from a compilation perspective: before deployment, the system can globally explore the workflow design space and construct a reusable set of workflow-level configurations spanning diverse accuracy-latency trade-offs. Drawing inspiration from machine learning compilers, we introduce FlowCompile, a structured LLM workflow compiler that performs compile-time design space exploration to identify a high-quality, reusable trade-off set. FlowCompile decomposes a workflow into sub-agents, profiles each sub-agent under diverse configurations, and composes these measurements through a structure-aware proxy to estimate workflow-level accuracy and latency. It then identifies diverse high-quality configurations in a single compile-time pass, without retraining or online adaptation. Experiments across diverse workflows and challenging benchmarks show that FlowCompile consistently outperforms heuristically optimized workflow configurations and routing-based baselines, delivering up to a 6.4x speedup. The compiled configuration set further serves as a reusable optimization artifact, enabling flexible deployment under varying runtime preferences and supporting downstream selection or routing.
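The abstract describes a three-step pipeline: profile each sub-agent under several configurations, compose those measurements through a structure-aware proxy into workflow-level accuracy and latency, and keep a set of diverse accuracy-latency trade-offs that can be reused at deployment. The Python sketch below illustrates that pipeline on a toy workflow. Everything in it is an assumption rather than FlowCompile's actual method: the workflow is modeled as an ordered list of stages whose sub-agents run in parallel, stage latencies add while parallel branches cost the slowest branch, and accuracy is multiplied as if sub-agents fail independently. The names `Profile`, `estimate`, `compile_tradeoffs`, and `select`, and all profiling numbers, are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Profile:
    """Hypothetical measurement for one sub-agent under one configuration."""
    accuracy: float  # fraction of profiling queries answered correctly
    latency: float   # seconds per query


# Hypothetical workflow shape: an ordered list of stages. Sub-agents inside a
# stage run in parallel; a one-element stage is an ordinary sequential step.
Workflow = List[List[str]]
Assignment = Dict[str, str]  # sub-agent name -> configuration name
Profiles = Dict[str, Dict[str, Profile]]


def estimate(workflow: Workflow, profiles: Profiles, assignment: Assignment) -> Profile:
    """Toy structure-aware proxy (an assumption, not FlowCompile's proxy).

    Stage latencies add up, parallel branches cost the slowest branch, and
    accuracy multiplies as if each sub-agent must succeed independently.
    """
    accuracy, latency = 1.0, 0.0
    for stage in workflow:
        measured = [profiles[agent][assignment[agent]] for agent in stage]
        latency += max(p.latency for p in measured)
        for p in measured:
            accuracy *= p.accuracy
    return Profile(accuracy, latency)


def compile_tradeoffs(workflow: Workflow,
                      profiles: Profiles) -> List[Tuple[Assignment, Profile]]:
    """Score every per-sub-agent configuration, then keep the Pareto set."""
    agents = [agent for stage in workflow for agent in stage]
    candidates = []
    for choice in product(*(sorted(profiles[agent]) for agent in agents)):
        assignment = dict(zip(agents, choice))
        candidates.append((assignment, estimate(workflow, profiles, assignment)))
    # Sweep from fastest to slowest (ties: most accurate first) and keep a
    # candidate only if it beats the best accuracy seen so far in the sweep.
    candidates.sort(key=lambda c: (c[1].latency, -c[1].accuracy))
    frontier: List[Tuple[Assignment, Profile]] = []
    best_accuracy = float("-inf")
    for assignment, est in candidates:
        if est.accuracy > best_accuracy:
            frontier.append((assignment, est))
            best_accuracy = est.accuracy
    return frontier


def select(frontier: List[Tuple[Assignment, Profile]],
           latency_budget: float) -> Optional[Tuple[Assignment, Profile]]:
    """Deployment-time choice: the most accurate compiled entry within budget."""
    feasible = [entry for entry in frontier if entry[1].latency <= latency_budget]
    return max(feasible, key=lambda entry: entry[1].accuracy) if feasible else None


# Toy usage with made-up profiling numbers for a two-stage workflow.
workflow: Workflow = [["planner"], ["solver"]]
profiles: Profiles = {
    "planner": {"small": Profile(0.90, 1.0), "large": Profile(0.95, 5.0)},
    "solver": {"small": Profile(0.60, 2.0), "large": Profile(0.80, 4.0)},
}
frontier = compile_tradeoffs(workflow, profiles)
for assignment, est in frontier:
    print(assignment, round(est.accuracy, 3), est.latency)
print(select(frontier, latency_budget=6.0))
```

With these made-up numbers, the large planner paired with the small solver takes 7.0 s for lower estimated accuracy than the small planner paired with the large solver at 5.0 s, so it is pruned and three entries remain; a 6.0 s budget then selects the small planner with the large solver. Exhaustive enumeration with `itertools.product` is only workable for toy spaces. The abstract does not say how FlowCompile searches its combinatorial design space, so no particular search strategy is implied here. The `select` step mirrors the abstract's point that the compiled set can be reused under different runtime preferences, modeled here as a latency budget.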

Cached at: 05/15/26, 12:21 AM

Source: https://huggingface.co/papers/2605.13647

Abstract

FlowCompile is a structured LLM workflow compiler that optimizes complex multi-agent tasks by performing compile-time exploration of workflow configurations to balance accuracy and latency without retraining.
