@QingQ77: Run multiple models simultaneously in pi, fuse their responses into one, get better results for less money. https://github.com/leblancfg/pi-fusion… pi-fusion is an extension of pi that adds a "parallel fan-out" workflow to your coding agent.


Summary

pi-fusion is an extension of pi that improves performance at lower cost by parallel fan-out of multiple models and fusing results, supporting prompt rewriting and session archiving.

Run multiple models simultaneously in pi, fuse their responses into one, get better results for less money. https://github.com/leblancfg/pi-fusion… pi-fusion is an extension of pi that adds a "parallel fan-out" workflow to your coding agent. Before each question, it dispatches several worker models to think in parallel, finally aggregating their ideas into a single coherent reply. It also has a discovery agent that can preload project context, and prompts can be automatically rewritten. During execution, the terminal shows a real-time split view of what each worker model is doing; pressing Esc cancels and reverts to normal mode. Presets can be saved, sessions archived and restored, and prompts customized. This approach is grounded in research on multi-model parallel inference and, for some coding tasks, is indeed faster and cheaper than using a single strongest model.

Cached at: 06/23/26, 12:06 PM


leblancfg/pi-fusion

Source: https://github.com/leblancfg/pi-fusion

pi-fusion

How to install

Install as a global pi package from npm:

pi install npm:@leblancfg/pi-fusion

Or install directly from the GitHub repository:

pi install git:github.com/leblancfg/pi-fusion

Open pi and turn it on from the settings pane:

/fusion

pi-fusion adds a planning fanout to pi. Before the normal pi turn starts, it runs an optional discovery agent, rewrites the prompt into variations that approach it from complementary angles, fans those out to planner workers, then injects their notes back into the main thread, which acts as the synthesis step. Combining independent model responses has been shown to outscore the individual frontier models on many benchmarks. Because independent passes behave differently, the synthesis model can reuse the useful disagreement instead of betting everything on one path through the problem. In some cases, you can get better performance than frontier models at a better price and latency.

N.B. OpenRouter has a hosted Fusion router (openrouter/fusion) that runs a multi-model panel and judge behind one API route. pi-fusion is similar. It runs local pi subprocesses against your working tree, and hands their notes to the synthesis model you already chose. You control all configuration of how this happens, including whether the planning subprocesses get all tools or only read-only tools.

%%{init: {"theme":"base","themeVariables":{"fontFamily":"Inter, system-ui, sans-serif","lineColor":"#0E481F","primaryColor":"#0E481F","primaryTextColor":"#EEF3EA","primaryBorderColor":"#0E481F"}}}%%
flowchart LR
    U([Your prompt]) --> D["Discovery (optional)"]
    U --> R["Prompt rewrite (optional)"]
    D --> W1[Worker #1]
    D --> W2[Worker #2]
    D --> W3[Worker #3]
    R --> W1
    R --> W2
    R --> W3
    W1 --> A["Synthesis (pi actor turn)"]
    W2 --> A
    W3 --> A
    D --> A
    A --> O([One final turn])

    classDef solid fill:#0E481F,stroke:#0E481F,color:#EEF3EA;
    classDef outline fill:#E7ECE6,stroke:#0E481F,color:#0E481F;
    classDef pill fill:#E3E2DC,stroke:#C7C7C0,color:#16301F;
    class U,O pill;
    class D,W1,W2,W3,A solid;
    class R outline;
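
As a rough illustration of the flow in the diagram, the sketch below runs a few stand-in planner workers in parallel and collects their notes into one bundle for a synthesis step. It is not pi-fusion's code: `PlannerWorker`, `fanOut`, and `buildBundle` are hypothetical names, and the workers here are plain async functions rather than pi subprocesses.

```typescript
// Hypothetical sketch of a parallel planning fan-out; not pi-fusion's actual code.
type PlannerWorker = (prompt: string) => Promise<string>;

interface WorkerNote {
  worker: string; // slot label such as "#1"
  ok: boolean;    // false when the worker failed
  text: string;   // planning notes, or the error message on failure
}

// Run every worker at once and keep whatever finishes, even if some fail.
// Worker i gets rewritten prompt i when one exists, otherwise the original task.
async function fanOut(
  task: string,
  rewrittenPrompts: string[],
  workers: PlannerWorker[],
): Promise<WorkerNote[]> {
  const settled = await Promise.allSettled(
    workers.map((run, i) => run(rewrittenPrompts[i] ?? task)),
  );
  return settled.map((result, i) => ({
    worker: `#${i + 1}`,
    ok: result.status === "fulfilled",
    text: result.status === "fulfilled" ? result.value : String(result.reason),
  }));
}

// Merge the successful notes into one block that a synthesis turn could read.
function buildBundle(task: string, notes: WorkerNote[]): string {
  const sections = notes
    .filter((note) => note.ok)
    .map((note) => `## Worker ${note.worker}\n${note.text}`);
  return [`# Task\n${task}`, ...sections].join("\n\n");
}

// Stand-in workers: two succeed, one fails.
const demoWorkers: PlannerWorker[] = [
  async (p) => `Check the parser first (${p})`,
  async () => {
    throw new Error("model timed out");
  },
  async (p) => `Start from the failing test (${p})`,
];

const demoBundle = fanOut("fix the flaky test", [], demoWorkers).then((notes) =>
  buildBundle("fix the flaky test", notes),
);
demoBundle.then((bundle) => console.log(bundle));
```

Using `Promise.allSettled` rather than `Promise.all` is a design choice for this sketch: one failed worker drops out of the bundle instead of aborting the whole fanout.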

Why this exists

Coding agents often commit to the first plausible plan they see. That is fine for chores. It gets sketchier when a task has hidden coupling. pi-fusion makes multiple model calls at inference time and merges them back into one synthesized response. The literature covers this idea under several names: compound inference systems, inference-time scaling, test-time compute, model panels, multi-agent deliberation, and Mixture-of-Agents.

Not all reasoning has to happen as one long serial chain inside the most expensive model. Some of it can run in parallel across slightly cheaper or dumber models, then get compressed into a single final turn.

OpenAI’s o1 write-up (https://openai.com/index/learning-to-reason-with-llms/) made the test-time-compute axis feel obvious: give a model more thinking budget, and it can do better. The next step in that direction is to ask the question: what if some of that budget is more calls, more samples, or more agents instead of one longer hidden chain?

A few useful breadcrumbs:

  • Berkeley BAIR’s “The Shift from Models to Compound AI Systems” (https://bair.berkeley.edu/blog/2024/02/18/compound-ai-systems/) defines compound AI systems as systems that use multiple interacting components: model calls, retrievers, tools, or control logic.
  • Chen et al., “Are More LLM Calls All You Need?” (https://arxiv.org/abs/2403.02419), studies scaling laws for compound inference systems that aggregate multiple LM calls.
  • Snell et al., “Scaling LLM Test-Time Compute Optimally” (https://arxiv.org/abs/2408.03314), frames inference-time compute as its own scaling axis.
  • Brown et al., “Large Language Monkeys” (https://arxiv.org/abs/2407.21787), shows repeated sampling can amplify weaker models, sometimes cost-effectively.
  • Wang et al., “Mixture-of-Agents” (https://arxiv.org/abs/2406.04692), shows multiple LLM agents can improve final answer quality when their outputs are aggregated.

My own evals point in the same direction for a subset of coding tasks: parallel planner calls can be cheaper, faster wall-clock, and better than sending everything straight to the biggest model. Not always. The whole point of this repo is to make that claim easy to test instead of treating it like a vibes-based architectural diagram.

What you see

In TUI mode, a fused turn shows a live pane:

  1. Discovery loads shared context once.
  2. Workers appear as vertical splits, each with its own prompt angle.
  3. Synthesis starts after the planning bundle is ready.

Useful controls:

/fusion                                      open settings
Esc                                          cancel the fanout and fall back to a normal turn
1-9                                          focus one worker column
0 / Tab                                      return to split view
p                                            show or hide rewritten worker prompts

There is also a one-character status bar marker that shows whether the next turn is armed.

When to use it

Good fit:

  • “Find the bug, but I am not sure where it lives.”
  • “Plan this refactor before touching files.”
  • “Review this unfamiliar area and suggest the smallest safe change.”
  • “Compare a few implementation paths before we commit to one.”

Bad fit:

  • Tiny edits where startup latency costs more than the task.
  • Prompts with images. The synthesis turn can see them; discovery and workers currently cannot.
  • Fully non-interactive runs where you need progress output on stdout. pi-fusion stays quiet there so it does not corrupt print/JSON output.

Configure it

Open the settings pane:

/fusion
Row             What it changes
Next turn       Arms fusion for the next eligible user prompt, then turns off.
Presets         Saves the current pane settings, loads saved ones, or deletes.
Workers         Sets worker count and opens per-worker model settings.
Agent tools     Switches discovery/workers between all tools and read-only.
Discovery       Picks the context-loading model and reasoning effort.
Rewrite         Toggles prompt rewriting before worker fanout.
Synthesis       Picks the synthesis model and reasoning effort.
Save and close  Persists settings in the pi session.

Presets are user-defined snapshots of the settings pane. There are no built-in profiles, because those would go stale and hide assumptions. Save your own from the Presets row of the /fusion pane. Global presets live in ~/.pi/agent/fusion.json; project presets live in .pi/fusion.json and override global presets with the same name. See docs/presets.md for the full format and examples.
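
The override rule for presets can be pictured as a simple map merge. This is a minimal sketch, assuming presets are keyed by name as in fusion.json; `resolvePresets` is a hypothetical helper, and the `workers` setting key is made up for the example because the settings shape is documented in docs/presets.md, not here.

```typescript
// Hypothetical sketch of preset resolution; the real loader may differ.
interface Preset {
  description?: string;
  settings: Record<string, unknown>; // real settings keys are not specified here
}
type PresetMap = Record<string, Preset>;

// Project presets win over global presets that share a name.
function resolvePresets(globalPresets: PresetMap, projectPresets: PresetMap): PresetMap {
  return { ...globalPresets, ...projectPresets };
}

const globalPresetFile: PresetMap = {
  "cheap-planners": { description: "global copy", settings: { workers: 4 } },
  "deep-review": { settings: { workers: 2 } },
};
const projectPresetFile: PresetMap = {
  "cheap-planners": { description: "project copy", settings: { workers: 3 } },
};

const resolvedPresets = resolvePresets(globalPresetFile, projectPresetFile);
console.log(resolvedPresets["cheap-planners"]?.description); // "project copy"
```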

The status bar uses a compact union marker: ∪̸ means fusion is off, and ∪ means the next eligible turn is armed.

For local development, load the TypeScript entrypoint directly:

pi -e ./extensions/pi-fusion/index.ts

The published package uses the pi.extensions field in package.json; there is no separate index.json manifest.

CLI flags exist for repeatable starts:

pi --fusion-workers 4 \
   --fusion-discovery-model anthropic/claude-haiku-4-5 \
   --fusion-worker-model anthropic/claude-sonnet-4-5 \
   --fusion-synthesis-model openai/gpt-5.2-codex

Use current or omit a model flag to keep the main session model. Reasoning values are:

current, off, minimal, low, medium, high, xhigh
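
If you script these flags, it can help to validate the reasoning value before launching pi. The helper below is a hypothetical sketch: only the seven values come from the list above, and `parseReasoningLevel` is not part of pi or pi-fusion.

```typescript
// The seven values come from the list above; the helper itself is hypothetical.
const REASONING_LEVELS = ["current", "off", "minimal", "low", "medium", "high", "xhigh"] as const;
type ReasoningLevel = (typeof REASONING_LEVELS)[number];

// Return the value as a typed level, or throw with the accepted values listed.
function parseReasoningLevel(value: string): ReasoningLevel {
  const match = REASONING_LEVELS.find((level) => level === value);
  if (match === undefined) {
    throw new Error(
      `Unknown reasoning level "${value}". Expected one of: ${REASONING_LEVELS.join(", ")}`,
    );
  }
  return match;
}

console.log(parseReasoningLevel("xhigh")); // "xhigh"
```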

Prompt Customization

You can fully customize all the prompts used by pi-fusion. On first run, default prompts are automatically written to your global fusion.json file (~/.pi/agent/fusion.json). You can see and edit them there, or override them on a per-project basis.

Where prompts are stored

pi-fusion reads prompts from two locations:

Scope    Path                     Use it for
Global   ~/.pi/agent/fusion.json  Default templates used across all projects.
Project  .pi/fusion.json          Project-specific templates to share with your team.

Project-level prompts override global prompts per field. pi-fusion searches upward from the current working directory for an existing .pi/fusion.json or .git directory, so launching pi from a subdirectory still finds repo-level config.

For example, a project file can override only worker while keeping your global discovery, rewrite, and synthesis templates.
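
The lookup and override rules can be sketched as two small functions. This is an illustration under stated assumptions, not pi-fusion's implementation: `findProjectRoot` and `mergePrompts` are hypothetical names, paths are assumed to be absolute POSIX paths, and the filesystem check is injected as a callback so the example runs without touching the disk.

```typescript
// Hypothetical sketch; function names and the injected `exists` check are assumptions.
interface Prompts {
  discovery?: string;
  rewrite?: string;
  worker?: string;
  synthesis?: string;
}

// Walk from `cwd` toward the filesystem root and return the first directory
// that holds .pi/fusion.json or .git, or null when none is found.
function findProjectRoot(cwd: string, exists: (path: string) => boolean): string | null {
  if (!cwd.startsWith("/")) throw new Error("expected an absolute POSIX path");
  let dir = cwd.replace(/\/+$/, "") || "/";
  while (true) {
    const prefix = dir === "/" ? "" : dir;
    if (exists(`${prefix}/.pi/fusion.json`) || exists(`${prefix}/.git`)) return dir;
    if (dir === "/") return null;
    dir = dir.slice(0, dir.lastIndexOf("/")) || "/";
  }
}

// Project values override global values one field at a time.
function mergePrompts(globalPrompts: Prompts, projectPrompts: Prompts): Prompts {
  return { ...globalPrompts, ...projectPrompts };
}

// Pretend filesystem: only the repo root has a .git directory.
const knownPaths = new Set(["/repo/.git"]);
const projectRoot = findProjectRoot("/repo/src/lib", (p) => knownPaths.has(p));
const mergedPrompts = mergePrompts(
  { discovery: "global discovery", worker: "global worker" },
  { worker: "project worker" },
);
console.log(projectRoot); // "/repo"
console.log(mergedPrompts.worker); // "project worker"
```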

JSON format

Add a "prompts" section at the top level of your fusion.json:

{
  "version": 1,
  "prompts": {
    "discovery": "...",
    "rewrite": "...",
    "worker": "...",
    "synthesis": "..."
  },
  "presets": {
    "cheap-planners": {
      "description": "Fast worker fanout, current model as synthesis",
      "settings": { ... }
    }
  }
}

Available Prompts & Placeholders

Each prompt supports simple {{placeholder}} templating. You can rearrange, rewrite, or completely re-format the instruction text, as long as you preserve the template tags you want to substitute.
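
To make the templating concrete, here is a minimal sketch of a {{placeholder}} renderer. `renderTemplate` is a hypothetical function, and leaving unknown tags untouched is an assumption for the example, not documented pi-fusion behavior.

```typescript
// Hypothetical renderer for {{placeholder}} templates; leaving unknown tags
// untouched is an assumption, not documented behavior.
function renderTemplate(template: string, values: Map<string, string>): string {
  return template.replace(
    /\{\{(\w+)\}\}/g,
    (tag: string, key: string) => values.get(key) ?? tag,
  );
}

const workerTemplate =
  "You are {{workerName}} in {{cwd}}.\nTask: {{task}}\nAngle: {{assignedPrompt}}";

const renderedWorkerPrompt = renderTemplate(
  workerTemplate,
  new Map([
    ["workerName", "#1"],
    ["cwd", "/repo"],
    ["task", "find the bug"],
  ]),
);
console.log(renderedWorkerPrompt);
```

In this sketch `{{assignedPrompt}}` stays in the output because no value was supplied for it, which makes a missing substitution easy to spot.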

1. Discovery Prompt (prompts.discovery)

This prompt guides the discovery agent to explore your codebase.

  • Placeholders:
    • {{cwd}}: Working directory of your project.
    • {{task}}: Your original prompt.
    • {{recentContext}}: Pre-formatted recent conversation history.
    • {{toolGuidance}}: Pre-formatted guidance for the selected planner tool mode.

2. Prompt Rewrite (prompts.rewrite)

This prompt is used to ask the rewrite model to generate worker prompts.

  • Placeholders:
    • {{workerCount}}: The number of parallel workers.
    • {{task}}: Your original prompt.
    • {{recentContext}}: Pre-formatted recent conversation history.

3. Worker Prompt (prompts.worker)

This prompt runs on each parallel worker.

  • Placeholders:
    • {{cwd}}: Working directory of your project.
    • {{task}}: Your original prompt.
    • {{assignedPrompt}}: The rewritten prompt variation generated for this worker.
    • {{discoveryContext}}: Context loaded and handed off by the discovery agent.
    • {{workerName}}: Slot index/name (e.g. #1, #2).
    • {{discoveryGuidance}}: Pre-formatted guidance on how to use the discovery context.
    • {{toolGuidance}}: Pre-formatted guidance for the selected planner tool mode.
    • {{recentContext}}: Pre-formatted recent conversation history.

4. Synthesis Prompt (prompts.synthesis)

This prompt formats the final planning bundle injected into the synthesis turn.

  • Placeholders:
    • {{task}}: Your original prompt.
    • {{discoveryContext}}: Context loaded by the discovery agent.
    • {{variations}}: List of worker prompt variations.
    • {{workerOutputs}}: Outputs and plans produced by each worker.
    • {{imageNote}}: A note telling the synthesis step that workers did not see attached images (if any).

💡 Important: The synthesis prompt template should contain <fusion_done/> so that subsequent conversation turns know a fused turn has finished and bypass fusion automatically. If a custom synthesis prompt omits it, pi-fusion prepends the marker defensively.
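
The defensive fallback can be pictured as follows. This is a hypothetical sketch: `ensureDoneMarker` and `alreadyFused` are illustrative names, and the newline separator after the prepended marker is an assumption.

```typescript
// Hypothetical helpers mirroring the described fallback; the exact placement
// and separator pi-fusion uses are assumptions.
const FUSION_DONE_MARKER = "<fusion_done/>";

// Return the template unchanged when it already has the marker,
// otherwise put the marker in front of it.
function ensureDoneMarker(synthesisTemplate: string): string {
  return synthesisTemplate.includes(FUSION_DONE_MARKER)
    ? synthesisTemplate
    : `${FUSION_DONE_MARKER}\n${synthesisTemplate}`;
}

// A later turn could check for the marker to decide whether to bypass fusion.
function alreadyFused(context: string): boolean {
  return context.includes(FUSION_DONE_MARKER);
}

const customSynthesis = ensureDoneMarker("Plans:\n{{workerOutputs}}");
console.log(alreadyFused(customSynthesis)); // true
```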

Commands

/fusion                                            # open floating settings pane
/fusion status
/fusion on                                         # arm fusion for the next eligible user prompt
/fusion off
/fusion preset list
/fusion preset save cheap-planners
/fusion preset save-project repo-review
/fusion preset cheap-planners
/fusion workers 4
/fusion tools all
/fusion tools read-only
/fusion discovery-model anthropic/claude-haiku-4-5
/fusion discovery-model current
/fusion discovery-thinking low
/fusion discovery-thinking current
/fusion worker-model google/gemini-3.5-flash
/fusion worker-model current
/fusion worker-thinking medium
/fusion worker-thinking current
/fusion synthesis-model openai/gpt-5.5
/fusion synthesis-model current
/fusion synthesis-thinking high
/fusion synthesis-thinking current
/fusion output 12000
/fusion context 16000
/fusion resume 8000
/fusion timeout 600000
/fusion-transcript                                 # view the latest run's full archived transcript
/fusion-transcript list                            # list archived runs in this session
/fusion-transcript <run-id>                        # view a specific run
/fusion-transcript --write transcript.md           # export to a file

/fusion model ... is still accepted as an alias for /fusion worker-model ....

Startup flags

pi --fusion-enabled
pi --fusion-disabled
pi --fusion-preset cheap-planners
pi --fusion-workers 3
pi --fusion-planner-tools all
pi --fusion-discovery-model anthropic/claude-haiku-4-5
pi --fusion-discovery-thinking low
pi --fusion-worker-model google/gemini-3.5-flash
pi --fusion-worker-thinking medium
pi --fusion-synthesis-model openai/gpt-5.5
pi --fusion-synthesis-thinking high
pi --fusion-output-bytes 12000
pi --fusion-context-bytes 16000
pi --fusion-resume-bytes 8000
pi --fusion-timeout-ms 600000

Fusion is off by default. Use --fusion-enabled to start with the next eligible turn armed; --fusion-disabled forces it off. After a fused turn starts, pi-fusion automatically disarms itself.

--fusion-model remains as a backwards-compatible alias for --fusion-worker-model. Use --fusion-preset NAME to load a preset from ~/.pi/agent/fusion.json or .pi/fusion.json at startup.

Planner subprocesses get all tools by default; use /fusion tools read-only or --fusion-planner-tools read-only to restore the original narrow read/search/list tool set.

What gets sent where

When fusion is armed, the next idle, non-command user input consumes that arm and:

  • opens a live discovery pane in TUI mode;
  • runs query rewriting in parallel with discovery;
  • replaces discovery with live worker splits after discovery finishes;
  • starts standalone pi subprocesses in JSON print mode;
  • keeps your other extensions enabled in those subprocesses; only pi-fusion itself opts out (via the PI_FUSION_SUBAGENT env var) so workers can use your installed extensions without recursive fusion;
  • gives discovery and workers either all normal tools (default) or only read/search/list tools (read, grep, find, ls);
  • gives query rewriting no tools;
  • injects shared discovery context into every worker prompt;
  • asks workers for concise planning markdown;
  • inserts the final planning bundle into the synthesis turn’s system prompt via before_agent_start.

The user’s message stays untouched in the session. /tree and /fork still show the original prompt, and the planning bundle does not accumulate across turns. Fusion then returns to off automatically, so the following prompt runs normally unless you arm it again.
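
The one-shot arming behavior can be modeled as a tiny state machine. The class below is a hypothetical sketch: `FusionArm` is not a pi-fusion API, and treating any input that starts with "/" as a command is an assumption made for the example.

```typescript
// Hypothetical model of the one-shot arm; names and the "/" command check are assumptions.
class FusionArm {
  private armed = false;

  arm(): void {
    this.armed = true;
  }

  disarm(): void {
    this.armed = false;
  }

  isArmed(): boolean {
    return this.armed;
  }

  // Returns true when this input should run as a fused turn. An eligible input
  // consumes the arm, so the following prompt runs normally.
  consume(input: string, agentIdle: boolean): boolean {
    const isCommand = input.trimStart().startsWith("/");
    if (!this.armed || !agentIdle || isCommand) return false;
    this.armed = false;
    return true;
  }
}

const fusionArm = new FusionArm();
fusionArm.arm();
const fusedTurns = [
  fusionArm.consume("/fusion status", true),    // command: the arm is kept
  fusionArm.consume("plan the refactor", true), // fused turn, arm consumed
  fusionArm.consume("now apply it", true),      // normal turn
];
console.log(fusedTurns); // [ false, true, false ]
```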

Session archive & resume

Everything happens inside a single pi session file, so a fused turn stays auditable and resumable — which matters for production and long-running workloads.

Each fused turn writes two things into the session tree:

  • A full archive. The complete, untruncated discovery, rewrite, and worker transcripts are saved as pi-fusion-archive custom entries (chunked only for storage, never semantically truncated). pi’s context builder never feeds custom entries to the model, so the archive is full-fidelity without ever inflating the context window. Each run gets a sortable run-id.
  • A bounded handoff. A single custom_message carries a budgeted summary of the worker conclusions plus a pointer to the archive run-id. This is the only fusion content that resumed and subsequent turns actually see, capped by fusion-resume-bytes (default 8000).

The result: when you resume a session, the model gets a useful, compact handoff instead of the raw sub-agent dumps, but the full transcripts are still on disk for audit, display, or export. Inspect them any time with /fusion-transcript (optionally --write <path> to export). Because the archive lives in custom entries, it survives resume but is filtered out of normal /tree conversation flow.
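
The split between a full archive and a bounded handoff can be sketched like this. Everything here is illustrative: the function names, the `Handoff` shape, the chunk size, and the prefix-truncation strategy are assumptions. Only the 8000-byte default comes from the text above.

```typescript
// Hypothetical sketch of the two records; sizes, field names, and the
// truncation strategy are assumptions.

// UTF-8 byte length of a single character (one code point).
function utf8Bytes(ch: string): number {
  const cp = ch.codePointAt(0) ?? 0;
  return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

// Keep the longest prefix that fits the byte budget without splitting a character.
function capToBytes(text: string, maxBytes: number): string {
  let used = 0;
  let out = "";
  for (const ch of text) {
    const size = utf8Bytes(ch);
    if (used + size > maxBytes) break;
    used += size;
    out += ch;
  }
  return out;
}

// Split a full transcript into storage chunks; joining them restores the text exactly.
function chunkForStorage(transcript: string, chunkChars: number): string[] {
  if (chunkChars <= 0) throw new Error("chunkChars must be positive");
  const chunks: string[] = [];
  for (let i = 0; i < transcript.length; i += chunkChars) {
    chunks.push(transcript.slice(i, i + chunkChars));
  }
  return chunks;
}

interface Handoff {
  runId: string;   // pointer back to the archived run
  summary: string; // budgeted text that later turns actually see
}

function buildHandoff(runId: string, summary: string, resumeBytes = 8000): Handoff {
  return { runId, summary: capToBytes(summary, resumeBytes) };
}

const archiveChunks = chunkForStorage("abcdefg", 3);
const demoHandoff = buildHandoff("run-1", "x".repeat(9000));
console.log(archiveChunks.length, demoHandoff.summary.length); // 3 8000
```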

The synthesis turn itself still sees a larger slice of worker output (
