@yibie: Recommend this article. The author of Superpowers ran a complete autoresearch loop with Fable 5 — 25 experiments, $165, improving build speed by 50% and reducing token costs by 60%. But the most valuable part of this article is not the result numbers; it's the complete record of the process…
Summary
Superpowers 6 has been released. Its author used Fable 5 to run 25 autonomous experiments that improved build speed by 50% and cut token costs by 60%, and recorded the experimental process and the lessons from failed experiments in detail.
Cached at: 07/03/26, 02:38 PM
Recommend this article. The author of Superpowers ran a full autoresearch loop with Fable 5 — 25 experiments, $165, improving build speed by 50% and cutting token costs by 60%. But the most valuable part isn’t the final numbers — it’s the complete record of the experimental process: every failure, every idea that was “proven dead,” and three measurement bugs corrected along the way. This is the most comprehensive hands-on report on “autonomous R&D with Fable” available today.
Superpowers 6: Running 25 Autonomous Experiments with Fable 5, Cutting Costs by 60%
A week ago, we were gearing up to release Superpowers 5.2 — already delayed a few times to add “just one more improvement.” Then Anthropic shipped (and unshipped) Fable. In those few days, I pushed it to its limits.
The most common complaint from Superpowers users is that tokens are expensive and builds are slow. In principle, slow shouldn’t be a problem, because the slowness happens during the autonomous, subagent-driven orchestration of the build. But it is a problem. Slow isn’t fun. Expensive isn’t fun either.
When Fable came out, I decided to see how much it could optimize Subagent Driven Development. I was hoping for maybe a 15% reduction in token consumption. I got that — and a lot more.
First attack: the coordinator-to-reviewer handoff
Fable analyzed thousands of Subagent Driven Development sessions and found that the code review and spec compliance review subagents were running a lot of git commands during reviews. We replaced the written instructions for finding the commit under review with a shell script that pre-generates a review package containing a formatted diff and metadata. That reduced token consumption and wall-clock time by about 10%.
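The post does not include the script itself. As a rough illustration of the idea, here is a minimal Python sketch (the original is described as a shell script) that gathers the diff and changed-file list once and renders them as a single text package. The function names, the package layout, and the choice of metadata fields are all assumptions made for illustration, not the Superpowers implementation.

```python
import subprocess


def format_review_package(base: str, head: str, diff: str, files: list[str]) -> str:
    """Render one text blob a reviewer can read without running git itself."""
    lines = [
        "REVIEW PACKAGE",
        f"base: {base}",
        f"head: {head}",
        f"files changed: {len(files)}",
    ]
    lines += [f"  - {name}" for name in files]
    lines += ["", "DIFF", diff.rstrip("\n")]
    return "\n".join(lines) + "\n"


def git(*args: str) -> str:
    """Run a git command in the current repository and return its stdout."""
    return subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    ).stdout


def build_review_package(base: str, head: str) -> str:
    """Collect the diff and changed-file list up front (hypothetical layout)."""
    diff = git("diff", f"{base}..{head}")
    files = git("diff", "--name-only", f"{base}..{head}").splitlines()
    return format_review_package(base, head, diff, files)
```

The point of the design is that the git work happens once, in the coordinator, so each reviewer subagent starts from the finished package instead of spending turns locating the commit.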
That night before bed, I told Fable: “See if you can cut another 15% in time and tokens while I’m asleep.” I also left a message on internal Slack: we should look at what happens when we merge the code reviewer and spec compliance reviewer.
I didn’t know what I expected. I certainly didn’t expect to wake up and find that Fable had independently reached the same conclusion, tested it, and found it saved that extra 15% on our eval suite.
Second night: the autonomous research loop
/goal once this is done, run an autoresearch loop to improve cost-efficiency of the superpowers build loop.
Use opus as coordinator. Build a hypothesis log. Run experiments. At least 25 experiments.
Fable built a complete autoresearch harness and ran all night. 25 experiments completed for $165.
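The harness Fable built is not shown in the post. To make the "hypothesis log" idea concrete, here is a minimal Python sketch of one possible shape: each experiment is registered with its hypothesis before it runs, and a result can only be recorded against a registered entry. The class names, fields, and verdict labels are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Experiment:
    exp_id: str                      # e.g. "E22"
    hypothesis: str                  # written down before the run
    metric: str                      # what will be measured
    result: Optional[float] = None   # filled in after the run
    verdict: str = "pending"         # "pending", "win", "dead", or "retracted"
    notes: list[str] = field(default_factory=list)


class HypothesisLog:
    def __init__(self) -> None:
        self._entries: dict[str, Experiment] = {}

    def register(self, exp_id: str, hypothesis: str, metric: str) -> Experiment:
        """Pre-register an experiment; duplicate ids are rejected."""
        if exp_id in self._entries:
            raise ValueError(f"{exp_id} is already registered")
        exp = Experiment(exp_id, hypothesis, metric)
        self._entries[exp_id] = exp
        return exp

    def record(self, exp_id: str, result: float, verdict: str, note: str = "") -> None:
        """Attach a result; raises KeyError if the experiment was never registered."""
        exp = self._entries[exp_id]
        exp.result, exp.verdict = result, verdict
        if note:
            exp.notes.append(note)

    def by_verdict(self, verdict: str) -> list[str]:
        """List experiment ids with the given verdict, in registration order."""
        return [e.exp_id for e in self._entries.values() if e.verdict == verdict]
```

Keeping rejected ideas in the same log as the wins is what lets a later reader see which leads have already been tried.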
Result: The shippable candidate (E27) — opus controller + elicited plan + conditional haiku implementer + terse reviewer contract + narration recipe + final review tier pin.
Wins with numbers: the terse reviewer contract reduced reviewer output by 41% with the verdict unchanged. The narration recipe delivered a 54% reduction with zero variance. Conditional implementer tiering saved roughly $0.50 to $1 per run, and E22 proved it correctly refused haiku for prose plans.
Things now provably dead: capping controller thinking backfired — turns rose from 92 to 138, output doubled. Plan word budgets slashed test content by 62% even when code was exempted. Sonnet plan generation kept fidelity but destroyed task structure. Implementation bodies in plans are marginal — tests + interface + structure carried the entire load.
A risk finding worth remembering: reviewers given only the diff package made confident spec verdicts while silently redefining “spec” as global constraints — 0 out of 5 flagged the missing brief. Same failure family as the haiku reviewer advocacy.
Six leads were closed as already optimal (report reads, cache healthy, reviewer floor, haiku fixer, todo bookkeeping, dispatch re-derivation) and recorded so nobody re-buys those lessons.
Three measurement bugs of my own were caught and fixed mid-loop: a grep that counted template echoes as self-review catches, a harness that never inlined the diff, a scorer regex that missed newlines. One retracted verdict was re-measured clean — -74% became an honest -41%.
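The post does not show the scorer regex, but "a regex that missed newlines" is a common class of bug. The sketch below illustrates it in Python under an assumed, hypothetical output format where a reviewer writes `VERDICT:` and then the verdict on the following line.

```python
import re

# Hypothetical reviewer output: the verdict lands on the line after the label.
reviewer_output = "VERDICT:\nPASS\n"

# By default "." does not match a newline, so this pattern misses the verdict
# whenever the reviewer puts it on the next line.
naive = re.compile(r"VERDICT:.*PASS")

# re.DOTALL lets "." match newlines too; r"VERDICT:\s*PASS" would be a
# narrower alternative that only tolerates whitespace between the two.
fixed = re.compile(r"VERDICT:.*PASS", re.DOTALL)

naive_hit = naive.search(reviewer_output) is not None
fixed_hit = fixed.search(reviewer_output) is not None
```

A scorer with the first pattern silently undercounts matches, which is the kind of error that can distort a measured improvement until the measurement is rerun.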
Results
Across 36 hours of work and about $650 in unsubsidized token spend: on the Anthropic eval benchmark, build wall-clock time down 50%, token spend down 60%. The biggest improvements came from merging the spec compliance and code quality review agents, pre-baking the review package so reviewers rarely need to run git, and changing the guidance we give the orchestrator about what kind of agent to use for what task.
Then we ran the eval on Codex — the results showed zero improvement. A few minutes of digging: the Codex evals weren’t isolated well enough and were always benchmarking Superpowers 5.1.0. Once fixed, all results held.
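One way to catch this kind of problem earlier is to have the harness check which version it is actually benchmarking before it records any numbers. The Python sketch below is a guess at what such a guard could look like; the function name and the assumption that the harness can obtain a version string from the installation under test are hypothetical and not taken from the post.

```python
def assert_version_under_test(expected: str, reported: str) -> None:
    """Fail loudly if the harness loaded a different version than intended.

    `reported` is whatever version string the installation under test
    returns (how it is obtained is left to the harness; assumed here).
    """
    actual = reported.strip()
    if actual != expected:
        raise RuntimeError(
            f"eval is benchmarking {actual!r}, expected {expected!r}"
        )
```

Called at the start of every run, a check like this turns a stale installation into an immediate error instead of a puzzling "zero improvement" result.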
In short
Superpowers 6 proves that autonomous agent R&D isn’t a demo — it’s happening. 25 experiments, $165, one overnight run. Every experiment had a pre-registered hypothesis. Every rejected idea was documented. Every measurement error was corrected mid-loop. The eval infrastructure allowed them to quantify changes across multiple harnesses. This is the right shape for autonomous R&D.
Original: Jesse Vincent (obra), “Superpowers 6”, 2026-06-15
https://blog.fsck.com/2026/06/15/Superpowers-6/…
#Fable5 #Agent #AutonomousR&D #Superpowers