@eliebakouch: to be clear, this is a closed source orchestrator on top of closed source models. if before you didn't control the mode…
Summary
Elie Bakouch critiques Sakana AI's Fugu system as a closed-source orchestration layer over closed-source models, arguing that it offers neither transparency nor real AI sovereignty, and pointing to technical limitations in its routing and planning design as well as the absence of any cost reporting.
Cached at: 06/23/26, 01:43 AM
to be clear, this is a closed-source orchestrator on top of closed-source models. if before you didn’t control the models, now you don’t even control which ones are used or how much they are used. this is not “AI sovereignty”
i’ve also read the tech report to get an opinion on the technical stuff:
fugu (not the ultra version) is basically a classifier that selects, at each turn, the model most likely to answer correctly (in other words, a router). this leads to a 10-point drop on SWE-Bench Pro compared to opus, with only very slight gains on other benchmarks. the argument could be that it reduces cost, but the report gives no information on this, so it’s likely the opposite. they also have an autoresearch benchmark where they compare against frontier models labeled “Model A, B and C”, and it’s really crazy not to be transparent about which models you compare against. it also probably doesn’t support adding new LLMs out of the box, since you need to retrain the classifier
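To make the "router" point concrete, here is a minimal sketch of a per-turn classifier over a fixed model pool. It is not Sakana's implementation: the linear heads, the feature vector, the `PerTurnRouter` class and the `model_a`/`model_b`/`model_c` names are all hypothetical. Each head estimates the probability that its model answers the current turn correctly, and the router picks the argmax.

```python
import math
from dataclasses import dataclass


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


@dataclass
class PerTurnRouter:
    """Hypothetical router: one trained linear head per model in a fixed pool.

    Each head estimates P(model answers this turn correctly | turn features).
    """
    model_names: list[str]
    weights: list[list[float]]  # weights[i] belongs to model_names[i]
    biases: list[float]

    def success_probs(self, features: list[float]) -> dict[str, float]:
        return {
            name: sigmoid(sum(w * x for w, x in zip(ws, features)) + b)
            for name, ws, b in zip(self.model_names, self.weights, self.biases)
        }

    def route(self, features: list[float]) -> str:
        # Pick the model with the highest estimated success probability.
        probs = self.success_probs(features)
        return max(probs, key=probs.get)


# Hypothetical pool and hand-set weights, purely for illustration.
router = PerTurnRouter(
    model_names=["model_a", "model_b", "model_c"],
    weights=[[2.0, -1.0], [-1.0, 2.0], [0.5, 0.5]],
    biases=[0.0, 0.0, 0.0],
)
print(router.route([1.0, 0.0]))  # model_a
print(router.route([0.0, 1.0]))  # model_b
```

In this design, adding a fourth model means adding a new row to `weights`, and that row has to be learned from labeled turns, which is the retraining cost the tweet mentions. As sketched, the score also ignores per-call cost entirely.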
about fugu ultra: this is basically an advanced plan mode plus orchestrator, i.e. a model that, for a given query, outputs a plan with multiple “workflows”. my understanding is that a workflow says something like “spawn model A subagents to achieve this, then use model B to judge it, then summarize this with model C”, which is just a test-time compute scaling strategy. i think this is an ok-ish way to do it, but it’s limited by the fact that they need to predict everything before the agents start working, which is why they limit this to 5 steps. imo you need to predict what to spawn at t+1 with the information you get at t, not with the info you have at t=0. there are also other issues, such as the fable 5 score on terminal bench being wrong, and them being super vague and unclear about which models are in the LLM pool (they only mention closed-source API ones)
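The static-vs-adaptive planning distinction can be sketched as two loops. This is an illustrative assumption, not Fugu Ultra's code: the step labels, `run_static`, `run_adaptive`, and the toy environment are hypothetical, and only the 5-step cap comes from the tweet. The static loop commits to the whole plan at t=0; the adaptive loop chooses step t+1 after seeing what happened at step t.

```python
from typing import Callable, List, Optional

Step = str         # hypothetical step label, e.g. "solve:model_a"
Observation = str  # hypothetical result of executing a step
MAX_STEPS = 5      # step cap reported in the tweet for Fugu Ultra plans

Executor = Callable[[Step, List[Observation]], Observation]


def run_static(plan_fn: Callable[[str], List[Step]], execute: Executor,
               query: str) -> List[Observation]:
    """Plan every step at t=0, then execute the plan unchanged."""
    plan = plan_fn(query)[:MAX_STEPS]
    history: List[Observation] = []
    for step in plan:
        history.append(execute(step, history))
    return history


def run_adaptive(next_step_fn: Callable[[str, List[Observation]], Optional[Step]],
                 execute: Executor, query: str) -> List[Observation]:
    """Choose step t+1 only after seeing the observations up to step t."""
    history: List[Observation] = []
    for _ in range(MAX_STEPS):
        step = next_step_fn(query, history)
        if step is None:  # planner decides the task is finished
            break
        history.append(execute(step, history))
    return history


# --- toy environment (entirely hypothetical) ---
def toy_execute(step: Step, history: List[Observation]) -> Observation:
    if step.startswith("solve:"):
        # model_a's attempt fails, model_b's attempt succeeds
        return "pass" if step == "solve:model_b" else "fail"
    # judge/summarize steps just report what they were given
    return f"{step} saw {history[-1]}"


def toy_plan(query: str) -> List[Step]:
    return ["solve:model_a", "judge:model_b", "summarize:model_c"]


def toy_next(query: str, history: List[Observation]) -> Optional[Step]:
    if not history:
        return "solve:model_a"
    if history[-1] == "fail":
        return "solve:model_b"  # react to the failure observed at step t
    if history[-1] == "pass":
        return "judge:model_c"
    return None


print(run_static(toy_plan, toy_execute, "q"))
# ['fail', 'judge:model_b saw fail', 'summarize:model_c saw judge:model_b saw fail']
print(run_adaptive(toy_next, toy_execute, "q"))
# ['fail', 'pass', 'judge:model_c saw pass']
```

The static plan keeps judging and summarizing a failed attempt because nothing it learns at step 1 can change steps 2 and 3, while the adaptive loop reroutes to another model as soon as the failure is observed.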
the biggest and most obvious issue is that they are introducing a “test-time scaling” method with “best of N” over models, and they literally NEVER REPORT the number of output tokens or the cost needed to achieve a benchmark score or complete a task
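Why missing token and cost numbers matter for best-of-N is easy to show with a small accounting sketch. The `Candidate` fields, `best_of_n`, the judge scores and the per-million-token prices are all hypothetical; the only point is that every one of the N generations is paid for, not just the winner.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    model: str
    answer: str
    output_tokens: int
    judge_score: float  # hypothetical: score assigned by a judge model


@dataclass
class BestOfNResult:
    answer: str
    total_output_tokens: int
    total_cost_usd: float


def best_of_n(candidates: List[Candidate],
              usd_per_million_output: Dict[str, float]) -> BestOfNResult:
    """Return the top-scoring answer plus the cost of generating *every* candidate.

    Simplification: only the candidates' output tokens are counted; input
    tokens and the judge's own calls would add further cost.
    """
    if not candidates:
        raise ValueError("best_of_n needs at least one candidate")
    best = max(candidates, key=lambda c: c.judge_score)
    total_tokens = sum(c.output_tokens for c in candidates)
    total_cost = sum(c.output_tokens * usd_per_million_output[c.model] / 1_000_000
                     for c in candidates)
    return BestOfNResult(best.answer, total_tokens, total_cost)


# Hypothetical prices (USD per million output tokens) and candidates.
prices = {"model_a": 10.0, "model_b": 2.0}
pool = [
    Candidate("model_a", "answer 1", 20_000, 0.9),
    Candidate("model_b", "answer 2", 50_000, 0.6),
    Candidate("model_b", "answer 3", 30_000, 0.7),
]
result = best_of_n(pool, prices)
print(result.answer, result.total_output_tokens, round(result.total_cost_usd, 2))
# answer 1 100000 0.36
```

Here the winning candidate alone accounts for $0.20 of the roughly $0.36 spent, so a benchmark score reported without the other generations' tokens and cost understates what the result actually cost.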
the right comparison here is not with opus but with opus with ultracode/workflows enabled; not with kimi but with kimi swarm, etc. very, very confusing release
Sakana AI (@SakanaAILabs): Introducing Sakana Fugu: A full multi-agent orchestration system accessible via a single model API.
Our ‘Fugu Ultra’ model matches the performance of Fable and Mythos, delivering frontier capability without the risk of export controls.
Try it: https://t.co/aDEFyySWlS 🐡
Similar Articles
@amitiitbhu: https://x.com/amitiitbhu/status/2069023290182758497
A detailed blog post explaining the Sakana Fugu technical report, which introduces orchestrator AI models that route tasks to specialized models, achieving collective intelligence.
@sashimikun_void: @serenaa_ge Deepswe benchmark pls
Sakana AI announced Sakana Fugu, a multi-agent orchestration system accessible via a single model API, with the Fugu Ultra model matching frontier performance without export control risks.
Sakana Fugu
Sakana Fugu dynamically orchestrates a diverse pool of top models to tackle complex, multi-step tasks via a single API, leveraging their ICLR 2026 papers on learned orchestration to achieve frontier-level performance without single-vendor dependency.
@DeRonin_: HOLY SH*T, got released Fable-class model in public from Japan by coding and research benchmarks it's literally equival…
Sakana AI released Fugu Ultra, a multi-agent orchestration system accessible via a single model API, achieving performance competitive with Fable and Mythos models.
@rohanpaul_ai: Sakana Fugu Ultra just beat the other models on visual polish in a live trading-desk coding test, got close to GLM 5.2,…
Sakana's Fugu Ultra model orchestration system outperformed other models in a live coding test for a trading desk UI, though at 17x higher cost, demonstrating its strength in visual polish and multi-agent coordination.