Sakana Fugu (3 minute read)

TLDR AI 06/22/26, 12:00 AM Papers

inference-time-scaling mcts collective-intelligence multi-model frontier-models arc-agi open-source

Summary

Sakana AI introduces AB-MCTS, an inference-time scaling algorithm that enables multiple frontier AI models (Gemini 2.5 Pro, o4-mini, DeepSeek-R1-0528) to cooperate, significantly outperforming individual models on the ARC-AGI-2 benchmark.

Sakana Fugu is a multi-agent system that behaves like a single model. Fugu can decide whether to handle requests directly or coordinate a team of expert models. It manages model selection, delegation, verification, and synthesis. Users simply call one model, and a coordinated system of experts does the work. Sakana Fugu and Fugu Ultra are available today through a single OpenAI-compatible API.
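The summary says Fugu decides between answering directly and coordinating experts, and that it handles model selection, delegation, verification, and synthesis. It does not say how. The minimal Python sketch below illustrates that control flow. The `Expert` type, the `difficulty` score and `threshold`, the tag-overlap selection rule, the verifier, and the majority-vote synthesis are all hypothetical choices. They are not Sakana's design or API.

```python
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical expert: a name, a set of skill tags, and a callable that answers a prompt.
@dataclass
class Expert:
    name: str
    skills: set
    answer: Callable[[str], str]


def route(prompt: str,
          tags: set,
          difficulty: float,
          generalist: Expert,
          experts: List[Expert],
          verify: Callable[[str, str], bool],
          threshold: float = 0.5) -> dict:
    """Toy router (hypothetical): answer directly if easy, else delegate, verify, synthesize."""
    # 1. Easy request: the front model handles it on its own.
    if difficulty < threshold:
        return {"mode": "direct", "answer": generalist.answer(prompt), "used": [generalist.name]}
    # 2. Model selection: experts whose skill tags overlap the request tags
    #    (fall back to the generalist if none match).
    chosen = [e for e in experts if e.skills & tags] or [generalist]
    # 3. Delegation: collect one draft from each chosen expert.
    drafts = [(e.name, e.answer(prompt)) for e in chosen]
    # 4. Verification: keep only drafts that pass the hypothetical checker.
    passed = [(n, d) for n, d in drafts if verify(prompt, d)]
    used = [n for n, _ in drafts]
    if not passed:
        return {"mode": "delegated", "answer": None, "used": used}
    # 5. Synthesis: naive majority vote over verified drafts (ties go to the first seen).
    answers = [d for _, d in passed]
    best = max(answers, key=answers.count)
    return {"mode": "delegated", "answer": best, "used": used}
```

In a real system each `answer` callable would be a request to a model endpoint, and the verifier might itself be a model. The sketch keeps them as plain functions so the routing logic stays visible.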

Original Article


Cached at: 06/22/26, 01:31 PM

# Thread by @SakanaAILabs on Thread Reader App

Source: [https://threadreaderapp.com/thread/2068862070062485867.html](https://threadreaderapp.com/thread/2068862070062485867.html)

We’re excited to introduce AB-MCTS! Our new inference-time scaling algorithm enables collective intelligence for AI by allowing multiple frontier models (like Gemini 2.5 Pro, o4-mini, and DeepSeek-R1-0528) to cooperate.

Blog: [sakana.ai/ab-mcts](https://sakana.ai/ab-mcts)

Paper: [arxiv.org/abs/2503.04412](https://arxiv.org/abs/2503.04412)

Inspired by the power of human collective intelligence, where the greatest achievements arise from the collaboration of diverse minds, we believe the same principle applies to AI. Individual frontier models like ChatGPT, Gemini, and DeepSeek are remarkably advanced, and each has unique strengths and biases stemming from its training, which we view as valuable resources for collective problem-solving.

AB-MCTS (Adaptive Branching Monte Carlo Tree Search) harnesses these distinct traits, allowing multiple models to cooperate and engage in effective trial-and-error to solve problems that are challenging for any single AI.

Our initial results on the ARC-AGI-2 benchmark are promising: AB-MCTS combining o4-mini + Gemini-2.5-Pro + R1-0528, current frontier AI models, outperforms the individual models by a substantial margin.

This research builds on our 2024 work on evolutionary model merging, shifting focus from “mixing to create” to “mixing to use” existing, powerful AIs. At Sakana AI, we remain committed to pioneering novel AI systems by applying nature-inspired principles such as evolution and collective intelligence. We believe this work represents a step toward a future where AI systems collaboratively tackle complex challenges, much like a team of human experts, unlocking new problem-solving capabilities and moving beyond single-model limitations.
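The thread names the algorithm but does not spell it out. Suppose "adaptive branching" means that, at each node, the search decides between generating a fresh candidate (going wider) and refining an existing one (going deeper). The sketch below makes that choice with Thompson sampling over Beta posteriors that are updated with fractional scores. It is a simplified illustration under that assumption, not the TreeQuest implementation. `generate` stands in for an LLM call and `evaluate` for an answer scorer, and both are hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    answer: Optional[str]                     # None only for the root (no attempt yet)
    score: float = 0.0                        # evaluator score in [0, 1]
    children: List["Node"] = field(default_factory=list)
    a: float = 1.0                            # Beta posterior for "go deeper into this node"
    b: float = 1.0
    gen_a: float = 1.0                        # Beta posterior for "generate a new child here"
    gen_b: float = 1.0


def ab_search(generate: Callable[[Optional[str]], str],
              evaluate: Callable[[str], float],
              budget: int,
              seed: int = 0) -> Optional[Node]:
    """Simplified adaptive-branching search: Thompson sampling picks wider vs deeper."""
    rng = random.Random(seed)
    root = Node(answer=None)
    best = None
    for _ in range(budget):
        node, path = root, [root]
        while True:
            # "Wider": draw from this node's GEN posterior.
            gen_draw = rng.betavariate(node.gen_a, node.gen_b)
            # "Deeper": draw from each existing child's posterior.
            draws = [(rng.betavariate(c.a, c.b), c) for c in node.children]
            top = max(draws, key=lambda t: t[0], default=None)
            if top is None or gen_draw >= top[0]:
                break                         # expand at this node
            node = top[1]
            path.append(node)
        # Expand: one hypothetical LLM call refines the parent's answer (or starts fresh at root).
        child = Node(answer=generate(node.answer))
        child.score = evaluate(child.answer)
        node.children.append(child)
        # Update the GEN posterior of the expanded node with the new child's score...
        node.gen_a += child.score
        node.gen_b += 1.0 - child.score
        # ...and back the score up along the path, including the new leaf.
        for n in path + [child]:
            n.a += child.score
            n.b += 1.0 - child.score
        if best is None or child.score > best.score:
            best = child
    return best
```

Each iteration spends exactly one `generate` call, so `budget` is the LLM-call budget in this toy. Good branches raise their posteriors and attract refinement. A strong GEN posterior at a node keeps the search generating new candidates there instead.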
Algorithm (TreeQuest): [github.com/SakanaAI/treeq…](https://github.com/SakanaAI/treequest)

ARC-AGI Experiments: [github.com/SakanaAI/ab-mc…](https://github.com/SakanaAI/ab-mcts-arc2)

![Image](https://pbs.twimg.com/media/Guu-b3DXkAE-ROp.jpg)

The AB-MCTS combination of o4-mini + Gemini-2.5-Pro + R1-0528, current frontier AI models, achieves strong performance on the ARC-AGI-2 benchmark, outperforming individual models by a large margin. We open-sourced our implementation of AB-MCTS: [github.com/SakanaAI/treeq…](https://github.com/SakanaAI/treequest)

![Results of AB-MCTS and Multi-LLM AB-MCTS on ARC-AGI-2, showing Pass@k as a function of the number of LLM calls.](https://pbs.twimg.com/media/Guv26nqWcAEm7dh.jpg)

Many ARC-AGI-2 examples that were unsolvable by any single LLM were solved by combining multiple LLMs. In some cases, an initially incorrect attempt by o4-mini is used by R1-0528 and Gemini-2.5-Pro as a hint to get to the correct solution. ARC-AGI-2 code: [github.com/SakanaAI/ab-mc…](https://github.com/SakanaAI/ab-mcts-arc2)

![An example problem from ARC-AGI-2. The task is to infer the common transformation rule from the three demonstration cases on the left and apply it to the test case on the right. This is one of the problems that became solvable using Multi-LLM AB-MCTS.](https://pbs.twimg.com/media/GuxWQUnXgAAE5Ho.jpg)
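The thread reports that combining models solved tasks no single model could. In some cases one model's wrong attempt served as a hint for another. A minimal way to picture the model-choice part is as a bandit. Keep a Beta posterior per model, Thompson-sample which model to call next, and pass the best attempt so far into that call as a hint. The function name, the model callables, and the scoring rule are hypothetical. Multi-LLM AB-MCTS, as the thread describes it, is built on the tree search, so this flat loop only isolates the model-selection and hint-passing idea.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple


def multi_llm_refine(models: Dict[str, Callable[[str, Optional[str]], str]],
                     evaluate: Callable[[str], float],
                     task: str,
                     calls: int,
                     seed: int = 0) -> Tuple[Optional[str], float, List[Tuple[str, float]]]:
    """Toy multi-model loop: Thompson-sample a model, feed it the best attempt so far as a hint."""
    rng = random.Random(seed)
    post = {name: [1.0, 1.0] for name in models}    # Beta(a, b) posterior per model
    best_answer, best_score = None, -1.0
    log: List[Tuple[str, float]] = []
    for _ in range(calls):
        # Draw a plausible success rate for each model and call the one with the highest draw.
        name = max(models, key=lambda m: rng.betavariate(*post[m]))
        # The best previous attempt (possibly wrong, possibly from another model) is the hint.
        answer = models[name](task, best_answer)
        s = evaluate(answer)
        post[name][0] += s                           # fractional "success"
        post[name][1] += 1.0 - s                     # fractional "failure"
        log.append((name, s))
        if s > best_score:
            best_answer, best_score = answer, s
    return best_answer, best_score, log
```

Models that keep producing high-scoring answers accumulate larger `a` counts and get chosen more often. The random draws still give weaker models occasional calls, and their attempts can become hints for the others.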
