more models more better. one expensive model is losing to three cheap ones, and there's a paper on it
Summary
A mixture-of-agents paper (arXiv 2406.04692) shows that a committee of cheap open models can outperform GPT-4o on AlpacaEval 2.0 by exploiting decorrelated errors. The author reports similar real-world results: several cheap models catch more bugs than a single expensive one.
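The decorrelated-errors argument in the summary can be illustrated with a little arithmetic. The sketch below is not the paper's method (mixture-of-agents has aggregator models synthesize the other models' outputs rather than vote); it is a simplified toy that assumes every model errs independently with the same accuracy `p`. The function names and the example probabilities are hypothetical, chosen only for illustration.

```python
from math import comb


def majority_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n voters is correct.

    Assumes each voter is right with probability p and that errors are
    fully independent (an idealization; real models share failure modes).
    """
    if n % 2 == 0:
        raise ValueError("use an odd n so there are no ties")
    need = n // 2 + 1  # smallest number of correct votes that wins
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


def catch_rate(p: float, n: int) -> float:
    """Probability that at least one of n independent reviewers flags a bug,
    when each reviewer flags it with probability p."""
    return 1 - (1 - p) ** n


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the paper or the post.
    print(f"one model at p=0.70:        {majority_accuracy(0.70, 1):.3f}")
    print(f"three-model majority vote:  {majority_accuracy(0.70, 3):.3f}")
    print(f"one reviewer at p=0.50:     {catch_rate(0.50, 1):.3f}")
    print(f"any of three reviewers:     {catch_rate(0.50, 3):.3f}")
```

The gain depends entirely on the independence assumption. If the three models made exactly the same mistakes, the committee would be no better than a single member, which is why mixing models from different families matters more than simply adding copies.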
Similar Articles
AI agents feel much more reliable once multiple models are involved
An exploration of how using multiple AI models for agent workflows reveals hidden uncertainties and reasoning gaps, suggesting that future systems may rely on cross-model consensus rather than single-model chains.
@dair_ai: NEW paper worth reading. GPT-5.4 nano plus a critic-comparator orchestration loop hits 76.4% on SWE-bench Verified, mat…
A new paper shows that using a weak model with k=8 proposals and a critic-comparator selection loop can match frontier model performance on SWE-bench Verified, reaching 76.4% accuracy. The key insight is that correct patches are often already present in a weak model's top-k candidates, and the challenge is effective selection using execution verification.
@ChrisGPotts: We take for granted that larger models are better than smaller ones, but why is this so? Our new paper, led by Jing Hua…
This paper investigates why larger models outperform smaller ones, attributing it to data-induced competition for neural resources through formal analysis and experiments.
A 4b model is now beating 30b ones at web research and the reason is not size
A 4 billion parameter open model from the Apodex family outperforms 30 billion parameter models on web research benchmarks, attributed to careful training data and self-verification techniques rather than raw scale, suggesting a more democratic trajectory for AI capability.
Five labs, five minds: building a multi-model finance drama on small models (6 minute read)
A field report on building a multi-model finance drama game where each agent runs on a different lab's small model, demonstrating the engineering challenges and benefits of model heterogeneity.