Wisdom Of The (AI) Crowd: Investigating Artificial Swarm Intelligence In Large Language Models

arXiv cs.AI Papers

Summary

This paper investigates whether LLMs can approximate swarm intelligence effects through intra- and inter-model aggregation. It finds significant error reductions of up to 37 percentage points in MAPE (mean absolute percentage error) on eight estimation tasks.
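To make the aggregation idea concrete, here is a minimal sketch of how intra-model and inter-model aggregation could be scored with MAPE. It is not the authors' pipeline: the data, the task and model names, and the choice of the median as aggregator are all assumptions made for illustration, since the summary only refers to "different aggregation strategies".

```python
from statistics import mean, median

# Hypothetical data (not from the paper): task -> model -> repeated estimates.
SAMPLES = {
    "task_a": {"model_x": [90.0, 120.0, 110.0], "model_y": [80.0, 100.0, 105.0]},
    "task_b": {"model_x": [45.0, 60.0, 55.0], "model_y": [40.0, 52.0, 49.0]},
}
# Hypothetical ground-truth values for each task.
TRUTH = {"task_a": 100.0, "task_b": 50.0}


def ape(estimate, truth):
    """Absolute percentage error of a single estimate, in percent."""
    return abs(estimate - truth) / abs(truth) * 100.0


def individual_mape(samples, truth):
    """MAPE over every single response, with no aggregation."""
    return mean([
        ape(est, truth[task])
        for task, models in samples.items()
        for runs in models.values()
        for est in runs
    ])


def intra_model_mape(samples, truth, agg=median):
    """Aggregate repeated samples within each model, then score per model."""
    return mean([
        ape(agg(runs), truth[task])
        for task, models in samples.items()
        for runs in models.values()
    ])


def inter_model_mape(samples, truth, agg=median):
    """Aggregate within each model, then across models, then score per task."""
    return mean([
        ape(agg([agg(runs) for runs in models.values()]), truth[task])
        for task, models in samples.items()
    ])


if __name__ == "__main__":
    print("individual :", round(individual_mape(SAMPLES, TRUTH), 2))
    print("intra-model:", round(intra_model_mape(SAMPLES, TRUTH), 2))
    print("inter-model:", round(inter_model_mape(SAMPLES, TRUTH), 2))
```

The difference between two of these MAPE values is what a reduction "in percentage points" refers to. Passing `agg=mean` instead of the default swaps in a different aggregation strategy without changing the scoring.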

arXiv:2606.31404v1. Abstract: Human swarm intelligence demonstrates remarkable collective accuracy but faces scalability constraints in cost, coordination, and time. We investigate whether large language models (LLMs) can approximate swarm intelligence effects through artificial swarms, addressing a critical gap in understanding AI-based aggregation mechanisms. We conducted a controlled experiment with 960 manually executed prompts across three proprietary models (GPT-5, Gemini 2.5 Pro, Claude Sonnet 4.5), testing intra-model sampling and inter-model aggregation on eight estimation tasks. Results reveal consistent error reduction through intra- and inter-model aggregation, with significant error reductions up to 37 percentage points in MAPE across different aggregation strategies. We observed small to large effect sizes for positive correlations (Spearman's $\rho=0.242-0.568$, all $p<0.001$) between relative confidence interval widths and relative estimation errors, suggesting LLMs possess metacognitive awareness when assessing uncertainty. We discuss implications for research and practice, providing actionable insights for deploying LLM swarms in organizational decision-making.


# Wisdom Of The (AI) Crowd: Investigating Artificial Swarm Intelligence In Large Language Models
Source: [https://arxiv.org/abs/2606.31404](https://arxiv.org/abs/2606.31404)
[View PDF](https://arxiv.org/pdf/2606.31404)

> Abstract: Human swarm intelligence demonstrates remarkable collective accuracy but faces scalability constraints in cost, coordination, and time. We investigate whether large language models (LLMs) can approximate swarm intelligence effects through artificial swarms, addressing a critical gap in understanding AI-based aggregation mechanisms. We conducted a controlled experiment with 960 manually executed prompts across three proprietary models (GPT-5, Gemini 2.5 Pro, Claude Sonnet 4.5), testing intra-model sampling and inter-model aggregation on eight estimation tasks. Results reveal consistent error reduction through intra- and inter-model aggregation, with significant error reductions up to 37 percentage points in MAPE across different aggregation strategies. We observed small to large effect sizes for positive correlations (Spearman's $\rho=0.242-0.568$, all $p<0.001$) between relative confidence interval widths and relative estimation errors, suggesting LLMs possess metacognitive awareness when assessing uncertainty. We discuss implications for research and practice, providing actionable insights for deploying LLM swarms in organizational decision-making.
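
The reported link between confidence interval width and estimation error can be illustrated with a small Spearman rank correlation sketch. Everything here is assumed for illustration: the records are invented, and the definitions of relative width (interval width divided by the estimate) and relative error (absolute error divided by the true value) are one plausible reading of the abstract, not the paper's stated formulas. No p-value is computed.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical responses: (point estimate, CI lower, CI upper, true value).
RECORDS = [
    (100.0, 95.0, 105.0, 102.0),
    (80.0, 60.0, 100.0, 100.0),
    (50.0, 45.0, 58.0, 47.0),
    (200.0, 120.0, 300.0, 140.0),
    (30.0, 28.0, 33.0, 31.0),
]

# Assumed definitions of the two relative quantities.
rel_width = [(hi - lo) / est for est, lo, hi, _ in RECORDS]
rel_error = [abs(est - true) / abs(true) for est, _, _, true in RECORDS]

if __name__ == "__main__":
    print("Spearman rho:", round(spearman_rho(rel_width, rel_error), 3))
```

A positive rho means that responses with wider stated intervals also tend to be further from the truth, which is the pattern the abstract reads as a sign of calibrated uncertainty.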

## Submission history

From: Justin Brenne
**\[v1\]** Tue, 30 Jun 2026 09:30:53 UTC (549 KB)

Similar Articles

Large Language Models over Networks: Collaborative Intelligence under Resource Constraints

Hugging Face Daily Papers

This paper explores collaborative intelligence paradigms where distributed Large Language Models work together across devices and clouds to handle resource constraints. It covers vertical device-cloud collaboration, horizontal multi-agent collaboration, routing policies, and open research challenges in scalable and trustworthy cooperative AI.

A better method for identifying overconfident large language models

MIT News — Artificial Intelligence

MIT researchers developed a new method for identifying overconfident LLMs by measuring cross-model disagreement across similar models, rather than relying solely on self-consistency metrics. This approach better captures epistemic uncertainty and more accurately identifies unreliable predictions in high-stakes applications.