Wisdom Of The (AI) Crowd: Investigating Artificial Swarm Intelligence In Large Language Models
Summary
This paper investigates whether LLMs can approximate swarm intelligence effects through intra- and inter-model aggregation, finding significant error reductions of up to 37 percentage points in mean absolute percentage error (MAPE) across eight estimation tasks.
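To make the terms concrete, here is a minimal sketch of what intra-model and inter-model aggregation could look like, scored with MAPE. The abstract does not specify which aggregation strategies were used, so the median is an assumption chosen for illustration. The function names, model labels, and numbers are all hypothetical and are not taken from the paper.

```python
from statistics import mean, median


def ape(estimate, truth):
    """Absolute percentage error of a single estimate, in percent."""
    return 100.0 * abs(estimate - truth) / abs(truth)


def mape(estimates, truths):
    """Mean absolute percentage error over paired estimates and true values."""
    return mean(ape(e, t) for e, t in zip(estimates, truths))


def intra_model(samples):
    """Aggregate repeated samples from ONE model (assumption: median)."""
    return median(samples)


def inter_model(samples_by_model):
    """Aggregate ACROSS models: median of the per-model medians (assumption)."""
    return median(intra_model(s) for s in samples_by_model.values())


# Hypothetical data: three models, five sampled estimates each, one task.
truth = 100.0
samples_by_model = {
    "model_a": [80, 120, 90, 130, 100],
    "model_b": [60, 70, 150, 65, 75],
    "model_c": [110, 140, 105, 120, 115],
}

for name, samples in samples_by_model.items():
    single = mape(samples, [truth] * len(samples))   # average error of lone samples
    pooled = ape(intra_model(samples), truth)        # error after intra-model aggregation
    print(name, "single-sample MAPE:", single, "aggregated APE:", pooled)

print("inter-model APE:", ape(inter_model(samples_by_model), truth))
```

The difference between a model's single-sample MAPE and the error of its aggregated estimate is expressed in percentage points, which is the unit the summary reports.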
# Wisdom Of The (AI) Crowd: Investigating Artificial Swarm Intelligence In Large Language Models

Source: [https://arxiv.org/abs/2606.31404](https://arxiv.org/abs/2606.31404) [View PDF](https://arxiv.org/pdf/2606.31404)

> Abstract: Human swarm intelligence demonstrates remarkable collective accuracy but faces scalability constraints in cost, coordination, and time. We investigate whether large language models (LLMs) can approximate swarm intelligence effects through artificial swarms, addressing a critical gap in understanding AI-based aggregation mechanisms. We conducted a controlled experiment with 960 manually executed prompts across three proprietary models (GPT-5, Gemini 2.5 Pro, Claude Sonnet 4.5), testing intra-model sampling and inter-model aggregation on eight estimation tasks. Results reveal consistent error reduction through intra- and inter-model aggregation, with significant error reductions up to 37 percentage points in MAPE across different aggregation strategies. We observed small to large effect sizes for positive correlations (Spearman's $\rho = 0.242$ to $0.568$, all $p < 0.001$) between relative confidence interval widths and relative estimation errors, suggesting LLMs possess metacognitive awareness when assessing uncertainty. We discuss implications for research and practice, providing actionable insights for deploying LLM swarms in organizational decision-making.

## Submission history

From: Justin Brenne
**[v1]** Tue, 30 Jun 2026 09:30:53 UTC (549 KB)
Similar Articles
SearchSwarm: Towards Delegation Intelligence in Agentic LLMs for Long-Horizon Deep Research
This paper introduces SearchSwarm, a model trained on synthesized delegation intelligence to improve long-horizon deep research tasks via task decomposition and subagent coordination, achieving state-of-the-art results on BrowseComp benchmarks.
Can AI Guess What You Know? Performance Comparison of Large Language Models for Human Domain Knowledge Estimation From Communication Logs
This paper investigates whether LLMs can infer individual domain knowledge from long-term Slack logs, comparing seven models and finding Gemini 2.5 Flash achieves the lowest error, highlighting feasibility and limits of automated expertise mapping.
Large Language Models over Networks: Collaborative Intelligence under Resource Constraints
This paper explores collaborative intelligence paradigms where distributed Large Language Models work together across devices and clouds to handle resource constraints. It covers vertical device-cloud collaboration, horizontal multi-agent collaboration, routing policies, and open research challenges in scalable and trustworthy cooperative AI.
SMAC-Talk: A Natural Language Extension of the StarCraft Multi-Agent Challenge for Large Language Models
SMAC-Talk is a new benchmark that extends the StarCraft Multi-Agent Challenge to evaluate LLM-based agents in cooperative multi-agent environments with natural language communication. It includes scenarios with deceptive communicators and benchmarks agents using models from the Qwen3.5 family to study how reasoning, memory, and scale affect coordination.
A better method for identifying overconfident large language models
MIT researchers developed a new method for identifying overconfident LLMs by measuring cross-model disagreement across similar models, rather than relying solely on self-consistency metrics. This approach better captures epistemic uncertainty and more accurately identifies unreliable predictions in high-stakes applications.