Discovering Cooperative Pipelines: Autoresearch for Sequential Social Dilemmas

Hugging Face Daily Papers

Summary

This paper presents a two-level autoresearch framework in which an outer-loop AI agent autonomously optimizes inner-loop LLM policy-synthesis pipelines for multi-agent sequential social dilemmas. The resulting pipelines exceed hand-designed baselines and prompt-only optimization, and they are objective-specific: an explicit fairness mechanism appears only under the maximin welfare objective.

We study two-level autoresearch for cooperation: an outer-loop AI agent autonomously redesigns the inner-loop pipeline of an LLM policy-synthesis system for multi-agent Sequential Social Dilemmas (SSDs). A researcher agent R (run as a coding agent) reads the inner-loop source code, edits system prompts, feedback functions, helper libraries, and iteration logic, runs evaluations, and decides what to keep, following the autoresearch paradigm. Across two games (Cleanup and Gathering), two policy-synthesizer LLMs, and two welfare objectives (utilitarian efficiency and Rawlsian maximin), the researcher reliably exceeds hand-designed baselines, sharply tightens run-to-run variance, and outperforms prompt-only optimization. The discovered pipelines are objective-dependent: only under maximin does the researcher inject an explicit fairness mechanism into synthesizer pipelines, a class of mechanism that is absent from its own objective-agnostic system prompt and from every efficiency-optimized pipeline. This supports an information-design reading in which the researcher chooses what to reveal to the boundedly rational synthesizer as a function of the welfare objective. Code at https://github.com/vicgalle/autoresearch-social-dilemmas.
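The two welfare objectives named above rank outcomes differently, which is one way to see why the discovered pipelines could depend on the objective. The following is a minimal sketch, assuming utilitarian efficiency can be approximated by the sum of per-agent returns and Rawlsian maximin by the return of the worst-off agent. The function names and the sample returns are hypothetical illustrations, not taken from the paper or its repository, and the paper's exact definitions may differ (for example, in normalization).

```python
from typing import Sequence


def utilitarian_welfare(returns: Sequence[float]) -> float:
    """Sum of per-agent returns (assumed proxy for utilitarian efficiency)."""
    return float(sum(returns))


def maximin_welfare(returns: Sequence[float]) -> float:
    """Return of the worst-off agent (Rawlsian maximin)."""
    return float(min(returns))


# Hypothetical per-agent episode returns for two candidate joint policies.
unequal = [10.0, 10.0, 1.0]  # higher total, one agent left behind
equal = [6.0, 6.0, 6.0]      # lower total, evenly shared

# The two objectives disagree about which outcome is better.
prefers_unequal = utilitarian_welfare(unequal) > utilitarian_welfare(equal)
prefers_equal = maximin_welfare(equal) > maximin_welfare(unequal)
```

Under these made-up numbers the utilitarian score favors the unequal outcome while the maximin score favors the equal one, so a pipeline tuned for maximin has a reason to care about fairness that an efficiency-tuned pipeline lacks.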

Cached at: 05/29/26, 07:00 AM

Source: https://huggingface.co/papers/2605.30003

Abstract

Two-level autoresearch framework enables AI agents to autonomously optimize LLM policy-synthesis pipelines for multi-agent social dilemmas, demonstrating superior performance and objective-specific mechanism discovery.

We study two-level autoresearch for cooperation: an outer-loop AI agent autonomously redesigns the inner-loop pipeline of an LLM policy-synthesis system for multi-agent Sequential Social Dilemmas (SSDs). A researcher agent R (run as a coding agent) reads the inner-loop source code, edits system prompts, feedback functions, helper libraries, and iteration logic, runs evaluations, and decides what to keep, following the autoresearch paradigm. Across two games (Cleanup and Gathering), two policy-synthesizer LLMs, and two welfare objectives (utilitarian efficiency and Rawlsian maximin), the researcher reliably exceeds hand-designed baselines, sharply tightens run-to-run variance, and outperforms prompt-only optimization. The discovered pipelines are objective-dependent: only under maximin does the researcher inject an explicit fairness mechanism into synthesizer pipelines, a class of mechanism that is absent from its own objective-agnostic system prompt and from every efficiency-optimized pipeline. This supports an information-design reading in which the researcher chooses what to reveal to the boundedly rational synthesizer as a function of the welfare objective. Code at https://github.com/vicgalle/autoresearch-social-dilemmas.
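The edit, evaluate, and keep cycle described in the abstract can be pictured as a greedy keep-or-revert loop. The sketch below is only an illustration under strong assumptions: the pipeline is reduced to a single numeric knob, the researcher's edit is a random perturbation, and the evaluation is a toy score. In the paper the edits are made by a coding agent and the evaluation runs the actual games; none of the names here (`outer_loop`, `toy_propose`, `toy_evaluate`, `knob`) come from the paper or its repository.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in: a "pipeline" reduced to a dict of numeric knobs.
Pipeline = Dict[str, float]


def outer_loop(
    pipeline: Pipeline,
    propose: Callable[[Pipeline], Pipeline],
    evaluate: Callable[[Pipeline], float],
    steps: int,
) -> Tuple[Pipeline, List[float]]:
    """Greedy keep-or-revert loop: keep an edit only if the score improves."""
    best = dict(pipeline)
    best_score = evaluate(best)
    history = [best_score]
    for _ in range(steps):
        candidate = propose(best)    # stand-in for the researcher's edit
        score = evaluate(candidate)  # stand-in for running evaluations
        if score > best_score:       # keep the edit, otherwise revert
            best, best_score = candidate, score
        history.append(best_score)
    return best, history


rng = random.Random(0)  # fixed seed so the toy run is repeatable


def toy_propose(p: Pipeline) -> Pipeline:
    """Toy edit: nudge the single knob by a small random amount."""
    edited = dict(p)
    edited["knob"] += rng.uniform(-1.0, 1.0)
    return edited


def toy_evaluate(p: Pipeline) -> float:
    """Toy welfare score that peaks when the knob equals 3.0."""
    return -(p["knob"] - 3.0) ** 2


start = {"knob": 0.0}
best, history = outer_loop(start, toy_propose, toy_evaluate, steps=50)
```

Because an edit is kept only when it improves the score, `history` never decreases. Swapping `toy_evaluate` for a different welfare objective changes which edits survive, which is the sense in which the kept pipeline depends on the objective.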
