Discovering Cooperative Pipelines: Autoresearch for Sequential Social Dilemmas
Summary
This paper presents a two-level autoresearch framework where an outer-loop AI agent autonomously optimizes inner-loop LLM policy-synthesis pipelines for multi-agent sequential social dilemmas, achieving superior performance and discovering objective-specific mechanisms like fairness under a maximin welfare objective.
Cached at: 05/29/26, 07:00 AM
Source: https://huggingface.co/papers/2605.30003
Abstract
A two-level autoresearch framework enables AI agents to autonomously optimize LLM policy-synthesis pipelines for multi-agent social dilemmas, outperforming hand-designed baselines and discovering objective-specific mechanisms.
We study two-level autoresearch for cooperation: an outer-loop AI agent autonomously redesigns the inner-loop pipeline of an LLM policy-synthesis system for multi-agent Sequential Social Dilemmas (SSDs). A researcher agent R (run as a coding agent) reads the inner-loop source code, edits system prompts, feedback functions, helper libraries, and iteration logic, runs evaluations, and decides what to keep, following the autoresearch paradigm. Across two games (Cleanup and Gathering), two policy-synthesizer LLMs, and two welfare objectives (utilitarian efficiency and Rawlsian maximin), the researcher reliably exceeds hand-designed baselines, sharply tightens run-to-run variance, and outperforms prompt-only optimization. The discovered pipelines are objective-dependent: only under maximin does the researcher inject an explicit fairness mechanism into synthesizer pipelines, a class of mechanism that is absent from its own objective-agnostic system prompt and from every efficiency-optimized pipeline. This supports an information-design reading in which the researcher chooses what to reveal to the boundedly rational synthesizer as a function of the welfare objective. Code at https://github.com/vicgalle/autoresearch-social-dilemmas.
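To make the two welfare objectives concrete, here is a minimal Python sketch. It is not taken from the paper's repository. It assumes that utilitarian efficiency is the sum of per-agent returns and that Rawlsian maximin is the return of the worst-off agent; the abstract does not give the exact definitions or any normalization. The `keep_if_better` function, the pipeline names, and the numbers in `TOY_RETURNS` are all hypothetical, and the greedy rule only stands in for the researcher's "decides what to keep" step.

```python
from typing import Callable, Iterable, Sequence, Tuple


def utilitarian_welfare(returns: Sequence[float]) -> float:
    """Assumed definition: total return summed over all agents."""
    return float(sum(returns))


def maximin_welfare(returns: Sequence[float]) -> float:
    """Assumed definition: return of the worst-off agent."""
    return float(min(returns))


def keep_if_better(
    candidates: Iterable[str],
    evaluate: Callable[[str], Sequence[float]],
    welfare: Callable[[Sequence[float]], float],
) -> Tuple[str, float]:
    """Toy greedy selection: keep a candidate only if it improves welfare."""
    best_name, best_score = "", float("-inf")
    for name in candidates:
        score = welfare(evaluate(name))
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Hypothetical per-agent returns for three made-up pipelines.
TOY_RETURNS = {
    "baseline": [10.0, 10.0, 10.0],
    "unequal": [30.0, 20.0, 1.0],
    "fair": [14.0, 13.0, 12.0],
}

if __name__ == "__main__":
    for welfare in (utilitarian_welfare, maximin_welfare):
        name, score = keep_if_better(TOY_RETURNS, TOY_RETURNS.get, welfare)
        print(f"{welfare.__name__}: keep {name!r} (score {score})")
```

On this toy data the two objectives select different pipelines: the sum favors the high-total but unequal outcome, while the minimum favors the more even one. This is the kind of objective dependence the abstract describes, shown here only on invented numbers.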
Get this paper in your agent:
hf papers read 2605.30003
Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Similar Articles
ALSO: Adversarial Online Strategy Optimization for Social Agents
ALSO introduces a framework for online strategy optimization in multi-agent social simulation, formulating multi-turn interaction as an adversarial bandit problem and using a neural surrogate for reward prediction. Experiments on the Sotopia benchmark show it outperforms static baselines and existing optimization methods.
Learning to cooperate, compete, and communicate
OpenAI presents research on multi-agent reinforcement learning environments where agents learn to cooperate, compete, and communicate. The paper introduces MADDPG (Multi-Agent DDPG), a centralized critic approach that enables agents to learn collaborative strategies and communication protocols more effectively than traditional decentralized methods.
Iterative Critique-and-Routing Controller for Multi-Agent Systems with Heterogeneous LLMs
This paper introduces a critique-and-routing controller for multi-agent LLM systems that formulates coordination as a sequential decision problem. It uses policy gradients to optimize the controller for iterative refinement, outperforming baselines while reducing reliance on top-tier models.
Learning to Explore: Scaling Agentic Reasoning via Exploration-Aware Policy Optimization
This paper proposes an exploration-aware reinforcement learning framework that enables LLM agents to adaptively explore only when uncertainty is high, improving performance on text-based and GUI-based benchmarks.
@lftherios: 1/ Autoresearch from @karpathy has been one of the most interesting agentic patterns to emerge this year. The challenge…
Andrej Karpathy's autoresearch pattern highlights how current AI agents run experiments in isolation, wasting compute by duplicating work and rediscovering dead ends.