ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration

Papers with Code Trending 05/04/26, 12:00 AM Papers

Summary

ARIS is an open-source research harness that uses cross-model adversarial collaboration, organized into execution, orchestration, and assurance layers, to make the outcomes of long-horizon research workflows more reliable.

This report describes ARIS (Auto-Research-in-sleep), an open-source research harness for autonomous research, including its architecture, assurance mechanisms, and early deployment experience. The performance of agent systems built on LLMs depends on both the model weights and the harness around them, which governs what information to store, retrieve, and present to the model. For long-horizon research workflows, the central failure mode is not a visible breakdown but a plausible unsupported success: a long-running agent can produce claims whose evidential support is incomplete, misreported, or silently inherited from the executor's framing. Therefore, we present ARIS as a research harness that coordinates machine-learning research workflows through cross-model adversarial collaboration as a default configuration: an executor model drives forward progress while a reviewer from a different model family is recommended to critique intermediate artifacts and request revisions. ARIS has three architectural layers. The execution layer provides more than 65 reusable Markdown-defined skills, model integrations via MCP, a persistent research wiki for iterative reuse of prior findings, and deterministic figure generation. The orchestration layer coordinates five end-to-end workflows with adjustable effort settings and configurable routing to reviewer models. The assurance layer includes a three-stage process for checking whether experimental claims are supported by evidence: integrity verification, result-to-claim mapping, and claim auditing that cross-checks manuscript statements against the claim ledger and raw evidence, as well as a five-pass scientific-editing pipeline, mathematical-proof checks, and visual inspection of the rendered PDF. A prototype self-improvement loop records research traces and proposes harness improvements that are adopted only after reviewer approval.
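
The executor/reviewer split described above can be illustrated with a minimal sketch. It is not taken from the ARIS codebase: the names `Review`, `adversarial_loop`, `toy_executor`, and `toy_reviewer` are hypothetical, and the two "models" are plain Python functions standing in for calls to two different model families. The point is only the control flow: the executor proposes, the reviewer critiques, and the artifact is accepted only once the reviewer approves or the round budget runs out.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Review:
    """Reviewer verdict on one intermediate artifact (hypothetical structure)."""
    approved: bool
    critique: str = ""


def adversarial_loop(
    task: str,
    executor: Callable[[str, str], str],
    reviewer: Callable[[str], Review],
    max_rounds: int = 3,
) -> Tuple[str, int, bool]:
    """Alternate executor drafts and reviewer critiques.

    Returns (last draft, rounds used, whether the reviewer approved).
    """
    critique = ""
    draft = ""
    for round_no in range(1, max_rounds + 1):
        draft = executor(task, critique)   # executor drives forward progress
        review = reviewer(draft)           # reviewer critiques the artifact
        if review.approved:
            return draft, round_no, True
        critique = review.critique         # feed the critique into the next draft
    return draft, max_rounds, False


# Stand-ins for two model families (assumptions, not real model calls).
def toy_executor(task: str, critique: str) -> str:
    if "evidence" in critique:
        return f"{task}: accuracy improves (see run log run_01)"
    return f"{task}: accuracy improves"


def toy_reviewer(draft: str) -> Review:
    if "run log" in draft:
        return Review(approved=True)
    return Review(approved=False, critique="cite the evidence for this claim")


artifact, rounds, approved = adversarial_loop("ablation", toy_executor, toy_reviewer)
print(artifact, rounds, approved)
```

In this toy run the first draft is rejected for lacking evidence and the second is approved. Returning the approval flag separately keeps an unapproved final draft from being mistaken for a reviewed one when the round budget is exhausted.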

Original Article

Cached at: 05/08/26, 08:36 AM

Source: https://huggingface.co/papers/2605.03042

Abstract

This report describes ARIS (Auto-Research-in-sleep), an open-source research harness for autonomous research, including its architecture, assurance mechanisms, and early deployment experience. The performance of agent systems built on LLMs depends on both the model weights and the harness around them, which governs what information to store, retrieve, and present to the model. For long-horizon research workflows, the central failure mode is not a visible breakdown but a plausible unsupported success: a long-running agent can produce claims whose evidential support is incomplete, misreported, or silently inherited from the executor’s framing. Therefore, we present ARIS as a research harness that coordinates machine-learning research workflows through cross-model adversarial collaboration as a default configuration: an executor model drives forward progress while a reviewer from a different model family is recommended to critique intermediate artifacts and request revisions. ARIS has three architectural layers. The execution layer provides more than 65 reusable Markdown-defined skills, model integrations via MCP, a persistent research wiki for iterative reuse of prior findings, and deterministic figure generation. The orchestration layer coordinates five end-to-end workflows with adjustable effort settings and configurable routing to reviewer models. The assurance layer includes a three-stage process for checking whether experimental claims are supported by evidence: integrity verification, result-to-claim mapping, and claim auditing that cross-checks manuscript statements against the claim ledger and raw evidence, as well as a five-pass scientific-editing pipeline, mathematical-proof checks, and visual inspection of the rendered PDF. A prototype self-improvement loop records research traces and proposes harness improvements that are adopted only after reviewer approval.
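
To make the three-stage evidence check more concrete, here is a small sketch under stated assumptions. The abstract names the stages (integrity verification, result-to-claim mapping, claim auditing) but does not specify data formats, so everything here is hypothetical: the `LedgerEntry` fields, the use of SHA-256 hashes as the integrity check, and the `audit_claims` function are illustrative choices, not the ARIS implementation. The ledger stands in for the result-to-claim mapping, and the audit flags manuscript claims that have no ledger entry, whose evidence file is missing or altered, or whose stated value differs from the ledger.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class LedgerEntry:
    """One result-to-claim mapping (hypothetical schema)."""
    claim_id: str
    metric: str
    value: float
    evidence_file: str
    evidence_sha256: str


def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify_integrity(entry: LedgerEntry, raw_files: Dict[str, bytes]) -> bool:
    """Stage 1 (assumed form): evidence exists and matches its recorded hash."""
    data = raw_files.get(entry.evidence_file)
    return data is not None and sha256_of(data) == entry.evidence_sha256


def audit_claims(
    manuscript_claims: Dict[str, float],
    ledger: List[LedgerEntry],
    raw_files: Dict[str, bytes],
    tol: float = 1e-6,
) -> List[Tuple[str, str]]:
    """Stage 3 (assumed form): cross-check manuscript values against the ledger."""
    by_id = {entry.claim_id: entry for entry in ledger}
    problems = []
    for claim_id, stated in manuscript_claims.items():
        entry = by_id.get(claim_id)
        if entry is None:
            problems.append((claim_id, "no ledger entry"))
        elif not verify_integrity(entry, raw_files):
            problems.append((claim_id, "evidence missing or modified"))
        elif abs(stated - entry.value) > tol:
            problems.append((claim_id, "manuscript value differs from ledger"))
    return problems


# Toy data: file names and numbers are invented for illustration.
raw_files = {"runs/a.json": b'{"acc": 0.91}', "runs/b.json": b'{"f1": 0.80}'}
ledger = [
    LedgerEntry("C1", "acc", 0.91, "runs/a.json", sha256_of(raw_files["runs/a.json"])),
    LedgerEntry("C3", "f1", 0.80, "runs/b.json", sha256_of(raw_files["runs/b.json"])),
]
manuscript = {"C1": 0.91, "C2": 0.95, "C3": 0.88}

problems = audit_claims(manuscript, ledger, raw_files)
print(problems)
```

With this toy data, claim C1 passes, C2 is flagged as unsupported because it has no ledger entry, and C3 is flagged as misreported. A real audit would also need to parse the raw evidence and compare it with the ledger value, which this sketch leaves out.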

Similar Articles

AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration

Solving an ARD problem in AI: Agentic Resource Discovery (2 minute read)

Agentic Resource Discovery Specification

Agentic Resource Discovery: Let agents search

The Arbiter Agent: Continually Monitoring Multi-Agent Conversations to Detect Emergent Misalignment
