I ran 13 controlled experiments on my own multi-agent coding setup. Personas did nothing; one coordination trick did almost everything.

Reddit r/AI_Agents 05/29/26, 12:04 PM Papers

multi-agent coding coordination experiments personas typescript agent-runs

Summary

An AI researcher ran 13 controlled experiments on a multi-agent coding system. Dependency-ordered coordination raised compiler-scored success from 3/12 to 12/12 across four contract-change types, while persona backstories showed no measurable benefit in a placebo-controlled comparison.

Most multi-agent repos are a cast of characters with no falsifiable claim. I wanted numbers, so I tested my own system with real oracles (a TypeScript compiler and pre-registered answer keys) across ~540 scored agent runs.

What held up:

* **Dependency-ordered coordination (a "Change Dependency Graph").** Finalize the upstream change, give the downstream agent the *real* names instead of letting it guess. Across 4 contract-change types: naive parallel 3/12, CDG-ordered 12/12 (compiler-scored).
* The sharp bit: naive parallel passed **6/6 on Opus** but **0/6 on Sonnet**, same task. A stronger model just guesses the same names and hides the bug. Coordination buys invariance.
* It generalized beyond code (writing/advisory/game-design): 9/9 vs 3/9.

What didn't hold up (the fun part):

* **Persona backstories:** placebo-controlled across 5 roles, zero measurable benefit. An off-topic backstory did just as well. The lever was the *checklist*, not the identity.
* **The deterministic test gate has a coverage ceiling.** A logic bug in an untested path passes clean, even with a confident "all tests pass" from the agent.
* **3 advisors caught all 15 planted issues.** Advisors 4 through 10 added nothing unique.

I'm publishing the results that undercut my own design on purpose, including the two times my experiment setup broke and accidentally re-confirmed a finding. Happy to answer methodology questions or take shots at the design in the comments.
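For readers who want something concrete, here is a minimal sketch of what dependency-ordered coordination could look like. It is not the code used in these experiments. The names `Change`, `Agent`, `orderChanges`, and `runOrdered` are hypothetical, and the sketch assumes each agent can report the identifiers it finalized. The idea it illustrates is the one described above: sort changes so upstream work finishes first, then pass the real finalized names to each downstream agent.

```typescript
// Hypothetical types for illustration; none of these names come from the post.
interface Change {
  id: string;
  dependsOn: string[]; // ids of upstream changes that must be finalized first
}

// An agent receives its change plus the names its upstream changes
// actually finalized, and returns the names it finalized itself.
type Agent = (change: Change, upstreamNames: string[]) => string[];

// Kahn's algorithm: returns ids so that every upstream change precedes
// the changes depending on it. Throws on unknown dependencies or cycles.
function orderChanges(changes: Change[]): string[] {
  const pending = new Map<string, number>();      // id -> unfinished upstream count
  const downstream = new Map<string, string[]>(); // id -> ids that depend on it
  for (const c of changes) {
    pending.set(c.id, c.dependsOn.length);
    downstream.set(c.id, []);
  }
  for (const c of changes) {
    for (const dep of c.dependsOn) {
      const list = downstream.get(dep);
      if (!list) throw new Error(`unknown dependency: ${dep}`);
      list.push(c.id);
    }
  }
  const ready = changes.filter((c) => c.dependsOn.length === 0).map((c) => c.id);
  const order: string[] = [];
  while (ready.length > 0) {
    const id = ready.shift()!;
    order.push(id);
    for (const next of downstream.get(id)!) {
      const left = pending.get(next)! - 1;
      pending.set(next, left);
      if (left === 0) ready.push(next);
    }
  }
  if (order.length !== changes.length) throw new Error("dependency cycle");
  return order;
}

// Runs agents one at a time in dependency order, so a downstream agent
// is handed finalized upstream names instead of guessing them.
function runOrdered(changes: Change[], agent: Agent): Map<string, string[]> {
  const byId = new Map<string, Change>();
  for (const c of changes) byId.set(c.id, c);
  const finalized = new Map<string, string[]>();
  for (const id of orderChanges(changes)) {
    const change = byId.get(id)!;
    const upstreamNames: string[] = [];
    for (const dep of change.dependsOn) {
      upstreamNames.push(...(finalized.get(dep) ?? []));
    }
    finalized.set(id, agent(change, upstreamNames));
  }
  return finalized;
}
```

In the naive parallel setup, every agent would start at once and the downstream agent would have to guess names the upstream agent has not chosen yet. The ordered version gives up some parallelism in exchange for removing that guess, which is the trade the 3/12 vs 12/12 result points at.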

Similar Articles

been experimenting with custom agents, and the interesting part isn't task completion — it's what changes when they have memory

Reddit r/ArtificialInteligence

The author reflects on experimenting with custom AI agents, noting that long-term memory and continuity transform them from simple task runners into persistent collaborators with 'stable dispositions'. This raises questions about the value of agent 'personality' versus the need for control, reliability, and auditability in workflows.

I run a company with 89 AI agents across 22 departments. Here is what I have learned about multi-agent coordination.

Reddit r/AI_Agents

A CEO shares practical lessons from running a company with 89 AI agents across 22 departments, highlighting delegation as the bottleneck, the value of agent memory, the need for department structure, and the continued importance of human leadership.

I stopped trying to build one super-agent and split it into 4 narrow agents. Reliability went way up.

Reddit r/AI_Agents

The author describes improving AI agent reliability by replacing a single general-purpose agent with a four-agent workflow specializing in intake, research, action, and review. This shift prioritized system predictability and easier debugging over raw autonomy.

Beyond Autonomy: The Power of an Agent That Knows Its Limits

Reddit r/AI_Agents

The COWCORPUS project, a study of 4,200 human-AI interactions, found that agents predicting their own failures and intervention moments are more useful than those simply trying to avoid errors. Researchers identified four stable trust patterns in human-AI collaboration and developed the Perfect Timing Score (PTS) to measure intervention prediction accuracy.

I gave an AI coding agent a structured execution framework and let it iterate for dozens of rounds. The long-task stability difference became hard to ignore.

Reddit r/AI_Agents

The author tested an AI coding agent with a structured execution framework and found it dramatically improved long-task stability, enabling the agent to build a complete browser tactical FPS game over dozens of iterations without architectural drift.
