Tag
This paper introduces the Regulatory Context Protocol (RCP), an agent-to-agent communication standard designed to streamline regulatory review processes, using advanced nuclear reactor licensing as a case study. It claims to cut costs by 50–77% and timelines by 65% compared to traditional methods, with potential broad applicability across sectors like pharmaceuticals and aviation.
This paper introduces MAC-Bench, a dynamic adversarial benchmark for evaluating procedural compliance in multi-agent systems. It proposes the SERV pipeline to generate contamination-free scenarios and new metrics like Compliance-Weighted Success Rate (CSR) and Machiavellian Gap (MG).
A technical blog post describing a hackathon project where five different small AI models run a simulated economy, revealing that emergent market behavior differs when using heterogeneous agents compared to a single model, and that the price is a residue of agent decisions rather than a controllable dial.
This paper introduces Queen-Bee, a governed multi-agent architecture for enterprise MCP orchestration that separates planning and execution via a BeeSpec intermediate representation, achieving high task success rates with zero governance failures in prototype evaluations.
This paper defines cultural diversity as a new evaluation dimension for multi-agent systems, measuring pairwise differences in responses to the World Values Survey. Experiments show current models lack the value diversity of human societies and that mixing backbones can improve both alignment and diversity, but interaction reduces diversity.
DMAIC-IAD is a multi-agent LLM system for industrial anomaly detection, inspired by the DMAIC quality-management framework. It uses a 'Plan First, Judge Later' approach that formulates strategies via standardized operating procedures and ranks them with an execution-free judge model, achieving a 37.76% improvement over agentic baselines across four data modalities.
This paper argues that consensus-seeking in multi-agent LLM systems is insufficient for value-laden tasks, proposing a knowledge-representation layer that classifies agent reasoning-trace disagreements into four symbolic states to enable strategic routing in systems like content moderation.
The builders of a multi-agent system added a dead man's switch that alerts a human when all four outbound communication channels are blocked simultaneously, preventing silent failures. The fix includes a dedup guard to avoid repeated alerts.
StepFinder is a lightweight framework that uses LLMs only in the feature construction phase to encode execution logs into temporal semantic sequences, then applies parameter-efficient temporal and attention modules for failure attribution in multi-agent systems. It reduces inference time by 79% compared to the fastest LLM-based method on the Who&When benchmark.
This paper introduces PACT, a method for structuring agent-to-agent communication in multi-agent LLM systems that uses compact action-state records to reduce token consumption while maintaining or improving task performance, with demonstrated gains on SWE-agent and OpenHands.
A practitioner observes that limiting AI agents to planning one step ahead, rather than several, significantly improves reliability in real-world automation workflows involving CRM and lead qualification, because long-range plans become brittle when external state changes.
This paper proposes directly mapping mature architectural patterns from distributed systems (such as publish-subscribe and message queues) to multi-agent systems to lower the development barrier. It was validated in a course: even students with no distributed systems experience could get started with gRPC and RabbitMQ, achieving an average score above 80%.
The author shares pitfalls from building a shared decision log for AI agent teams, including race conditions exposed by faster models, unreliable contradiction detection with cosine similarity, and challenges in testing multi-agent promises.
HypoAgent is an agentic framework for interactive abductive hypothesis generation over knowledge graphs, integrating three agents to handle evolving user intents and fine-grained diagnosis, achieving state-of-the-art performance.
This paper investigates whether team-based interaction improves LLM performance in the quiz game 'What? Where? When?' (ChGK). Using six recent open LLMs on a 2025 dataset of 572 questions, the authors show that team strategies (voting, silent captain, talkative captain) outperform single models by up to 20 percentage points, with the best team achieving 44.23% accuracy, approaching human performance.
The author shares experiences moving AI agent systems from sandbox to production, highlighting how human roles become ambiguous and teams disengage when agents execute tasks, leading to operational failures.
A discussion of the challenges coding agents face on complex long-horizon tasks, highlighting bizarre user-experience issues and inefficient agent interactions, and advocating for more control over the agent harness.
A roundup of three notable AI papers: SkillOpt treats skill documents as trainable parameters to optimize frozen agents; a new method compiles agentic workflows into model weights for 100x cost reduction; and AutoScientists introduces a decentralized agent team for long-running science without a central planner.
An open-source interactive playbook for building an Agentic DevOps pipeline, covering observability, test-driven prompt evaluations, guardrails, and cost control for multi-agent systems.
A discussion of patterns for handling cascading failures in multi-agent AI systems, comparing supervisor-worker and peer-to-peer topologies.
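As an illustration of the dead man's switch entry above (alert a human when all outbound channels are blocked at once, with a dedup guard against repeated alerts), here is a minimal Python sketch. The class name, the `alert` callback, and the channel names are hypothetical and not taken from the original project; the real system may detect blocked channels and deliver alerts quite differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class DeadMansSwitch:
    """Alert once when every outbound channel is blocked at the same time."""

    alert: Callable[[str], None]  # hypothetical hook that reaches a human
    alerted: bool = False         # dedup guard: True while an outage is ongoing

    def check(self, channel_blocked: Dict[str, bool]) -> bool:
        """Return True only if an alert was sent on this call."""
        all_blocked = bool(channel_blocked) and all(channel_blocked.values())
        if not all_blocked:
            # At least one channel works again, so re-arm the switch.
            self.alerted = False
            return False
        if self.alerted:
            # Same outage as before: suppress the duplicate alert.
            return False
        self.alerted = True
        names = ", ".join(sorted(channel_blocked))
        self.alert(f"All outbound channels blocked: {names}")
        return True


if __name__ == "__main__":
    # Hypothetical channel names, for illustration only.
    switch = DeadMansSwitch(alert=print)
    status = {"email": True, "slack": True, "sms": True, "webhook": True}
    switch.check(status)  # sends one alert
    switch.check(status)  # suppressed by the dedup guard
```

Re-arming the guard as soon as any channel recovers means a later, separate outage still produces a fresh alert, while a continuous outage produces only one.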