Tag
ReM-MoA introduces a memory-augmented Mixture-of-Agents framework that sustains scaling through ranked reasoning memory and curated diversified memory routing, outperforming prior MoA variants across five reasoning benchmarks.
Skill-MAS introduces a method for evolving meta-skills in multi-agent systems to improve orchestration without modifying model weights, achieving transferable performance gains across tasks and LLMs.
A developer shares frustration with multi-agent systems, noting they are more complex than single-agent systems and often produce worse results, and asks for advice on coordination and tools to reduce complexity.
This paper studies decentralized coalition formation as a dynamical process driven by unilateral exit-and-join decisions, using the Aumann-Drèze value for local payoff evaluation. It establishes equilibrium characterizations and Lyapunov and potential representations, and analyzes how switching and acceptance costs affect stability.
Google DeepMind introduces the AI Control Roadmap, a defense-in-depth framework for securing AI agents against risks from misalignment, calling for collaborative prioritization across AI labs, government, and academia.
This paper proposes a unified framework for customizing and deploying LLM-based multi-agent systems in enterprise settings, combining model customization through continual pretraining, fine-tuning, and preference optimization with inference optimization using speculative decoding and FP8 quantization. It achieves 4.48x throughput speedup while maintaining performance on enterprise workloads.
This paper investigates whether parasocial interaction cues exist in online communities of autonomous AI agents, analyzing over 50,000 posts from Moltbook. The findings show that such cues are prevalent and strongly associated with sustained reciprocal interactions, providing empirical evidence for relationship-like dynamics among LLM-enabled agents.
This paper formalizes four concurrency anomalies in multi-agent LLM systems, mechanically verifies a consistency hierarchy, and provides verified Rust runtimes with bounded prevention costs, including a fix for ByteDance's deer-flow and tool-effect reordering in LangGraph.
This paper proposes a layered architecture for distributed general-purpose agent networks, enabling heterogeneous AI agents to discover, trust, and cooperate on open-ended tasks across personal devices and edge nodes.
OpenRath introduces a PyTorch-like programming model for multi-agent systems centered on a 'Session' abstraction that explicitly handles fork, merge, and replay operations, aiming to unify fragmented runtime state for better inspectability and reproducibility.
A developer building a multi-agent operations system for a logistics company discusses the challenge of giving agents institutional knowledge without fine-tuning, opting for a retrieval layer with human-in-the-loop approval.
This piece introduces the concept of synthetic counteradaptation, in which humans and AI systems co-evolve by adapting to each other's strategies, and illustrates it with examples from Go, social interactions, and geopolitical simulations.
This paper proposes a behavioral measure of trust between AI agents based on costly verification in a cooperative survival game, studying trust formation, breakage, and recovery across six frontier model snapshots. It finds that models differ in trust calibration and that persistent over-verification is associated with indecision rather than safety.
This paper studies skill-conditional trust in heterogeneous LLM agent swarms, showing that using per-skill trust scores outperforms global scores in specific regimes, but also reveals a vulnerability to reputation laundering attacks. The authors introduce the Conditional Information Value Test (CIVT) to detect such attacks and quantify trade-offs.
Google DeepMind, together with Schmidt Sciences, ARIA, the Cooperative AI Foundation, and Google.org, has launched a $10 million funding initiative for research on the safety of multi-agent AI systems, aiming to reduce risks such as scams, prompt injections, and cyberattacks as AI agents become widespread.
This paper proposes a lightweight multi-agent framework built on AutoGen for automated concrete barrier design, achieving over 98% accuracy and showing that smaller models can outperform much larger ones in this domain.
FlowBank introduces a three-stage framework for optimizing agentic workflows in LLM multi-agent systems by precomputing a diverse set of reusable workflows and adaptively selecting the best one per query, achieving higher scores while maintaining cost competitiveness.
This paper presents a method for dense latent communication between heterogeneous multi-agent systems using aligned KV-cache transformation, achieving better performance than text-based methods with lower computational costs.
This paper investigates whether early-token confidence signals from LLM decoding can predict reasoning quality in multi-agent debate systems, finding that confidence in the first few generated tokens is the strongest predictor of rubric-based essay scores.
Nexa OS is introduced as an orchestration and execution layer for coordinating thousands of specialized AI agents across workflows, tools, and memory, emphasizing that the future of AI lies in multi-agent systems rather than single powerful models.