Learning policy representations in multiagent systems
Summary
OpenAI researchers propose a general framework for learning representations of agent policies in multiagent systems from only a small amount of interaction data. They cast agent modeling as a representation learning problem and apply the framework to competitive control and cooperative communication environments.
Similar Articles
Learning to cooperate, compete, and communicate
OpenAI presents research on multi-agent reinforcement learning environments in which agents learn to cooperate, compete, and communicate. The paper introduces MADDPG (Multi-Agent DDPG), a centralized-critic approach that enables agents to learn collaborative strategies and communication protocols more effectively than traditional decentralized methods.
Learning to communicate
OpenAI researchers demonstrate that cooperative AI agents can develop their own grounded and compositional language through reinforcement learning in simple worlds. The agents learn to communicate by being rewarded for achieving goals that require coordination, creating shared symbolic languages to coordinate behavior.
Learning Agentic Policy from Action Guidance
The paper proposes ActGuide-RL, a method for training agentic policies in LLMs by using human action data as guidance to overcome exploration barriers in reinforcement learning without extensive supervised fine-tuning.
When Does Multi-Agent RL Improve LLM Workflows? Workflow, Scale, and Policy-Sharing Tradeoffs
This paper studies when end-to-end reinforcement learning training improves multi-agent LLM workflows, comparing shared-policy and isolated-policy training across different workflows, tasks, and model scales, revealing conditional tradeoffs.
NeuroMAS: Multi-Agent Systems as Neural Networks with Joint Reinforcement Learning
NeuroMAS treats multi-agent language systems as trainable, neural-network-like architectures with LLM agents as nodes, using reinforcement learning to learn communication and specialization. The paper reports improved performance and finds that growing progressively from smaller systems works better than training large systems from scratch.