Learning to cooperate, compete, and communicate
Summary
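OpenAI presents research on multi-agent reinforcement learning environments in which agents learn to cooperate, compete, and communicate. The paper introduces MADDPG (Multi-Agent Deep Deterministic Policy Gradient). During training, each agent has a centralized critic that sees the observations and actions of all agents. At execution time, each agent's policy acts only on its own observations. This setup lets agents learn collaborative strategies and communication protocols more effectively than independently trained, fully decentralized baselines.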
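The centralized-critic idea can be illustrated with a minimal NumPy sketch. It assumes tiny, randomly initialized one-hidden-layer networks, hypothetical dimensions (`N_AGENTS`, `OBS_DIM`, `ACT_DIM`), and no training loop. Each actor sees only its own observation, while each agent's critic scores the joint observation and joint action. The TD target follows the MADDPG form y = r_i + γ·Q_i(x', a'_1, …, a'_N) with a'_j = μ_j(o'_j). For brevity, the sketch reuses the current actors and critics instead of separate target networks, so it is illustrative rather than a faithful reimplementation.

```python
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS = 3   # hypothetical number of agents
OBS_DIM = 4    # hypothetical per-agent observation size
ACT_DIM = 2    # hypothetical per-agent continuous action size
HIDDEN = 16    # hypothetical hidden-layer width


def init_mlp(in_dim, out_dim):
    """Random weights for a one-hidden-layer MLP (sizes are illustrative only)."""
    return {
        "W1": rng.normal(0.0, 0.1, (in_dim, HIDDEN)),
        "b1": np.zeros(HIDDEN),
        "W2": rng.normal(0.0, 0.1, (HIDDEN, out_dim)),
        "b2": np.zeros(out_dim),
    }


def mlp(params, x):
    h = np.tanh(x @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"]


# Decentralized actors: agent i's policy maps only its own observation to an action.
actors = [init_mlp(OBS_DIM, ACT_DIM) for _ in range(N_AGENTS)]

# Centralized critics: agent i's critic takes every agent's observation and action.
JOINT_DIM = N_AGENTS * (OBS_DIM + ACT_DIM)
critics = [init_mlp(JOINT_DIM, 1) for _ in range(N_AGENTS)]


def act(i, obs_i):
    # Deterministic policy; tanh bounds each action component to [-1, 1].
    return np.tanh(mlp(actors[i], obs_i))


def joint_action(all_obs):
    # Each actor reads only its own row of the (N_AGENTS, OBS_DIM) observation matrix.
    return np.stack([act(i, all_obs[i]) for i in range(N_AGENTS)])


def centralized_q(i, all_obs, all_actions):
    # Agent i's critic scores the concatenated joint observation and joint action.
    joint = np.concatenate([all_obs.ravel(), all_actions.ravel()])
    return mlp(critics[i], joint).item()


def td_target(i, reward_i, next_obs, gamma=0.95):
    # Simplification: current actors/critic stand in for MADDPG's target networks.
    next_actions = joint_action(next_obs)
    return reward_i + gamma * centralized_q(i, next_obs, next_actions)


obs = rng.normal(size=(N_AGENTS, OBS_DIM))
print(centralized_q(0, obs, joint_action(obs)))
```

The asymmetry is the point of the design. Only the critics need global information, and they are discarded after training, so the learned policies can still be executed from local observations alone.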
Cached at: 04/20/26, 02:45 PM
Similar Articles
Learning to communicate
OpenAI researchers demonstrate that cooperative AI agents can develop their own grounded and compositional language through reinforcement learning in simple worlds. The agents learn to communicate because they are rewarded for achieving goals that require coordination, and they develop shared symbolic languages along the way.
Learning policy representations in multiagent systems
OpenAI researchers propose a general framework for learning representations of agent policies in multiagent systems using minimal interaction data, casting the problem as representation learning with applications to competitive control and cooperative communication environments.
Learning to model other minds
OpenAI and University of Oxford researchers present LOLA (Learning with Opponent-Learning Awareness), a reinforcement learning method that enables agents to model and account for the learning of other agents, discovering cooperative strategies in multi-agent games like the iterated prisoner's dilemma and coin game.
Learning with opponent-learning awareness
OpenAI presents LOLA (Learning with Opponent-Learning Awareness), a multi-agent reinforcement learning method where agents shape the anticipated learning of other agents. The approach demonstrates emergence of cooperation in iterated prisoner's dilemma and convergence to Nash equilibrium in game-theoretic settings.
Scalable Constrained Multi-Agent Reinforcement Learning via State Augmentation and Consensus for Separable Dynamics
This paper presents a distributed approach to constrained multi-agent reinforcement learning. It uses state-augmented policy learning and neighbor-to-neighbor consensus over dual variables to satisfy global resource constraints, while scaling linearly with the number of agents. Experiments on smart-grid demand response show that consensus coordination is essential for feasibility and that the method scales to thousands of agents, unlike centralized training approaches.
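The general mechanism of consensus over dual variables can be illustrated with a minimal sketch. This is generic distributed projected dual ascent with neighbor averaging, not the paper's algorithm. It makes several hypothetical assumptions:

- Agents sit on a ring and share one resource constraint, Σ g_i ≤ C.
- Each agent's demand responds to its local multiplier as g_i = max(0, d_i − λ_i). This is the minimizer of a hypothetical quadratic cost ½(g_i − d_i)² plus the price term.
- Mixing uses a doubly stochastic matrix.

Each agent takes a dual step on its equal share of the constraint, then averages its multiplier with its neighbors. Because the mixing matrix is doubly stochastic, the network-wide mean of the multipliers evolves as in centralized dual ascent.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Doubly stochastic weights: each agent averages itself and its two ring neighbors."""
    assert n >= 3, "these ring weights are only doubly stochastic for n >= 3"
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1.0 / 3.0
    return W


def local_demand(base_demand, lam):
    """Hypothetical price response: each agent cuts demand one-for-one with its multiplier."""
    return np.maximum(0.0, base_demand - lam)


def consensus_dual_ascent(base_demand, capacity, steps=300, eta=0.1):
    n = len(base_demand)
    W = ring_mixing_matrix(n)
    lam = np.zeros(n)          # each agent's local estimate of the shared multiplier
    share = capacity / n       # each agent's equal share of the global budget
    for _ in range(steps):
        g = local_demand(base_demand, lam)
        # Local projected dual ascent on this agent's share of sum(g) <= capacity.
        lam = np.maximum(0.0, lam + eta * (g - share))
        # Neighbor-to-neighbor averaging; the network mean of lam is preserved.
        lam = W @ lam
    return lam, local_demand(base_demand, lam)


base = np.array([5.0, 4.0, 3.0, 6.0])  # hypothetical unconstrained demands
lam, demand = consensus_dual_ascent(base, capacity=10.0)
print(lam, demand.sum())
```

With a constant step size, the agents' multipliers agree only approximately: they settle near a common price, with a small spread proportional to the step size. The total demand, however, still meets the capacity. Diminishing step sizes are the usual way to tighten that agreement.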