Learning to cooperate, compete, and communicate

OpenAI Blog Papers

Summary

OpenAI presents research on multi-agent reinforcement learning environments where agents learn to cooperate, compete, and communicate. The paper introduces MADDPG (Multi-Agent DDPG), a centralized critic approach that enables agents to learn collaborative strategies and communication protocols more effectively than traditional decentralized methods.

Cached at: 04/20/26, 02:45 PM

# Learning to cooperate, compete, and communicate

Source: [https://openai.com/index/learning-to-cooperate-compete-and-communicate/](https://openai.com/index/learning-to-cooperate-compete-and-communicate/)

Multiagent environments where agents compete for resources are stepping stones on the path to AGI. Multiagent environments have two useful properties. First, there is a natural curriculum: the difficulty of the environment is determined by the skill of your competitors (and if you’re competing against clones of yourself, the environment exactly matches your skill level). Second, a multiagent environment has no stable equilibrium: no matter how smart an agent is, there’s always pressure to get smarter. These environments have a very different feel from traditional environments, and it’ll take a lot more research before we become good at them.

Traditional decentralized RL approaches (DDPG, actor-critic learning, deep Q-learning, and so on) struggle to learn in multiagent environments, because at every time step each agent is trying to learn to predict the actions of other agents while also taking its own actions. This is especially true in competitive situations. MADDPG employs a centralized critic to supply agents with information about their peers’ observations and potential actions, transforming an unpredictable environment into a predictable one.

Using policy gradient methods presents further challenges: because these methods exhibit high variance, learning the right policy is difficult when the reward is inconsistent. We also found that adding in a critic, while improving stability, still failed to solve several of our environments, such as cooperative communication. It seems that considering the actions of others during training is important for learning collaborative strategies.
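The centralized-critic idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the network sizes, the single hidden layer, and every name (`init_actor`, `init_critic`, `act`, `q_value`) are assumptions made for illustration, and no training loop is shown. The point is only the difference in inputs: each actor sees its own observation, while each critic sees every agent's observation and action.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not taken from the article).
N_AGENTS, OBS_DIM, ACT_DIM, HIDDEN = 3, 4, 2, 16


def init_actor():
    # Decentralized actor: maps one agent's own observation to its action.
    return rng.normal(scale=0.1, size=(ACT_DIM, OBS_DIM))


def init_critic():
    # Centralized critic: its input is every agent's observation and action.
    in_dim = N_AGENTS * (OBS_DIM + ACT_DIM)
    return {
        "w1": rng.normal(scale=0.1, size=(HIDDEN, in_dim)),
        "w2": rng.normal(scale=0.1, size=HIDDEN),
    }


def act(actor, obs):
    # The actor only ever sees the local observation.
    return np.tanh(actor @ obs)


def q_value(critic, all_obs, all_actions):
    # Concatenate the joint observation and joint action into one input vector.
    x = np.concatenate([np.concatenate(all_obs), np.concatenate(all_actions)])
    hidden = np.tanh(critic["w1"] @ x)
    return float(critic["w2"] @ hidden)


actors = [init_actor() for _ in range(N_AGENTS)]
critics = [init_critic() for _ in range(N_AGENTS)]  # one critic per agent

observations = [rng.normal(size=OBS_DIM) for _ in range(N_AGENTS)]
actions = [act(a, o) for a, o in zip(actors, observations)]

# Each agent's critic scores the joint situation, not just that agent's view.
q_values = [q_value(c, observations, actions) for c in critics]
```

Because the critic conditions on the other agents' actions, a change in a peer's behavior shows up as a change in the critic's input rather than as unexplained noise in the reward. In this sketch the critic would only be needed during training; at execution time each actor still acts on its local observation alone.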
Before we developed MADDPG, when using decentralized techniques, we noticed that listener agents would often learn to ignore the speaker if it sent inconsistent messages about where to go. The agent would then set all the weights associated with the speaker’s message to 0, effectively deafening itself. Once this happens, it’s hard for training to recover, since the speaker will never know if it says the right thing due to the absence of any feedback. To fix this, we looked at a technique outlined in [a recent hierarchical reinforcement project](https://arxiv.org/abs/1703.01161), which lets us force the listener to incorporate the utterances of the speaker in its decision-making process. This fix didn’t work, because though it forces the listener to pay attention to the speaker, it doesn’t help the speaker figure out what to say that is relevant. Our centralized critic method helps deal with these challenges, by helping the speaker to learn which utterances might be relevant to the actions of other agents.
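The "deafened listener" failure described above can be reproduced in a toy setting. The sketch below is an assumption-laden illustration, not the environment from the article: the linear listener, the one-hot messages, the 0/1 reward, and all names (`w_obs`, `w_msg`, `listener_choice`, `reward`) are hypothetical. It shows that once the message weights are zero, every message earns the same reward, so the reward gives the speaker nothing to learn from.

```python
import numpy as np

rng = np.random.default_rng(0)

N_LANDMARKS = 3  # hypothetical number of possible targets
N_MESSAGES = 3   # hypothetical vocabulary size (one-hot messages)

# Toy linear listener: scores over landmarks from its observation and the message.
w_obs = rng.normal(size=(N_LANDMARKS, N_LANDMARKS))
w_msg = np.zeros((N_LANDMARKS, N_MESSAGES))  # "deafened": message weights are all 0


def listener_choice(obs, msg):
    # With w_msg == 0 the message term vanishes and cannot affect the choice.
    return int(np.argmax(w_obs @ obs + w_msg @ msg))


def reward(target, obs, msg):
    # Shared reward: 1 if the listener picks the target landmark, else 0.
    return 1.0 if listener_choice(obs, msg) == target else 0.0


obs = rng.normal(size=N_LANDMARKS)
target = 1
messages = np.eye(N_MESSAGES)

# The speaker tries every possible message and observes the resulting reward.
rewards = [reward(target, obs, m) for m in messages]

# All rewards are identical, so no message looks better than any other.
speaker_gets_feedback = len(set(rewards)) > 1
```

Here `speaker_gets_feedback` is `False` for any observation and target, because the listener's choice never depends on the message.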

Similar Articles

Learning to communicate

OpenAI Blog

OpenAI researchers demonstrate that cooperative AI agents can develop their own grounded and compositional language through reinforcement learning in simple worlds. The agents learn to communicate by being rewarded for achieving goals that require coordination, creating shared symbolic languages to coordinate behavior.

Learning policy representations in multiagent systems

OpenAI Blog

OpenAI researchers propose a general framework for learning representations of agent policies in multiagent systems using minimal interaction data, casting the problem as representation learning with applications to competitive control and cooperative communication environments.

Learning to model other minds

OpenAI Blog

OpenAI and University of Oxford researchers present LOLA (Learning with Opponent-Learning Awareness), a reinforcement learning method that enables agents to model and account for the learning of other agents, discovering cooperative strategies in multi-agent games like the iterated prisoner's dilemma and coin game.

Learning with opponent-learning awareness

OpenAI Blog

OpenAI presents LOLA (Learning with Opponent-Learning Awareness), a multi-agent reinforcement learning method where agents shape the anticipated learning of other agents. The approach demonstrates emergence of cooperation in iterated prisoner's dilemma and convergence to Nash equilibrium in game-theoretic settings.

Scalable Constrained Multi-Agent Reinforcement Learning via State Augmentation and Consensus for Separable Dynamics

arXiv cs.LG

This paper presents a distributed approach for constrained multi-agent reinforcement learning that uses state-augmented policy learning and neighbor-to-neighbor consensus over dual variables to satisfy global resource constraints while scaling linearly with the number of agents. Experiments on smart grid demand response demonstrate that consensus coordination is essential for feasibility, scaling to thousands of agents unlike centralized training approaches.