policy-gradient

#policy-gradient

Variance reduction for policy gradient with action-dependent factorized baselines

OpenAI Blog · 2018-03-20

OpenAI researchers derive a bias-free, action-dependent baseline for variance reduction in policy gradient methods and demonstrate improved learning efficiency on high-dimensional control tasks and in multi-agent and partially observed environments.

#policy-gradient

Learning with opponent-learning awareness

OpenAI Blog · 2017-09-13

OpenAI presents LOLA (Learning with Opponent-Learning Awareness), a multi-agent reinforcement learning method in which each agent shapes the anticipated learning of the other agents. The approach demonstrates the emergence of cooperation in the iterated prisoner's dilemma and convergence to Nash equilibrium in game-theoretic settings.

#policy-gradient

Proximal Policy Optimization

OpenAI Blog · 2017-07-20

OpenAI introduces Proximal Policy Optimization (PPO), a reinforcement learning algorithm that matches or outperforms state-of-the-art methods while being simpler to implement and tune. PPO uses a novel clipped objective function to constrain policy updates and has since become OpenAI's default RL algorithm.
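
To make the clipped objective mentioned above concrete, here is a minimal sketch in plain Python. It is an illustration, not OpenAI's implementation: the function name `ppo_clipped_objective`, its argument names, and the default `clip_eps=0.2` are assumptions made for this example. It computes the probability ratio between the new and old policies for each sample, clips that ratio to `[1 - clip_eps, 1 + clip_eps]`, and averages the smaller of the clipped and unclipped terms.

```python
import math


def ppo_clipped_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (a quantity to maximize).

    logp_new, logp_old: per-sample log-probabilities of the taken actions
        under the new and old policies (hypothetical inputs for this sketch).
    advantages: per-sample advantage estimates.
    clip_eps: clipping range; 0.2 is an assumed default for illustration.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s).
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Taking the minimum removes any gain from moving the ratio
        # outside the clipping range in the direction the advantage favors.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a positive advantage, the objective stops increasing once the ratio exceeds `1 + clip_eps`; with a negative advantage, it stops increasing once the ratio falls below `1 - clip_eps`. That is how the clipping discourages large policy updates in this sketch.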

