OpenAI Baselines: ACKTR & A2C


Summary

OpenAI releases the ACKTR and A2C algorithms as part of its Baselines library. ACKTR improves sample efficiency by following the natural gradient rather than the plain gradient, while keeping per-update computation close to that of first-order methods.

We’re releasing two new OpenAI Baselines implementations: ACKTR and A2C. A2C is a synchronous, deterministic variant of Asynchronous Advantage Actor Critic (A3C), which we’ve found gives equal performance. ACKTR is a more sample-efficient reinforcement learning algorithm than TRPO and A2C, and requires only slightly more computation than A2C per update.
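To make the "synchronous, deterministic" distinction concrete, here is a minimal numpy sketch of the data path such an update follows: all parallel environments are stepped in lockstep, the whole batch is gathered, and a single gradient update consumes it. The shapes and the `discounted_returns` helper are illustrative assumptions, not the Baselines API.

```python
# Illustrative sketch of a synchronous actor-critic batch (not the Baselines API):
# rollouts from N parallel environments are collected in lockstep for T steps,
# then one deterministic update uses the whole batch.
import numpy as np

def discounted_returns(rewards, dones, last_value, gamma=0.99):
    """Bootstrap n-step returns backwards through a rollout of shape (T, N)."""
    T, N = rewards.shape
    returns = np.zeros((T, N))
    running = last_value  # value estimate for the state after the rollout
    for t in reversed(range(T)):
        running = rewards[t] + gamma * running * (1.0 - dones[t])
        returns[t] = running
    return returns

# Toy batch: 5-step rollouts from 4 environments stepped synchronously.
rng = np.random.default_rng(0)
rewards = rng.random((5, 4))
dones = np.zeros((5, 4))         # no episode ended in this toy rollout
values = rng.random((5, 4))      # critic's value estimates V(s_t)
last_value = rng.random(4)       # V(s_T), used to bootstrap the returns

returns = discounted_returns(rewards, dones, last_value)
advantages = returns - values    # A(s_t, a_t) scales the policy gradient
print(advantages.shape)          # (5, 4): one update over the whole batch
```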

Source: [https://openai.com/index/openai-baselines-acktr-a2c/](https://openai.com/index/openai-baselines-acktr-a2c/)

For machine learning algorithms, two costs are important to consider: sample complexity and computational complexity. Sample complexity refers to the number of timesteps of interaction between the agent and its environment, and computational complexity refers to the amount of numerical operations that must be performed.

ACKTR has better sample complexity than first-order methods such as A2C because it takes a step in the *natural gradient* direction, rather than the gradient direction (or a rescaled version, as in Adam). The natural gradient gives us the direction in parameter space that achieves the largest (instantaneous) improvement in the objective per unit of change in the output distribution of the network, as measured by the KL divergence. By limiting the KL divergence, we ensure that the new policy does not behave radically differently from the old one, which could cause a collapse in performance.

As for computational complexity, the KFAC update used by ACKTR is only 10–25% more expensive per update step than a standard gradient update. This contrasts with methods like TRPO (i.e., Hessian-free optimization), which require a more expensive conjugate-gradient computation. Two small numerical sketches of these ideas follow below.

In the following video you can see comparisons at different timesteps between agents trained with ACKTR and agents trained with A2C to solve the game Q-Bert. ACKTR agents get higher scores than ones trained with A2C.
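As a rough illustration of the natural gradient itself, here is a minimal numpy sketch for a single softmax policy over three actions, where the Fisher information matrix can be written down exactly; ACKTR's contribution (K-FAC) is a cheap block-wise approximation of this matrix for full neural networks. All names and numbers below are illustrative, not taken from the Baselines code.

```python
# Natural gradient on a tiny softmax policy: solve F x = g instead of
# stepping along the raw gradient g. Everything here is a toy illustration.
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

logits = np.array([0.0, 0.5, -0.5])
rewards = np.array([1.0, 0.0, 2.0])   # per-action rewards (toy objective)
p = softmax(logits)

# Vanilla policy gradient of J = sum_a pi(a) r(a) with respect to the logits.
g = p * (rewards - p @ rewards)

# Exact Fisher information of a categorical distribution w.r.t. its logits:
# F = E[grad log pi grad log pi^T] = diag(p) - p p^T.
F = np.diag(p) - np.outer(p, p)

# F is singular (the logits are shift-invariant), so damp it slightly,
# as K-FAC-style methods do in practice.
damping = 1e-3
nat_g = np.linalg.solve(F + damping * np.eye(3), g)

# A unit step along nat_g corresponds to a fixed amount of change in the
# policy's output distribution (KL), not a fixed change in the parameters.
print("gradient:        ", g)
print("natural gradient:", nat_g)
```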
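For contrast, here is the kind of conjugate-gradient inner loop that TRPO-style (Hessian-free) methods run on every update to approximate F⁻¹g from Fisher-vector products; each CG iteration costs roughly one extra pass through the network, which is where the extra expense comes from. This is a generic textbook sketch, not the Baselines implementation.

```python
# Conjugate gradient for F x = g, given only a Fisher-vector-product function.
# In TRPO the product fvp(v) requires an extra network pass per iteration.
import numpy as np

def conjugate_gradient(fvp, b, iters=10, tol=1e-10):
    """Approximately solve F x = b using only F-vector products."""
    x = np.zeros_like(b)
    r = b.copy()          # residual b - F x (x starts at 0)
    d = r.copy()          # search direction
    rr = r @ r
    for _ in range(iters):
        Fd = fvp(d)       # the expensive step: one F-vector product
        alpha = rr / (d @ Fd)
        x += alpha * d
        r -= alpha * Fd
        rr_new = r @ r
        if rr_new < tol:
            break
        d = r + (rr_new / rr) * d
        rr = rr_new
    return x

p = np.array([0.3, 0.45, 0.25])                      # toy policy distribution
F = np.diag(p) - np.outer(p, p) + 1e-3 * np.eye(3)   # damped Fisher matrix
g = np.array([0.1, -0.2, 0.1])                       # toy policy gradient

x = conjugate_gradient(lambda v: F @ v, g)
print(np.allclose(F @ x, g, atol=1e-6))              # True: x ~ F^{-1} g
```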
