# OpenAI Baselines: DQN

Source: [https://openai.com/index/openai-baselines-dqn/](https://openai.com/index/openai-baselines-dqn/)

OpenAI shares lessons learned while implementing DQN as part of the Baselines project, covering debugging tips such as a greyscale calibration bug, hyperparameter tuning, and the correct interpretation of error clipping (the Huber loss) in the original Nature paper.

We’re open-sourcing OpenAI Baselines, our internal effort to reproduce reinforcement learning algorithms with performance on par with published results. We’ll release the algorithms over upcoming months; today’s release includes DQN and three of its variants.
When transforming the screen images into greyscale, we had incorrectly calibrated our coefficients for the green color values, which led to the fish disappearing. After we noticed the bug we tweaked the color values and our algorithm was able to see the fish again.
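As a rough illustration (this is not the Baselines code; the weights shown are just the standard luminance coefficients), a mis-set green coefficient in the RGB-to-greyscale conversion can make mostly-green objects nearly invisible to the agent:

```python
import numpy as np

def to_greyscale(frame, coeffs=(0.299, 0.587, 0.114)):
    """Convert an HxWx3 uint8 RGB frame to greyscale using per-channel weights."""
    return (frame.astype(np.float32) @ np.asarray(coeffs, dtype=np.float32)).astype(np.uint8)

# Dummy Atari-sized frame standing in for an environment observation.
frame = np.random.randint(0, 256, size=(210, 160, 3), dtype=np.uint8)

# Hypothetical bug: a near-zero green weight makes mostly-green sprites
# (like the fish described above) nearly vanish after conversion.
buggy = to_greyscale(frame, coeffs=(0.299, 0.0, 0.114))
fixed = to_greyscale(frame)
```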
To debug issues like this in the future, Gym now contains a [play](https://github.com/openai/gym/blob/master/gym/utils/play.py) function, which lets a researcher easily see the same observations as the AI agent would.
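A minimal usage sketch of the play utility, assuming an Atari environment id such as SeaquestNoFrameskip-v4 is available locally (zoom and fps are optional display settings):

```python
import gym
from gym.utils.play import play

# Opens an interactive window rendering exactly what the agent observes;
# keyboard input is mapped to the environment's discrete actions.
env = gym.make("SeaquestNoFrameskip-v4")
play(env, zoom=3, fps=30)
```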
*Fix bugs, then hyperparameters*: After debugging, we started to calibrate our hyperparameters. We ultimately found that the annealing schedule for epsilon, the hyperparameter that controls the exploration rate, had a huge impact on performance. Our final implementation decreases epsilon to 0.1 over the first million steps and then down to 0.01 over the next 24 million steps. If our implementation had still contained bugs, we would likely have arrived at different hyperparameter settings chosen to compensate for faults we hadn’t yet diagnosed.
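A minimal sketch of that piecewise-linear schedule (the helper name and the 1.0 starting value are our assumptions, not the Baselines API):

```python
def epsilon_at(step):
    """Exploration rate as described above: anneal from 1.0 to 0.1 over the
    first 1M steps, then from 0.1 to 0.01 over the next 24M steps,
    and stay at 0.01 afterwards."""
    if step < 1_000_000:
        return 1.0 + (0.1 - 1.0) * step / 1_000_000
    if step < 25_000_000:
        return 0.1 + (0.01 - 0.1) * (step - 1_000_000) / 24_000_000
    return 0.01

assert abs(epsilon_at(0) - 1.0) < 1e-9
assert abs(epsilon_at(1_000_000) - 0.1) < 1e-9
assert abs(epsilon_at(25_000_000) - 0.01) < 1e-9
```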
*Double check your interpretations of papers*: In the DQN [Nature](https://www.nature.com/nature/journal/v518/n7540/full/nature14236.html) paper the authors write: “We also found it helpful to clip the error term from the update [...] to be between -1 and 1.” There are two ways to interpret this statement: clip the objective, or clip the multiplicative term when computing the gradient. The former seems more natural, but it causes the gradient to be zero on transitions with high error, which leads to suboptimal performance, as found in one [DQN implementation](https://github.com/devsisters/DQN-tensorflow/issues/16). The latter is correct and has a simple mathematical interpretation: it is equivalent to using the [Huber loss](https://en.wikipedia.org/wiki/Huber_loss). You can spot bugs like these by checking that the gradients appear as you expect, which can be done easily within TensorFlow using [compute_gradients](https://www.tensorflow.org/api_docs/python/tf/train/GradientDescentOptimizer#compute_gradients).
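A small NumPy sketch (not Baselines code) contrasting the two readings: the derivative of the Huber loss with respect to the TD error is exactly the error clipped to [-1, 1], whereas clipping inside the squared objective zeroes the gradient whenever the error is large:

```python
import numpy as np

def huber_loss(delta, kappa=1.0):
    """Huber loss: quadratic for |delta| <= kappa, linear beyond."""
    return np.where(np.abs(delta) <= kappa,
                    0.5 * delta ** 2,
                    kappa * (np.abs(delta) - 0.5 * kappa))

def huber_grad(delta, kappa=1.0):
    """d(huber)/d(delta): the TD error clipped to [-kappa, kappa]."""
    return np.clip(delta, -kappa, kappa)

def clipped_objective_grad(delta, kappa=1.0):
    """Gradient of the 'clip the objective' reading, 0.5 * clip(delta)^2:
    zero whenever |delta| > kappa, so high-error transitions stop learning."""
    return np.where(np.abs(delta) <= kappa, delta, 0.0)

td_errors = np.array([-3.0, -0.5, 0.2, 2.5])
print(huber_grad(td_errors))              # [-1.  -0.5  0.2  1. ]
print(clipped_objective_grad(td_errors))  # [ 0.  -0.5  0.2  0. ]
```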
The majority of bugs in this post were spotted by going over the code multiple times and thinking through what could go wrong with each line. Each bug seems obvious in hindsight, but even experienced researchers tend to underestimate how many passes over the code it can take to find all the bugs in an implementation.