Benchmarking safe exploration in deep reinforcement learning

OpenAI Blog 11/21/19, 08:00 AM Papers

Summary

OpenAI proposes standardizing constrained RL as the formalism for safe exploration and introduces Safety Gym, a benchmark suite for evaluating safe deep RL algorithms in high-dimensional continuous control tasks with safety constraints.

Original Article

Cached at: 04/20/26, 02:55 PM

# Benchmarking safe exploration in deep reinforcement learning

Source: [https://openai.com/index/benchmarking-safe-exploration-in-deep-reinforcement-learning/](https://openai.com/index/benchmarking-safe-exploration-in-deep-reinforcement-learning/)

## Abstract

Reinforcement learning (RL) agents need to explore their environments in order to learn optimal policies by trial and error. In many environments, safety is a critical concern and certain errors are unacceptable: for example, robotics systems that interact with humans should never cause injury to the humans while exploring. While it is currently typical to train RL agents mostly or entirely in simulation, where safety concerns are minimal, we anticipate that challenges in simulating the complexities of the real world (such as human-AI interactions) will cause a shift towards training RL agents directly in the real world, where safety concerns are paramount. Consequently we take the position that safe exploration should be viewed as a critical focus area for RL research, and in this work we make three contributions to advance the study of safe exploration. First, building on a wide range of prior work on safe reinforcement learning, we propose to standardize constrained RL as the main formalism for safe exploration. Second, we present the Safety Gym benchmark suite, a new slate of high-dimensional continuous control environments for measuring research progress on constrained RL. Finally, we benchmark several constrained deep RL algorithms on Safety Gym environments to establish baselines that future work can build on.
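To make the constrained RL formalism mentioned in the abstract more concrete, here is a minimal sketch in plain Python. It assumes the usual setup in which the agent receives a separate per-step cost signal alongside the reward, and the expected cumulative cost must stay at or below a limit. Lagrangian relaxation, shown here, is one common way to handle such a constraint; it is an illustration and not necessarily the method used in the paper. All function names and parameters (`discounted_sum`, `cost_limit`, `lam`, `lr`) are hypothetical.

```python
from typing import Sequence


def discounted_sum(values: Sequence[float], gamma: float = 1.0) -> float:
    """Return sum_t gamma^t * values[t], accumulated from the last step back."""
    total = 0.0
    for v in reversed(values):
        total = v + gamma * total
    return total


def satisfies_constraint(costs: Sequence[float], cost_limit: float,
                         gamma: float = 1.0) -> bool:
    """Check whether one episode's cumulative cost is within the limit."""
    return discounted_sum(costs, gamma) <= cost_limit


def lagrangian_objective(episode_return: float, episode_cost: float,
                         cost_limit: float, lam: float) -> float:
    """Reward objective penalized by the constraint violation (assumed form)."""
    return episode_return - lam * (episode_cost - cost_limit)


def update_multiplier(lam: float, episode_cost: float,
                      cost_limit: float, lr: float) -> float:
    """Gradient-ascent step on the multiplier, clipped so it stays >= 0."""
    return max(0.0, lam + lr * (episode_cost - cost_limit))


if __name__ == "__main__":
    rewards = [1.0, 0.5, 2.0]   # hypothetical per-step rewards
    costs = [0.0, 1.0, 1.0]     # hypothetical per-step safety costs
    limit = 1.0                 # hypothetical cost limit

    ep_return = discounted_sum(rewards)
    ep_cost = discounted_sum(costs)
    lam = update_multiplier(0.0, ep_cost, limit, lr=0.5)
    print(satisfies_constraint(costs, limit))
    print(lagrangian_objective(ep_return, ep_cost, limit, lam))
```

In this sketch the multiplier grows while episodes exceed the cost limit and shrinks toward zero once they fall below it, so the penalty on unsafe behavior adapts during training instead of being fixed by hand.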

Similar Articles

Safety Gym

#Exploration: A study of count-based exploration for deep reinforcement learning

OpenAI Gym Beta

Gotta Learn Fast: A new benchmark for generalization in RL

Safe and Generalizable Hierarchical Multi-Agent RL via Constraint Manifold Control
