Benchmarking safe exploration in deep reinforcement learning

OpenAI Blog Papers

Summary

OpenAI proposes standardizing constrained RL as the formalism for safe exploration and introduces Safety Gym, a benchmark suite for evaluating safe deep RL algorithms in high-dimensional continuous control tasks with safety constraints.



# Benchmarking safe exploration in deep reinforcement learning

Source: [https://openai.com/index/benchmarking-safe-exploration-in-deep-reinforcement-learning/](https://openai.com/index/benchmarking-safe-exploration-in-deep-reinforcement-learning/)

## Abstract

Reinforcement learning (RL) agents need to explore their environments in order to learn optimal policies by trial and error. In many environments, safety is a critical concern and certain errors are unacceptable: for example, robotics systems that interact with humans should never injure them while exploring. While it is currently typical to train RL agents mostly or entirely in simulation, where safety concerns are minimal, we anticipate that challenges in simulating the complexities of the real world (such as human-AI interactions) will cause a shift towards training RL agents directly in the real world, where safety concerns are paramount. Consequently, we take the position that safe exploration should be viewed as a critical focus area for RL research, and in this work we make three contributions to advance the study of safe exploration. First, building on a wide range of prior work on safe reinforcement learning, we propose to standardize constrained RL as the main formalism for safe exploration. Second, we present the Safety Gym benchmark suite, a new slate of high-dimensional continuous control environments for measuring research progress on constrained RL. Finally, we benchmark several constrained deep RL algorithms on Safety Gym environments to establish baselines that future work can build on.
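Since the abstract proposes constrained RL as the standard formalism, it may help to state the usual constrained-MDP objective. The notation below follows common conventions in the safe-RL literature rather than any equation quoted from the paper itself:

```latex
\max_{\pi}\; \mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t} \gamma^{t}\, r(s_t, a_t)\right]
\quad \text{subject to} \quad
\mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t} \gamma^{t}\, c(s_t, a_t)\right] \le d
```

Here \(r\) is the task reward, \(c\) is a cost signal flagging unsafe behavior, and \(d\) is a fixed safety budget: the agent maximizes expected return while keeping expected cumulative cost below the budget.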

Similar Articles

Safety Gym

OpenAI Blog

OpenAI introduces Safety Gym, a new benchmark environment and toolkit for studying constrained reinforcement learning and safe exploration. The platform features multiple robots and tasks designed to quantify and measure safe exploration through cost functions alongside reward functions.
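The "cost functions alongside reward functions" idea can be sketched with a gym-style rollout loop in which the environment reports a per-step cost through the `info` dict. The toy environment and hazard layout below are invented for illustration; only the reward-plus-cost interface pattern is the point:

```python
class ToyHazardEnv:
    """Minimal stand-in for a Safety Gym-style task (invented for
    illustration): a 1-D walk toward a goal at position 5, with a
    hazard zone covering positions 2..3.  Cost is reported separately
    from reward, via info['cost']."""

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):  # action in {-1, +1}
        self.pos += action
        reward = 1.0 if self.pos == 5 else -0.01   # small per-step penalty, goal bonus
        cost = 1.0 if 2 <= self.pos <= 3 else 0.0  # indicator of hazard contact
        done = self.pos == 5
        return self.pos, reward, done, {"cost": cost}


def run_episode(env, policy):
    """Roll out one episode, accumulating return and constraint cost separately."""
    obs, done = env.reset(), False
    total_reward = total_cost = 0.0
    while not done:
        obs, reward, done, info = env.step(policy(obs))
        total_reward += reward
        total_cost += info["cost"]
    return total_reward, total_cost


# A policy that always walks right reaches the goal but must cross
# the hazard zone, so it earns return while also accruing cost.
ret, cost = run_episode(ToyHazardEnv(), lambda obs: 1)
print(ret, cost)
```

Keeping cost out of the reward, rather than folding it in as a penalty, is what lets a benchmark report task performance and constraint violation as separate axes.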

OpenAI Gym Beta

OpenAI Blog

OpenAI releases OpenAI Gym, a public beta toolkit for developing and comparing reinforcement learning algorithms with a growing suite of environments and a platform for reproducible research. The toolkit aims to standardize RL benchmarks and address the lack of diverse, easy-to-use environments for the research community.

Some considerations on learning to explore via meta-reinforcement learning

OpenAI Blog

OpenAI researchers introduce E-MAML and E-RL², two meta-reinforcement learning algorithms designed to improve exploration in tasks where discovering optimal policies requires significant exploration. The work demonstrates these algorithms' effectiveness on novel environments including Krazy World and maze tasks.