Learning a hierarchy

OpenAI Blog 10/26/17, 07:00 AM Papers

Summary

OpenAI research proposes hierarchical reinforcement learning where agents break down complex tasks into sequences of high-level actions rather than low-level ones, significantly improving efficiency for long-horizon tasks by reducing search complexity from thousands of steps to dozens.

We’ve developed a hierarchical reinforcement learning algorithm that learns high-level actions useful for solving a range of tasks, allowing fast solving of tasks requiring thousands of timesteps. Our algorithm, when applied to a set of navigation problems, discovers a set of high-level actions for walking and crawling in different directions, which enables the agent to master new navigation tasks quickly.

Original Article Export to Word Export to PDF

View Cached Full Text

Cached at: 04/20/26, 02:45 PM

# Learning a hierarchy Source: [https://openai.com/index/learning-a-hierarchy/](https://openai.com/index/learning-a-hierarchy/) Humans solve complicated challenges by breaking them up into small, manageable components\. Grilling pancakes consists of a series of high\-level actions, such as measuring flour, whisking eggs, transferring the mixture to the pan, turning the stove on, and so on\. Humans are able to learn new tasks rapidly by sequencing together these learned components, even though the task might take millions of low\-level actions, i\.e\., individual muscle contractions\. On the other hand, today’s reinforcement learning methods operate through brute force search over low\-level actions, requiring an enormous number of attempts to solve a new task\. These methods become very inefficient at solving tasks that take a large number of timesteps\. Our solution is based on the idea of hierarchical reinforcement learning, where agents represent complicated behaviors as a short sequence of high\-level actions\. This lets our agents solve much harder tasks: while the solution might require 2000 low\-level actions, the hierarchical policy turns this into a sequence of 10 high\-level actions, and it’s much more efficient to search over the 10\-step sequence than the 2000\-step sequence\.

Learning a hierarchy

Similar Articles

Stochastic Neural Networks for hierarchical reinforcement learning

Improving instruction hierarchy in frontier LLMs

Learning complex goals with iterated amplification

UniDoc-RL: Coarse-to-Fine Visual RAG with Hierarchical Actions and Dense Rewards

Generalizing from simulation

Submit Feedback

Similar Articles

Stochastic Neural Networks for hierarchical reinforcement learning

Improving instruction hierarchy in frontier LLMs

Learning complex goals with iterated amplification

UniDoc-RL: Coarse-to-Fine Visual RAG with Hierarchical Actions and Dense Rewards