Learning complex goals with iterated amplification

OpenAI Blog Papers

Summary

OpenAI presents iterated amplification, a method for training AI systems on complex tasks by recursively decomposing them into smaller subtasks that humans can judge and solve, building up training signals from scratch through iterative composition.

We’re proposing an AI safety technique called iterated amplification that lets us specify complicated behaviors and goals that are beyond human scale, by demonstrating how to decompose a task into simpler sub-tasks, rather than by providing labeled data or a reward function. Although this idea is in its very early stages and we have only completed experiments on simple toy algorithmic domains, we’ve decided to present it in its preliminary state because we think it could prove to be a scalable approach to AI safety.

Cached at: 04/20/26, 02:46 PM

# Learning complex goals with iterated amplification

Source: [https://openai.com/index/learning-complex-goals-with-iterated-amplification/](https://openai.com/index/learning-complex-goals-with-iterated-amplification/)

Iterated amplification is a method for generating a training signal for tasks that are too complex for a human to perform or judge directly, under certain assumptions. Namely, although a human can’t perform or judge the whole task directly, we assume that a human can, given a piece of the task, identify clear smaller components of which it’s made up. For example, in the networked computer example, a human could break down “defend a collection of servers and routers” into “consider attacks on the servers”, “consider attacks on the routers”, and “consider how the previous two attacks might interact”. Additionally, we assume a human can do very small instances of the task, for example “identify if a specific line in a log file is suspicious”. If these two things hold true, then we can build up a training signal for big tasks from human training signals for small tasks, using the human to coordinate their assembly.

In our implementation of amplification, we start by sampling small subtasks and training the AI system to do them by soliciting demonstrations from humans (who can do these small tasks). We then begin sampling slightly larger tasks, solving them by asking humans to break them up into small pieces, which AI systems trained from the previous step can now solve. We use the solutions to these slightly harder tasks, which were obtained with human help, as a training signal to train AI systems to solve these second-level tasks directly (without human help). We then continue to further composite tasks, iteratively building up a training signal as we go. If the process works, the end result is a totally automated system that can solve highly composite tasks despite starting with no direct training signal for those tasks.

This process is somewhat similar to [expert iteration](https://arxiv.org/pdf/1705.08439.pdf) (the method used in [AlphaGo Zero](https://www.nature.com/articles/nature24270)), except that expert iteration reinforces an existing training signal, while iterated amplification builds up a training signal from scratch. It also has features in common with [several](https://arxiv.org/pdf/1807.04640.pdf) [recent](https://people.eecs.berkeley.edu/~dawnsong/papers/iclr_2017_recursion.pdf) [learning algorithms](https://arxiv.org/abs/1611.02401) that use problem decomposition on-the-fly to solve a problem at test time, but differs in that it operates in settings where there is no prior training signal.
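
The training loop described above (human demonstrations on the smallest tasks, then human decomposition combined with the previously trained system on larger ones) can be sketched roughly as follows. This is a minimal illustration under loud assumptions, not the implementation from the article: the toy domain (counting the 1-bits in a tuple), the `human_*` functions standing in for human input, and the `LookupModel` class are all hypothetical. The "model" only memorizes its training pairs, and each level is enumerated exhaustively rather than sampled, so that every subtask is guaranteed to have been seen.

```python
from itertools import product

# Hypothetical toy domain: a task is a tuple of bits; the answer is the number of 1s.

def human_solve_small(task):
    """Stand-in for a human demonstration on a task small enough to do directly."""
    assert len(task) == 1
    return task[0]

def human_decompose(task):
    """Stand-in for a human splitting a task into two simpler subtasks."""
    mid = len(task) // 2
    return task[:mid], task[mid:]

def human_combine(left_answer, right_answer):
    """Stand-in for a human assembling two subtask answers into one answer."""
    return left_answer + right_answer

class LookupModel:
    """Placeholder for a learned model: it memorizes examples and does not generalize."""

    def __init__(self):
        self.table = {}

    def train(self, examples):
        self.table.update(examples)

    def predict(self, task):
        return self.table[task]

def iterated_amplification(levels):
    model = LookupModel()
    # Level 0: train on human demonstrations for the smallest tasks.
    model.train({t: human_solve_small(t) for t in product((0, 1), repeat=1)})
    for level in range(1, levels + 1):
        size = 2 ** level
        examples = {}
        for task in product((0, 1), repeat=size):
            left, right = human_decompose(task)
            # Subtasks are answered by the model trained at the previous level.
            examples[task] = human_combine(model.predict(left), model.predict(right))
        # Train the model to answer this level's tasks directly, without human help.
        model.train(examples)
    return model

model = iterated_amplification(levels=3)
print(model.predict((1, 0, 1, 1, 0, 0, 1, 0)))  # prints 4
```

In this sketch the human functions never see a full 8-bit task: they only solve 1-bit tasks, split a task in half, and add two numbers. The answers for larger tasks are assembled level by level from the model's own earlier outputs.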
