Rich Sutton on AI creativity and discovery

Summary

Rich Sutton argues that generative AI trained by supervised learning cannot achieve genuine novelty and quality simultaneously, and that true discovery requires a 'vary, evaluate, select' mechanism found in reinforcement learning rather than pure imitation.


Cached at: 06/10/26, 05:44 AM

A new and possibly controversial perspective: In this video, I explain the sense in which generative AI trained by supervised learning is incapable of making novel discoveries. https://t.co/LhAU6AyDkh

The text of the speech:

AI Creativity and Discovery

Good day ladies and


TL;DR: Generative AI (such as large language models) can produce results that are either “new” or “good,” but never both at once. True creativity and scientific discovery require evaluation and selective retention, following a three-step mechanism of “variation, evaluation, retention.” That mechanism is present in paradigms such as reinforcement learning, but not in supervised learning alone.


AI Creativity and Discovery: Beyond Imitation Learning

In his talk, Rich Sutton starts with a classic joke: a reviewer describes a research paper as “both novel and good, but the good parts are not novel, and the novel parts are not good.” He bluntly states that this verdict applies to most of today’s AI, and especially to generative AI, including large language models, image and video models, and new methods for learning world models. These systems learn from vast numbers of examples and produce outputs similar to those examples, but they can never deliver “novelty” and “goodness” in the same output.

The Limitation of Generative AI: Novelty and Quality Cannot Coexist

Generative AI can produce outputs that are either novel or good, but not both. In most tasks (like finding answers from the internet or summarizing documents), we don’t want AI to be “novel,” because “good” comes from the source material. If the AI goes beyond the source, it becomes “hallucination”—we usually don’t welcome such fabrication.

The only exception is creative contexts (like telling stories or generating new images). Here, the output appears “novel” because randomness is introduced in the process: each time a different direction is chosen randomly, producing different trajectories. However, these trajectories are either based on data (hence “good”) or based on randomness (hence “novel”), never both. This is the essence of “novelty and goodness cannot coexist.”
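The role of randomness described above can be illustrated with temperature sampling, a common way of injecting randomness into a generative model. This is only a sketch under stated assumptions: the `logits` values are invented for illustration, and `sample_with_temperature` is a hypothetical helper, not something from the talk. At low temperature the samples follow what the data supports; at high temperature they spread toward uniform randomness.

```python
import math
import random

def sample_with_temperature(logits, temperature, rng):
    """Sample an index from softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

# Hypothetical scores learned from data: option 0 is strongly supported.
logits = [4.0, 1.0, 0.5, 0.0]
rng = random.Random(0)

for temperature in (0.1, 1.0, 10.0):
    draws = [sample_with_temperature(logits, temperature, rng) for _ in range(1000)]
    share_top = draws.count(0) / len(draws)
    print(f"temperature={temperature}: share of data-supported option = {share_top:.2f}")
```

Nothing in this loop checks whether an unusual sample is any good, which is the gap the rest of the talk addresses.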

For ordinary applications, this limitation is not fatal—generative AI can be faster, cheaper, and more customizable, making it more useful than what it imitates. But in science and mathematics, the joke’s evaluation is devastating. These fields require true creativity and discovery, and generative AI—or “imitation AI”—can never meet that need.

True Creativity Comes from a “Discovery” Mechanism

Sutton lists systems that achieve real discovery: AlphaGo (move 37 changed the world), AlphaZero (brilliant chess style), GT Sophy (simulated racing), AlphaFold, AlphaProof, AlphaCode, and RL at Lyft (ride-hailing matching optimization). These systems all produce results that are both novel and good. They go beyond pure supervised learning because they have an extra property: discovery.

Discovery essentially means “try many things, see what works, and keep the best.” This is not a new concept—evolution by natural selection, the scientific method, and everyday learning all follow this mechanism. Psychology calls it instrumental learning or operant conditioning; in machine learning, it’s called reinforcement learning. Any process involving “generate and test, keep the best” includes discovery.

The Three Core Steps of Discovery: Variation, Evaluation, Selective Retention

  • Variation: blindly or partially informed generation of new attempts.
  • Evaluation: judging the value of results based on a clear objective.
  • Selective Retention: keeping only those results judged as “good” after evaluation.
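The three steps above can be sketched as a minimal generate-and-test loop. The names `discover`, `objective`, and `gaussian_step` are hypothetical, and the toy objective (a single peak at x = 3) is an assumption chosen only to make the loop runnable; it is not an example from the talk.

```python
import random

def discover(evaluate, start, propose, steps, rng):
    """Generate-and-test loop: vary, evaluate, selectively retain."""
    best, best_score = start, evaluate(start)
    for _ in range(steps):
        candidate = propose(best, rng)   # variation
        score = evaluate(candidate)      # evaluation against a clear objective
        if score > best_score:           # selective retention
            best, best_score = candidate, score
    return best, best_score

def objective(x):
    """Toy objective (assumption): highest value at x = 3."""
    return -(x - 3.0) ** 2

def gaussian_step(x, rng):
    """Blind variation: a small random perturbation of the current best."""
    return x + rng.gauss(0.0, 0.5)

best, score = discover(objective, start=0.0, propose=gaussian_step,
                       steps=2000, rng=random.Random(0))
print(f"best x = {best:.3f}, objective = {score:.5f}")
```

Removing the `evaluate` call leaves only random generation, which is the situation the next paragraph describes.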

Generative AI lacks the evaluation step. Its generator is pre-trained via supervised learning and cannot evaluate what it generates at runtime. Without evaluation, there is no selective retention, and therefore no discovery. True creativity requires not just random generation, but also value recognition and retention. When evaluation is provided by humans (e.g., picking one AI image from a set), the overall process counts as discovery. A more powerful case is when evaluation comes from a clear objective, such as:

  • A chess move leads to checkmate → good; otherwise → bad
  • A math step leads to a proof → good; otherwise → bad
  • Actions in the world yield high reward → good; otherwise → bad
  • A genotype replicates more → better
  • A theory explains data better → better

In these cases, the system has a clear known objective and can autonomously perform discovery.

Variation Need Not Be Fully Blind, But Must Contain a Blind Component

A good scientist does not choose theories at random, but cannot be completely certain either: there must be uncertainty about where the answer lies for finding it to count as discovery. In practice, variation is always partly informed and partly blind, and it is the blind part that actually corresponds to discovery.
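A toy sketch of this point, under invented assumptions: the objective `score` has a well-known peak at 20 and a better, unknown peak at 80, and `propose` mixes an informed guess near the known peak with a blind uniform draw. All names and numbers here are hypothetical.

```python
import random

def score(i):
    """Hypothetical objective: a well-known peak at 20, a better unknown one at 80."""
    return max(1.0 - abs(i - 20) / 10.0, 2.0 - abs(i - 80) / 5.0, 0.0)

def propose(rng, blind_fraction):
    """Mix an informed guess (near the prior belief, 20) with a blind uniform draw."""
    if rng.random() < blind_fraction:
        return rng.randrange(100)            # blind component
    guess = round(rng.gauss(20.0, 5.0))      # informed component
    return min(99, max(0, guess))

def search(blind_fraction, trials, seed):
    """Evaluate every proposal and retain the best one."""
    rng = random.Random(seed)
    best = max((propose(rng, blind_fraction) for _ in range(trials)), key=score)
    return best, score(best)

print("informed only:", search(blind_fraction=0.0, trials=2000, seed=0))
print("partly blind: ", search(blind_fraction=0.2, trials=2000, seed=0))
```

In this toy setting the informed-only search cannot score above the known peak, while the partly blind search can reach the better region, because only the blind draws ever land there.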

Backpropagation and Continuous Variation

The backpropagation algorithm in modern deep learning may seem incapable of discovery because it is deterministic. However, networks start from small random weights, which provides one initial round of variation. This initialization is often downplayed, but it is a necessary part of learning. The problem is that variation happens only once, at initialization, after which the network loses plasticity. The continual backpropagation algorithm from Sutton’s team (published in Nature a few years ago) addresses this: periodically, less-used neurons are reinitialized with small random weights, keeping variation ongoing and maintaining plasticity.
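The reinitialization step can be sketched as follows. This is a simplified illustration, not the published algorithm: the function `reinit_least_used`, the `maturity` threshold, the way `utility` is supplied, and the choice to zero the outgoing weight are all assumptions made for this sketch.

```python
import random

def reinit_least_used(w_in, w_out, utility, age, maturity, rng, scale=0.1):
    """Reinitialize the mature hidden unit with the lowest utility.

    w_in[j]  : incoming weights of hidden unit j
    w_out[j] : outgoing weight of hidden unit j (single output, for brevity)
    utility  : running estimate of each unit's contribution (assumed given)
    age      : steps since each unit was last (re)initialized
    """
    mature = [j for j in range(len(w_in)) if age[j] >= maturity]
    if not mature:
        return None  # every unit is still too young to be judged
    j = min(mature, key=lambda k: utility[k])
    w_in[j] = [rng.uniform(-scale, scale) for _ in w_in[j]]  # fresh variation
    w_out[j] = 0.0   # the new unit starts silent, so the output is unchanged
    utility[j] = 0.0
    age[j] = 0
    return j

# Hypothetical three-unit hidden layer; unit 2 contributes almost nothing.
rng = random.Random(0)
w_in = [[0.5, -0.3], [0.9, 0.8], [0.01, 0.02]]
w_out = [1.2, -0.7, 0.001]
utility = [0.6, 0.9, 0.0005]
age = [500, 500, 500]

replaced = reinit_least_used(w_in, w_out, utility, age, maturity=100, rng=rng)
print("reinitialized unit:", replaced)
```

In a training loop this function would be called every so many gradient steps, with the `age` check protecting freshly reset units from being replaced again before they have had a chance to become useful.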

Call to Action: Automate Creativity and Discovery

Creativity and discovery go beyond supervised learning, pattern recognition, prediction, and even world modeling. These things are important, but on their own they cannot bring discovery. Discovery requires evaluation (from humans or a clear objective). When a system has a clearly provided objective, we can achieve fully autonomous AI.

Sutton’s plea: If we want the full power of AI scientists, we should share goals with them so that they can create, evaluate, discover, and fully participate in achieving those goals. Let us be brave and automate creativity and discovery.


Source: Rich Sutton on AI creativity and discovery - YouTube (https://youtu.be/K5LAFEjTlBA)
