Prover-Verifier Games improve legibility of language model outputs


Summary

OpenAI researchers found that optimizing language models purely for correct answers reduces human interpretability, and propose 'prover-verifier games' where a prover generates solutions and a verifier checks them, improving legibility for both humans and AI systems.


# Prover-Verifier Games improve legibility of language model outputs

Source: [https://openai.com/index/prover-verifier-games-improve-legibility/](https://openai.com/index/prover-verifier-games-improve-legibility/)

Making sure that language models produce understandable text is crucial to making them helpful for people, especially when dealing with complex tasks like solving math problems.

We found that when we optimize the problem-solving process of strong models solely for getting the correct answer, the resulting solutions can become harder to understand. In fact, when we asked human evaluators with limited time to assess these highly optimized solutions, they made nearly twice as many errors compared to when they evaluated less optimized solutions. This finding highlights the importance of not just correctness, but also clarity and ease of verification in AI-generated text.

By training advanced language models to create text that weaker models can easily verify, we found that humans could also evaluate these texts more effectively, a process we call improving legibility.

This is where prover-verifier games come into play. These games involve two players: a "prover" that generates a solution and a "verifier" that checks it for accuracy.

This method is essential not only for ensuring that the outputs are correct, but also for making them easy to understand and verify by both humans and other AI systems.

Understanding and addressing the performance / legibility balance can lead to more effective and trustworthy AI applications, benefiting a wide range of fields where precise and clear communication is essential.
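To make the two roles concrete, here is a minimal toy sketch in Python. It is not the training setup described in the article: the names `Solution`, `legible_prover`, `terse_prover`, and `weak_verifier` are hypothetical, and the "models" are plain functions. It only illustrates the idea that a solution written in small, independently checkable steps can be accepted by a limited checker, while a bare answer cannot.

```python
from dataclasses import dataclass


@dataclass
class Solution:
    steps: list[str]  # each step has the form "a + b = c"
    answer: int


def legible_prover(numbers: list[int]) -> Solution:
    """Hypothetical prover: sums the numbers and writes out every intermediate step."""
    total = numbers[0]
    steps = []
    for n in numbers[1:]:
        steps.append(f"{total} + {n} = {total + n}")
        total += n
    return Solution(steps=steps, answer=total)


def terse_prover(numbers: list[int]) -> Solution:
    """Hypothetical prover: returns the correct answer with no supporting steps."""
    return Solution(steps=[], answer=sum(numbers))


def weak_verifier(solution: Solution) -> bool:
    """Hypothetical verifier: can only check one addition at a time.

    It checks the arithmetic of each step and that the last step yields the
    claimed answer. It does not check that the steps match the original problem.
    """
    if not solution.steps:
        return False  # nothing to check, so the answer is not accepted
    for step in solution.steps:
        left, right = step.split(" = ")
        a, b = left.split(" + ")
        if int(a) + int(b) != int(right):
            return False
    final_result = int(solution.steps[-1].split(" = ")[1])
    return final_result == solution.answer


if __name__ == "__main__":
    problem = [3, 4, 5]
    print(weak_verifier(legible_prover(problem)))
    print(weak_verifier(terse_prover(problem)))
```

In this toy, both provers reach the same correct answer, but only the step-by-step solution gives the limited verifier something it can check.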