How a reasoning model cracked an 80-year-old math problem — the OpenAI Podcast Ep. 20


Summary

The OpenAI reasoning model successfully constructed a counterexample, disproving the 80-year-old Erdős unit distance conjecture, demonstrating the capability of general-purpose models to solve open math problems.


Cached at: 06/05/26, 07:33 AM

### TL;DR

An OpenAI reasoning model (a later version similar to o1) constructed a counterexample that disproves the 80-year-old Erdős unit distance conjecture, showing that general-purpose models can make breakthroughs on open mathematical problems.

## Background and Team Introduction

This episode of the OpenAI podcast features Alexander Wei, Hongxun Wu, and Lijie Chen from the reasoning research team. They share a recent achievement: the model resolved the famous Erdős unit distance conjecture in combinatorial geometry.

- **Lijie Chen**: Former assistant professor at Berkeley; joined OpenAI after seeing Alex's breakthroughs on IMO/IOI, and focuses on reasoning.
- **Alexander Wei**: PhD in machine learning; later researched test-time compute at OpenAI.
- **Hongxun Wu**: Background in theoretical computer science; collaborated closely with Lijie at Berkeley and joined OpenAI to study the limits of reasoning models.

## Test-Time Compute: Letting Models "Think Longer"

Traditional models give instant answers, while test-time compute allows models to try multiple approaches and self-correct before finalizing. Alex explains:

> "Previously, models would answer immediately, shooting from the hip. Test-time compute now gives a model the chance to think, improve its answer, try different methods, and then produce the final output."

This mechanism lets models solve problems that cannot be handled with a direct answer.

## From IMO Gold Medal to Open Problems: Surprisingly Fast Progress

The team's initial goal was to have a model achieve a gold medal at the IMO (International Mathematical Olympiad). In late 2023, the model struggled at elementary school level, but by June 2024 it had earned an IMO gold medal. Alex recalls:

> "I remember on my first day Noam Brown asked me when the model would get an IMO gold. A lot of people thought 2026, but I felt maybe before April. Actually, a good model came in June... Looking back, IMO-level problems are now in the rearview mirror for today's AI."

Hongxun adds:

> "When o1 was released, I told my advisor: 'The barrier for models solving math problems is gone.' He chuckled, knowing he was about to lose a student."

## Tackling the 80-Year-Old Erdős Unit Distance Conjecture

### What Is the Problem?

The unit distance conjecture, posed by mathematician Paul Erdős, asks: given \(n\) points in the plane, what is the maximum number of pairs of points that are at distance exactly 1 from each other? Erdős conjectured that a suitably scaled square grid (square lattice) is essentially optimal, yielding roughly \(n^{1+c/\log\log n}\) unit distances for some constant \(c > 0\).

### The Model's Counterexample

The OpenAI model found that the square grid is far from optimal and constructed a new geometric structure based on class field theory, yielding a better asymptotic result. This construction had never been proposed by humans before. Team members Alexander and Hongxun hit enter simultaneously, asking two different internal models the same question, and both got similar correct answers.

### Verification Process

The model first checked its own work. The team then consulted Mehtaab and Mark Sellke, colleagues at the company who research mathematics. Their initial reaction was "this can't be true," but after a day of thinking without finding an error, their belief rose from 5% to 50%, and eventually they were convinced. Lijie describes:

> "Nobody could sleep well because it was too exciting... This is a result publishable in top mathematics journals."

## Unexpected Abilities of a General-Purpose Model

Notably, this model was not specifically trained for mathematics; it is a general reasoning model. The team was simply "taking the new model for a test drive" by challenging it with hard math problems. Hongxun says:

> "I asked the model to do something and went to lunch. When I came back, it had done far better than I expected... This model is truly remarkable."

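To make the unit distance question stated earlier concrete, here is a minimal brute-force sketch that counts pairs of points at a given distance. It is purely illustrative and is not the model's construction: the helper names `count_unit_distances` and `square_grid` are hypothetical, and the floating-point tolerance is an assumption. It also hints at why grids do well: if the target distance is rescaled to a value that can be written as a sum of two squares in several ways (here 5, since 25 = 0 + 25 = 9 + 16), the same grid contains more qualifying pairs.

```python
from itertools import combinations
from math import dist, isclose


def count_unit_distances(points, unit=1.0, tol=1e-9):
    """Count unordered pairs of points whose Euclidean distance equals `unit`.

    Brute force over all pairs, so it is only practical for small point sets.
    """
    return sum(
        1
        for p, q in combinations(points, 2)
        if isclose(dist(p, q), unit, abs_tol=tol)
    )


def square_grid(side):
    """Return the side x side integer grid {0..side-1} x {0..side-1}."""
    return [(x, y) for x in range(side) for y in range(side)]


grid = square_grid(10)  # 100 points

# Neighbouring grid points only: 2 * 10 * 9 horizontal and vertical pairs.
print(count_unit_distances(grid, unit=1.0))  # 180

# Treat distance 5 as the "unit": offsets (5,0), (0,5), (3,4), (4,3)
# and their reflections all qualify, so the count goes up.
print(count_unit_distances(grid, unit=5.0))  # 268
```

Dividing every coordinate by 5 turns the second count into a true unit distance count on a finer grid, which is the rescaling idea behind grid-based constructions.
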
## How the Model Used External Resources

While solving the problem, the model could browse the web and write and execute Python code, just like normal ChatGPT. Lijie mentions one amusing detail: the model's first action on the web was to look up the word "unit" in a dictionary to confirm its meaning.

## Reactions to the Breakthrough

- The academic community has been very positive. Many TCS friends started asking the team about their own open problems, including Hongxun's advisor, who offered two or three difficult ones.
- The geometric figure constructed by the model is very symmetrical and beautiful; someone tried to sketch it. The team is considering framing it and keeping it on a desk or in the office as a memento.
- Erdős once offered a $500 prize for this problem (mid-20th century), which is now possibly managed by a special fund. The team jokes: "Frame the check and put it in Sam's office."

## Proof of Reasoning Ability

A figure in the official blog shows that the model's correctness rises as it is given more thinking time. Alex summarizes: "More thinking leads to higher accuracy, and that itself is proof of the effectiveness of reasoning."

## Outlook and Limitations

Despite the astonishing progress on open problems, the team remains cautious about fundamental questions like P vs NP. Lijie believes:

> "To solve P vs NP, you need a completely new theory, with enough ideas to write many books. Right now it seems far off. But who knows what the future holds?"

Hongxun is optimistic about the reasoning direction:

> "Not long ago, people said models were bad at math. Now a model is doing this. It proves that frontier AI can indeed produce results many human mathematicians would be proud of."

## Source

OpenAI Podcast: How a reasoning model cracked an 80-year-old math problem (https://www.youtube.com/watch?v=wNWz5Hbh5VQ)
