Advancing science and math with GPT-5.2

OpenAI Blog Models

Summary

OpenAI releases GPT-5.2, featuring GPT-5.2 Pro and GPT-5.2 Thinking variants optimized for scientific and mathematical work. The models achieve state-of-the-art performance on benchmarks like GPQA Diamond (93.2%) and FrontierMath (40.3%), demonstrating improved reasoning capabilities designed to accelerate scientific research across physics, chemistry, biology, and mathematics.

GPT-5.2 is OpenAI’s strongest model yet for math and science, setting new state-of-the-art results on benchmarks like GPQA Diamond and FrontierMath. This post shows how those gains translate into real research progress, including solving an open theoretical problem and generating reliable mathematical proofs.

Cached at: 04/20/26, 02:50 PM

# Advancing science and math with GPT-5.2

Source: [https://openai.com/index/gpt-5-2-for-science-and-math/](https://openai.com/index/gpt-5-2-for-science-and-math/)

One of our hopes for strong AI is that it will accelerate scientific research for the benefit of everyone, helping researchers explore more ideas, test them faster, and turn discoveries into impact. Over the past year, we’ve been working closely with scientists across math, physics, biology, and computer science to understand where AI can help, and where it still falls short. Last month, we [published a paper](https://openai.com/index/accelerating-science-gpt-5/) compiling early case studies across math, physics, biology, computer science, astronomy, and materials science in which GPT‑5 helped researchers, showing how it has already begun contributing to real scientific work.

With [GPT‑5.2](https://openai.com/index/introducing-gpt-5-2/), we’re starting to see those gains become more consistent and more reliable. GPT‑5.2 Pro and GPT‑5.2 Thinking are our strongest models yet for scientific and mathematical work.

Strong mathematical reasoning is a foundation for reliability in scientific and technical work. It enables models to follow multi-step logic, keep quantities consistent, and avoid subtle errors that can compound in real analyses, from simulations and statistics to forecasting and modeling. Improvements on benchmarks like FrontierMath reflect not a narrow skill but stronger general reasoning and abstraction, capabilities that carry directly into scientific workflows such as coding, data analysis, and experimental design. These capabilities are also closely tied to progress toward general intelligence.
A system that can reliably reason through abstraction, maintain consistency across long chains of thought, and generalize across domains is exhibiting traits that are foundational to AGI: not task-specific tricks, but broad, transferable reasoning skills that matter across science, engineering, and real-world decision-making. We believe GPT‑5.2 Pro and GPT‑5.2 Thinking are the world’s best models for assisting and accelerating scientists.

On **GPQA Diamond**, a graduate-level, Google-proof Q&A benchmark, GPT‑5.2 Pro achieves 93.2%, followed closely by GPT‑5.2 Thinking at 92.4%. In [GPQA Diamond](https://arxiv.org/abs/2311.12022), models answer multiple-choice questions about physics, chemistry, and biology. No tools were enabled, and reasoning effort was set to maximum.

On **FrontierMath (Tier 1–3)**, an evaluation of expert-level mathematics, GPT‑5.2 Thinking set a new state of the art, solving 40.3% of problems.

This result suggests a useful direction for how AI systems can support scientific research, particularly in domains with axiomatic theoretical foundations such as mathematics and theoretical computer science. In settings like these, frontier models can help explore proofs, test hypotheses, and identify connections that might otherwise take substantial human effort to uncover. At the same time, these systems are not independent researchers. Expert judgment, verification, and domain understanding remain essential. Even highly capable models can make mistakes or rely on unstated assumptions. But they can also produce detailed, structured arguments that merit careful human study and refinement. Making reliable progress with AI therefore depends on workflows that keep validation, transparency, and collaboration firmly in the loop. Viewed as a case study, this result illustrates an emerging mode of research practice.
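As a concrete illustration of the evaluation setup described above, the sketch below grades a run the way multiple-choice benchmarks such as GPQA Diamond are typically scored: each question has one correct letter, and accuracy is the fraction of questions answered correctly. The question IDs and answer letters here are placeholders, not real GPQA data.

```python
# Minimal sketch of multiple-choice benchmark scoring. The data below is
# synthetic; it only illustrates the accuracy computation, not GPQA itself.

def score(predictions: dict[str, str], answer_key: dict[str, str]) -> float:
    """Return accuracy as a fraction in [0, 1]."""
    if not answer_key:
        raise ValueError("empty answer key")
    correct = sum(
        1 for qid, gold in answer_key.items()
        if predictions.get(qid) == gold  # unanswered questions count as wrong
    )
    return correct / len(answer_key)

# Hypothetical run: 198 questions with answer letters A-D.
answer_key = {f"q{i}": "ABCD"[i % 4] for i in range(198)}
predictions = dict(answer_key)       # start from a perfect run...
for i in range(0, 198, 15):          # ...then spoil every 15th answer
    predictions[f"q{i}"] = "E"
print(f"accuracy: {score(predictions, answer_key):.1%}")
```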
Models like GPT‑5.2 can serve as tools for supporting mathematical reasoning and accelerating early-stage exploration, while responsibility for correctness, interpretation, and context remains with human researchers. Used carefully, such systems may help streamline significant aspects of theoretical work without displacing the central role of human judgment in scientific inquiry.
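The validation-in-the-loop workflow described above can be sketched as a simple gate: no model-proposed step enters the accepted argument until an independent check passes. This is an illustrative sketch, not OpenAI's actual pipeline; `propose_step` and `check_step` are hypothetical stand-ins for, respectively, a model call and a trusted verifier (a proof assistant, a numeric test, or a human reviewer).

```python
# Hypothetical verify-before-accept loop: every candidate step must pass
# an independent check before it is added to the accepted chain.
from typing import Callable

def verified_chain(
    propose_step: Callable[[list[str]], str],
    check_step: Callable[[str], bool],
    n_steps: int,
) -> list[str]:
    """Build a chain of reasoning steps, keeping only checked ones."""
    accepted: list[str] = []
    for _ in range(n_steps):
        candidate = propose_step(accepted)
        if check_step(candidate):       # validation gate: nothing enters
            accepted.append(candidate)  # the argument unchecked
    return accepted

# Toy usage: propose inequalities between consecutive squares; the
# "verifier" here is plain evaluation, standing in for a real checker.
steps = verified_chain(
    propose_step=lambda acc: f"{len(acc) + 1}**2 < {len(acc) + 2}**2",
    check_step=lambda s: bool(eval(s)),
    n_steps=3,
)
print(steps)
```

The design point is that the proposer and the checker are separate components, so a stronger model improves exploration speed without weakening the correctness guarantee, which rests entirely on the checker.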
