The value of AI for most users today lies in its ability to generate coherent, conversational language by applying probability theory to massive datasets. However, a future where AI models drive advances in fields like cryptography and space exploration by solving complex, multi-step mathematical problems is now one step closer to reality.
OpenAI on Saturday, July 19, announced that its experimental AI reasoning model earned enough points on this year’s International Math Olympiad (IMO) to win a gold medal.
First held in 1959 in Romania, the IMO is widely considered one of the hardest and most prestigious math competitions in the world for high-school students. It takes place over two days: participants sit two exams, and in each four-and-a-half-hour session they must solve three problems, each worth up to seven points, for a maximum score of 42.
OpenAI’s unreleased AI model took the 2025 IMO under these same conditions, with no access to the internet or external tools. It read the official problem statements and generated natural-language proofs, solving five of the six problems for a gold medal-worthy score of 35/42, according to Alexander Wei, a member of OpenAI’s technical staff.
“This underscores how fast AI has advanced in recent years. In 2021, my PhD advisor @JacobSteinhardt had me forecast AI math progress by July 2025. I predicted 30% on the MATH benchmark (and thought everyone else was too optimistic). Instead, we have IMO gold,” Wei wrote in a post on X.
This isn’t the first time a company has claimed that its AI model can match the performance of IMO gold medallists. Earlier this year, Google DeepMind introduced AlphaGeometry 2, a model specifically designed to solve complex geometry problems at a level comparable to a human Olympiad gold medallist.
However, the performance of OpenAI’s experimental model is seen as a step toward general intelligence, not just task-specific AI. “We reach this capability level not via narrow, task-specific methodology, but by breaking new ground in general-purpose reinforcement learning and test-time compute scaling,” Wei said.
1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO). pic.twitter.com/SG3k6EknaC
— Alexander Wei (@alexwei_) July 19, 2025
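The “test-time compute scaling” Wei refers to means spending more computation at inference time rather than only during training. In its simplest, best-known form, best-of-n sampling, a model generates several candidate solutions and a scorer keeps the best one. The sketch below illustrates only that generic idea, with mock stand-ins for the model and the scorer; OpenAI has not said which specific technique its experimental model uses.

```python
# Minimal sketch of test-time compute scaling via best-of-n sampling.
# generate_candidate and score are illustrative mocks, not real model calls.
import random

def generate_candidate(problem: str) -> str:
    """Stand-in for a model call; returns a mock solution string."""
    return f"candidate solution {random.randint(0, 999)} for: {problem}"

def score(solution: str) -> float:
    """Stand-in for a verifier or reward model scoring a solution."""
    return random.random()

def best_of_n(problem: str, n: int) -> str:
    """Larger n means more compute spent per problem at inference time."""
    candidates = [generate_candidate(problem) for _ in range(n)]
    return max(candidates, key=score)

print(best_of_n("Prove that ...", n=8))
```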
The model’s success marks progress beyond traditional reinforcement learning (RL), a training method that relies on clear, verifiable rewards and penalties. Instead, the model appears to demonstrate more flexible, general problem-solving ability: in Wei’s words, it “can craft intricate, watertight arguments at the level of human mathematicians.”
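To see why verifiable rewards are limiting here, consider a minimal sketch (all names are hypothetical; OpenAI has not disclosed its training setup). Traditional RL of this kind can only reinforce answers a grader can check mechanically, which works for short final answers but not for multi-page proofs.

```python
# Minimal sketch of RL with verifiable rewards: the trainer can only
# reinforce answers it can check mechanically. Hypothetical setup, for
# illustration only.

def verifiable_reward(model_answer: str, ground_truth: str) -> float:
    """Return 1.0 if the answer matches the known result, else 0.0.

    Trivial for short final answers, but there is no such mechanical
    check for a multi-page natural-language proof -- which is why the
    IMO result is seen as a step beyond this style of training.
    """
    return 1.0 if model_answer.strip() == ground_truth.strip() else 0.0


if __name__ == "__main__":
    # A short-answer problem is easy to verify...
    print(verifiable_reward("42", "42"))                          # 1.0
    # ...but a proof sketch cannot be graded by string comparison.
    print(verifiable_reward("Assume n is even; then ...", "42"))  # 0.0
```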
Wei also acknowledged that “IMO submissions are hard-to-verify, multi-page proofs.” Such proofs are typically built from smaller intermediate results, called lemmas, each of which must hold for the overall argument to stand. OpenAI said the AI-generated proofs were independently graded by three former IMO medallists, who unanimously agreed on the model’s final score.
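For illustration, this is roughly what a lemma looks like in a formal proof assistant such as Lean 4 (a toy example, not drawn from the IMO): a small result is proved first, then reused in a larger theorem.

```lean
-- Toy example (not an IMO problem): in Lean 4, a lemma is just a small
-- theorem proved first so that a later theorem can reuse it.
theorem double_eq (n : Nat) : n + n = 2 * n := by
  omega  -- linear-arithmetic tactic closes the goal

theorem double_is_even (n : Nat) : ∃ k, n + n = 2 * k :=
  ⟨n, double_eq n⟩  -- the lemma discharges the main goal
```

A formal system like this can check every step mechanically; natural-language IMO submissions offer no such guarantee, which is why human graders were needed.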
Hot take on OpenAI’s IMO gold
What does it mean? I don’t know (yet).
The fact that no tools, coding or internet was used is genuinely impressive.
That said, my overall impression is that OpenAI has told us the result, but not how it was achieved.
That leaves me with many…
— Gary Marcus (@GaryMarcus) July 19, 2025
However, Gary Marcus, professor emeritus at New York University (NYU) and a well-known critic of OpenAI, pointed out that the results have not yet been independently verified by the organisers of the IMO.
OpenAI’s claims also come months after the US Defense Advanced Research Projects Agency (DARPA) launched a new initiative that seeks to enlist researchers to find ways of conducting high-level mathematics research with an AI “co-author.” In the past, DARPA was responsible for driving the research that led to the creation of ARPANET, the precursor to the internet.
An AI model that could reliably check proofs would save mathematicians enormous amounts of time and free them to be more creative. But while such models might seem equipped to solve complex problems, they can still stumble on simple questions, such as whether 9.11 is bigger than 9.9. This is why they are said to have ‘jagged intelligence’, a term coined by OpenAI co-founder Andrej Karpathy.
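One commonly suggested explanation for the 9.11-versus-9.9 stumble is that the same strings compare differently depending on interpretation: as decimal numbers, 9.9 is larger, but as software version numbers, 9.11 comes after 9.9. A quick sketch of both readings:

```python
# The same two strings compare differently depending on interpretation,
# which is one commonly suggested reason language models get this wrong.

a, b = "9.9", "9.11"

# As decimal numbers: 9.9 > 9.11
print(float(a) > float(b))  # True

# As version numbers (components compared numerically): 9.11 > 9.9
va = tuple(int(part) for part in a.split("."))
vb = tuple(int(part) for part in b.split("."))
print(vb > va)  # True -- (9, 11) > (9, 9)
```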
Reacting to the model’s gold medal-worthy IMO score, OpenAI CEO Sam Altman said, “This is an LLM doing math and not a specific formal math system; it is part of our main push towards general intelligence.”
we achieved gold medal level performance on the 2025 IMO competition with a general-purpose reasoning system! to emphasize, this is an LLM doing math and not a specific formal math system; it is part of our main push towards general intelligence.
when we first started openai,… https://t.co/X46rspI4l6
— Sam Altman (@sama) July 19, 2025
However, the ChatGPT-maker does not plan to release the experimental research model for at least the next several months, despite its math capabilities.