OpenAI says its next big model can bring home Math Olympiad gold: A turning point?


The value of AI for most users today lies in its ability to generate coherent, conversational language by applying probability theory to massive datasets. However, a future where AI models drive advances in fields like cryptography and space exploration by solving complex, multi-step mathematical problems is now one step closer to reality.

OpenAI on Saturday, July 19, announced that its experimental AI reasoning model earned enough points on this year’s International Math Olympiad (IMO) to win a gold medal.

Started in 1959 in Romania, the IMO is widely considered one of the hardest and most prestigious math competitions in the world for high-school students. It is held over two days: participants sit two exams and are expected to solve three math problems in each four-and-a-half-hour session.


OpenAI’s unreleased AI model attempted the 2025 IMO under these same conditions, with no access to the internet or external tools. It read the official problem statements and generated natural language proofs. The model solved five out of a total of six problems, achieving a gold medal-worthy score of 35/42, according to Alexander Wei, a member of OpenAI’s technical staff.

“This underscores how fast AI has advanced in recent years. In 2021, my PhD advisor @JacobSteinhardt had me forecast AI math progress by July 2025. I predicted 30% on the MATH benchmark (and thought everyone else was too optimistic). Instead, we have IMO gold,” Wei wrote in a post on X.

This isn’t the first time a company has claimed that its AI model can match the performance of IMO gold medallists. Earlier this year, Google DeepMind introduced AlphaGeometry 2, a model specifically designed to solve complex geometry problems at a level comparable to a human Olympiad gold medallist.

However, the performance of OpenAI’s experimental model is seen as a step forward for general intelligence, not just task-specific AI systems. “We reach this capability level not via narrow, task-specific methodology, but by breaking new ground in general-purpose reinforcement learning and test-time compute scaling,” Wei said.



The model’s success marks progress beyond traditional reinforcement learning (RL), a training process that relies on clear, verifiable rewards and penalties. Instead, the model appears to demonstrate more flexible, general problem-solving ability, as it “can craft intricate, watertight arguments at the level of human mathematicians.”
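As a rough illustration of what “clear, verifiable rewards” means in this context, the Python sketch below scores a model’s output against a known numeric answer. The function, regex and sample answers are hypothetical assumptions used for illustration only, not OpenAI’s actual training setup.

```python
import re

def verifiable_reward(model_output: str, correct_answer: float) -> float:
    """Hypothetical reward: 1.0 if the final number in the output matches
    the known solution, else 0.0 -- the kind of clear-cut signal
    traditional RL training relies on."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", model_output)
    if not numbers:
        return 0.0
    return 1.0 if abs(float(numbers[-1]) - correct_answer) < 1e-9 else 0.0

# Illustrative usage: score sampled answers to a toy problem whose answer is 42
samples = ["The answer is 42", "I think it is 41", "42.0"]
print([verifiable_reward(s, 42.0) for s in samples])  # [1.0, 0.0, 1.0]
```

A multi-page natural-language proof offers no such one-line check, which is part of why the IMO result is being read as a step beyond that paradigm.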

Wei also acknowledged that “IMO submissions are hard-to-verify, multi-page proofs.” Mathematical proofs are typically built from smaller intermediate results called lemmas. OpenAI said the AI-generated proofs were independently graded by three former IMO medallists, who unanimously finalised the model’s score.


However, Gary Marcus, a professor at New York University (NYU) and well-known critic of OpenAI, pointed out that the results have not yet been independently verified by the organisers of the IMO.

OpenAI’s claims also come months after the US Defense Advanced Research Projects Agency (DARPA) launched a new initiative that aims to enlist researchers to find ways to conduct high-level mathematics research with an AI “co-author.” In the past, DARPA was responsible for driving the research that led to the creation of ARPANET, the precursor to the internet.


An AI model that could reliably check proofs would save enormous amounts of time for mathematicians and help them be more creative. Yet while some of these models might seem equipped to solve complex problems, they can also stumble on simple questions, such as whether 9.11 is bigger than 9.9. Hence, they are said to have ‘jagged intelligence’, a term coined by OpenAI co-founder Andrej Karpathy.
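For reference, the arithmetic behind that example is trivial to verify in code; this minimal Python check (an illustrative snippet, not tied to any particular model) confirms that 9.9 is the larger number:

```python
# Plain floating-point comparison: 9.9 is greater than 9.11
print(9.11 > 9.9)       # False
print(max(9.11, 9.9))   # 9.9
```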

Reacting to the model’s gold medal-worthy IMO score, OpenAI CEO Sam Altman said, “This is an LLM doing math and not a specific formal math system; it is part of our main push towards general intelligence.”


However, the ChatGPT-maker does not plan to release the experimental research model for at least the next several months, despite its math capabilities.




