OpenAI, the creator of ChatGPT, has launched a new line of generative artificial intelligence (AI) models capable of reasoning through and answering particularly complex questions, such as those in mathematics.
Unlike their predecessors, these new models have been designed to refine their thought processes, test different methods, and recognize errors before providing a final response.
A new paradigm
The CEO of OpenAI, Sam Altman, praised the models as “a new paradigm: an AI that can perform complex reasoning for general purposes.” However, he warned that the technology “still has flaws, is limited, and seems more impressive the first time you use it than after spending more time with it.”
OpenAI, backed by Microsoft, stated that in its tests the models performed comparably to PhD students on challenging tasks in physics, chemistry, and biology.
They also excelled in mathematics and coding, achieving a success rate of 83% on a qualifying exam for the International Mathematical Olympiad, compared to 13% for GPT-4o, the company's most advanced general-use model. In a mathematics competition for American high school students, o1 placed “among the top 500,” the company added.
Just like a human
“Just like a human who can think for a long time before answering a difficult question, it uses a chain of thought (…). It learns to recognize and correct its mistakes. It learns to break down the trickiest steps into simpler ones. It learns to try a different approach when the current one isn’t working,” explained OpenAI.
The company said that the enhanced reasoning capabilities could be used by healthcare researchers to annotate cell sequencing data, by physicists to generate complex formulas, and by software developers to build and execute multi-step workflows. It also indicated that the new models withstand attempts to bypass their security mechanisms better than their predecessors.
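To make the developer use case concrete, here is a minimal sketch of how a multi-step request might be sent to one of the new reasoning models through the OpenAI Python SDK. The model name "o1-preview" and the example task are illustrative assumptions and do not come from OpenAI's announcement.

```python
# Minimal sketch: sending a multi-step task to a reasoning model.
# Assumes the OpenAI Python SDK (`pip install openai`) and an API key
# in the OPENAI_API_KEY environment variable. The model name
# "o1-preview" is an assumption for illustration.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o1-preview",
    messages=[
        {
            "role": "user",
            "content": (
                "Plan a three-step data pipeline: ingest a CSV of cell "
                "sequencing reads, normalize the counts, and flag "
                "outliers. List the steps, then give the final plan."
            ),
        }
    ],
)

# The model reasons through the steps internally before returning
# its final answer, which is printed here.
print(response.choices[0].message.content)
```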
Enhanced security
OpenAI highlighted that its enhanced security measures included recent agreements with the AI Safety Institutes of the United States and the United Kingdom, which were granted early access to the models for evaluation.