'Mediocre but not incompetent': 'Mozart of math' reviews OpenAI's o1

By Dwaipayan Roy

Sep 16, 2024

07:12 pm

What's the story

OpenAI's latest generative AI model, o1, codenamed "Strawberry," has been put to the test by Terence Tao, a renowned mathematics professor at the University of California, Los Angeles (UCLA), who was once described as the "Mozart of math."

Despite OpenAI's claim that this new model represents a significant advancement in AI capability and excels at complex reasoning tasks, Tao offered a less enthusiastic assessment.

He described his experience with o1 as "roughly on par with trying to advise a mediocre, but not completely incompetent graduate student."

AI assessment

Tao's evaluation of Strawberry's mathematical abilities

Tao evaluated Strawberry by assigning it a series of complex mathematical tasks.

He noted that while the model is certainly more capable than its predecessors, it still struggles with advanced, research-level mathematics.

In one instance, Tao presented the AI with a "vaguely worded mathematical query" that required identifying and applying a theorem from existing literature.

The model successfully identified Cramér's theorem and provided a satisfactory answer, an improvement over previous versions, which often produced "hallucinated nonsense."

AI performance

Strawberry's performance on complex mathematical problems

When tasked with a more complex problem related to a bounded sequence power series, Strawberry's performance was less impressive.

Tao noted the model could arrive at a correct solution only if provided with numerous hints and prodding.

It failed to generate key conceptual ideas independently and made some non-trivial errors.

Despite these shortcomings, Tao believes that it may only take one or two further iterations of improved capability for the AI to reach the level of a 'competent graduate student.'

Apology issued

Tao's apology and clarification on AI vs human capabilities

Following his initial assessment, Tao issued an update to clarify his comparison of Strawberry's performance with that of human graduate students.

He apologized for potentially giving the impression that human graduate students could be "reductively classified according to a static, one-dimensional level of 'competence.'"

Tao emphasized that unlike AI, humans have the ability to learn and grow during their studies.

He also highlighted other dimensions like creativity, independence, curiosity, exposition, intuition, professionalism, and social skills where humans can excel.