DeepMind's AI system can outperform math olympiad gold medalists
What's the story
Google's artificial intelligence (AI) research lab, DeepMind, has an AI system called AlphaGeometry2.
The system has shown better problem-solving capabilities in geometry than the average gold medalist at an international mathematics competition.
In fact, according to a recent study by DeepMind researchers, AlphaGeometry2 can solve 84% of all geometry problems from the last 25 years of the International Mathematical Olympiad (IMO), a prestigious high school-level math contest.
Capabilities
AI's potential in solving complex geometry problems
DeepMind thinks that the secret to more capable AI could be in discovering new ways to tackle difficult geometry problems, particularly those involving Euclidean geometry.
Proving mathematical theorems involves reasoning as well as the capability to select from a set of possible steps toward a solution.
These problem-solving capabilities could prove to be a valuable element of future general-purpose AI models.
AI integration
Successful integration with other AI models
Last year, DeepMind demonstrated a system that combined AlphaGeometry2 with AlphaProof, an AI model for math reasoning.
The integrated system was able to solve four out of six problems from the 2024 IMO.
The success of this integration indicates that similar methods could be applied to other fields, perhaps even helping with complex engineering calculations.
Components
Understanding the core elements of AlphaGeometry2
AlphaGeometry2 revolves around a few key components, including a language model from Google's Gemini family of AI models and a "symbolic engine."
The Gemini model guides the symbolic engine, which applies mathematical rules to infer conclusions, helping it arrive at feasible proofs for a given geometry theorem.
Olympiad geometry problems are diagram-based and require "constructs" to be added before they can be solved.
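The division of labor described above can be illustrated with a toy sketch. This is not DeepMind's code: the midsegment rule, the fact strings, and the `prove`/`forward_chain` functions are all illustrative stand-ins for how a model's construct proposals might feed a rule-based deduction engine.

```python
# Toy sketch (illustrative only): a "model" proposes auxiliary
# constructs while a symbolic engine forward-chains over rules.

def forward_chain(facts, rules):
    """Symbolic engine stand-in: apply rules until no new facts appear."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

def prove(facts, rules, goal, proposals):
    """Alternate construct proposals (the model's role) with
    deduction (the engine's role) until the goal is derived."""
    facts = set(facts)
    for construct in [None] + list(proposals):
        if construct:
            facts.add(construct)  # add the suggested construct
        facts = forward_chain(facts, rules)
        if goal in facts:
            return True
    return False

# Midsegment theorem as one rule: if M and N are the midpoints of
# AB and AC, then MN is parallel to BC.
rules = [({"midpoint(M,AB)", "midpoint(N,AC)"}, "parallel(MN,BC)")]
# The "model" proposes adding the two midpoints as constructs.
proposals = ["midpoint(M,AB)", "midpoint(N,AC)"]
print(prove(set(), rules, "parallel(MN,BC)", proposals))  # → True
```

Without the proposed constructs, the engine alone cannot reach the goal, which mirrors why the added constructs matter.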
AI training
Problem-solving process and training data
AlphaGeometry2's Gemini model predicts which constructs might be useful to add to a diagram, and the symbolic engine uses those suggestions to make deductions.
A search algorithm permits AlphaGeometry2 to conduct multiple searches for solutions in parallel and store possibly useful findings in a common knowledge base.
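The shared-knowledge-base idea can be sketched as follows. This is a hedged illustration, not DeepMind's implementation: the branch names, fact strings, and `search_branch` function are hypothetical, and it simply shows several search workers running in parallel and publishing their findings to a common store.

```python
# Illustrative sketch: parallel search branches depositing findings
# into a shared knowledge base that all branches can read.

import threading
from concurrent.futures import ThreadPoolExecutor

shared_facts = set()          # the common knowledge base
lock = threading.Lock()       # guards concurrent writes

def publish(fact):
    """Add a finding to the shared knowledge base."""
    with lock:
        shared_facts.add(fact)

def search_branch(branch_id, local_findings):
    """One parallel search attempt: derive facts along one branch
    and share anything potentially useful with the others."""
    for fact in local_findings:
        publish(fact)
    return branch_id

# Hypothetical findings from two independent search branches.
branches = {0: ["angle(A)=60", "cyclic(ABCD)"], 1: ["parallel(MN,BC)"]}
with ThreadPoolExecutor(max_workers=2) as pool:
    list(pool.map(search_branch, branches, branches.values()))

print(sorted(shared_facts))
```

The lock-protected store stands in for whatever coordination the real system uses; the point is only that one branch's discovery becomes visible to the others.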
Due to the complexities of translating proofs into a format AI can understand, DeepMind created its own synthetic data to train AlphaGeometry2's language model, generating over 300 million theorems and proofs of varying complexity.
Performance
Performance and limitations
The DeepMind team picked 45 geometry problems from IMO competitions held over the last 25 years and translated them into a larger set of 50 problems, with some problems split during translation.
AlphaGeometry2 solved 42 of the 50 problems, exceeding the average gold medalist score of 40.9.
However, the system has its limitations, including a technical quirk that prevents it from solving problems involving a variable number of points, nonlinear equations, or inequalities.