This Chinese LLM just beat OpenAI's leading reasoning model
What's the story
Chinese artificial intelligence (AI) firm DeepSeek has released an open version of its reasoning model, DeepSeek-R1.
The model is said to match, and in some cases outperform, OpenAI's reasoning model o1 on several AI benchmarks.
R1 is now available through the AI development platform Hugging Face under an MIT license, which permits commercial use with minimal restrictions.
Benchmark performance
DeepSeek-R1 excels in AIME, MATH-500, and SWE-bench verified
DeepSeek claims its R1 model beats o1 on the AIME, MATH-500, and SWE-bench Verified benchmarks.
The AIME benchmark uses other AI models to evaluate a model's answers, MATH-500 is a collection of word problems, and SWE-bench Verified focuses on programming tasks.
As a reasoning model, R1 can effectively check its own work, helping it avoid some of the pitfalls that often trip up other models.
Model reliability
Reasoning models offer reliability in complex domains
Reasoning models such as R1 may take a bit longer to arrive at solutions than non-reasoning ones, but they are more reliable. This is especially true in complex domains like physics, science, and math.
As per a technical report by DeepSeek, the R1 model has an astounding 671 billion parameters.
Parameter count is a rough indicator of a model's problem-solving ability, with larger models generally performing better than smaller ones.
Model accessibility
DeepSeek offers 'distilled' versions of R1 model
Along with the full version, DeepSeek has also launched "distilled" versions of the R1 model. They range from 1.5 billion to 70 billion parameters, with the smallest capable of running on a laptop.
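For illustration, here is a minimal sketch of running one of the small distilled checkpoints locally with the Hugging Face transformers library; the repository name and prompt below are assumptions for this example, not confirmed by the article.

```python
# Minimal sketch: running a small distilled R1 checkpoint locally.
# The repo name below is an assumption for illustration; check Hugging Face
# for the checkpoints DeepSeek actually publishes.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Reasoning models emit their chain of thought before the final answer,
# so allow a generous generation budget.
messages = [{"role": "user", "content": "What is 17 * 24? Think step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```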
The full R1 model requires more powerful hardware but can be accessed through DeepSeek's API at prices much lower than OpenAI's o1.
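As a sketch of what API access can look like, the snippet below assumes DeepSeek exposes an OpenAI-compatible chat endpoint at api.deepseek.com with a reasoning model name of "deepseek-reasoner"; consult DeepSeek's own documentation for the exact endpoint, model name, and pricing.

```python
# Sketch of calling the hosted R1 model through DeepSeek's API.
# Assumes an OpenAI-compatible endpoint and the model name "deepseek-reasoner";
# both are assumptions for illustration.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],  # your DeepSeek API key
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[
        {"role": "user", "content": "Prove that the square root of 2 is irrational."}
    ],
)
print(response.choices[0].message.content)
```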
Model constraints
R1 model's limitations and regulatory compliance
Despite its impressive performance, the R1 model has certain limitations owing to its Chinese origin.
It is subject to oversight by China's internet regulator, which requires that its responses conform with "core socialist values."
As a result, the R1 model declines to answer questions on sensitive topics such as the Tiananmen Square incident or Taiwan's independence.
This is common practice among Chinese AI systems, which avoid subjects that the country's regulators could deem controversial.