
DeepSeek reveals new AI reasoning technique amid next-gen model anticipation
What's the story
Chinese AI start-up DeepSeek has unveiled a novel technique to improve the reasoning capabilities of large language models (LLMs). The announcement comes ahead of the debut of the company's next-generation model.
The technique, developed with researchers at Tsinghua University, combines generative reward modeling (GRM) with self-principled critique tuning.
According to a paper published on Friday, the dual approach aims to help LLMs answer general queries more accurately and quickly.
Performance
DeepSeek-GRM models outperform existing methods
The newly developed DeepSeek-GRM models have, in the researchers' words, "achieved competitive performance" with strong public reward models.
Reward modeling is a technique for steering an LLM's outputs toward human preferences, and its successful application in the DeepSeek-GRM models underpins the improved reasoning capabilities.
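For readers unfamiliar with the idea, classic reward modeling can be sketched as pairwise preference training: a scalar reward model is trained so that responses humans preferred score higher than rejected ones. The toy example below is not DeepSeek's GRM (which is generative), just a minimal illustration using a linear reward over hypothetical response features:

```python
import math

# Toy pairwise reward-modeling sketch (illustrative only, NOT DeepSeek's
# method): a linear reward r(x) = w . x is trained with the Bradley-Terry
# loss  -log(sigmoid(r(chosen) - r(rejected)))  so preferred responses
# end up with higher scores.

def reward(w, feats):
    """Scalar reward: dot product of weights and response features."""
    return sum(wi * fi for wi, fi in zip(w, feats))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(pairs, dim, lr=0.1, epochs=200):
    """pairs: list of (chosen_feats, rejected_feats) preference pairs."""
    w = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(w, chosen) - reward(w, rejected)
            # Gradient coefficient of -log sigmoid(margin) w.r.t. margin
            g = sigmoid(margin) - 1.0
            for i in range(dim):
                w[i] -= lr * g * (chosen[i] - rejected[i])
    return w

# Hypothetical 2-d features (e.g. helpfulness and verbosity proxies).
pairs = [([1.0, 0.2], [0.3, 0.9]),
         ([0.9, 0.1], [0.2, 0.8])]
w = train(pairs, dim=2)
# After training, the preferred response in each pair scores higher.
assert reward(w, [1.0, 0.2]) > reward(w, [0.3, 0.9])
```

A generative reward model, by contrast, produces a written critique rather than only a scalar score, which is the direction the DeepSeek-Tsinghua paper explores.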
Open-source
Plans to make GRM models open source
DeepSeek has said that it plans to open-source the GRM models, but it has not given a timeline.
The announcement was made in an academic paper released on arXiv, an online scientific paper repository.
The research arrives amid speculation about DeepSeek's next move, following the global attention drawn by its V3 foundation model and R1 reasoning model.
Anticipation
DeepSeek-R2 release anticipated
According to Reuters, DeepSeek-R2, the successor of the R1 model, could be released as early as this month.
The speculation comes as the company looks to capitalize on its growing reputation in the tech industry.
The launch of DeepSeek-R1 drew widespread interest for cost-effective performance that matched leading models.
However, DeepSeek has neither confirmed nor denied these reports of an R2 release.