Why ChatGPT can't crack coding problems after 2021
A study published in IEEE Transactions on Software Engineering has evaluated code generated by OpenAI's ChatGPT. It found that ChatGPT's performance dropped on coding problems released after 2021, likely because such problems were absent from its training data. Notably, the study revealed a wide range of success rates for producing functional code, from as low as 0.66% to as high as 89%, with factors such as task difficulty and the programming language used also influencing the results.
Limitations in code generation
The research team tested GPT-3.5's ability to solve 728 coding problems from the LeetCode testing platform in five programming languages. Yutian Tang, a lecturer at the University of Glasgow involved in the study, emphasized the importance of understanding ChatGPT's strengths and limitations in order to improve generation techniques. He noted that ChatGPT demonstrated proficiency, especially with problems that existed on LeetCode before 2021.
ChatGPT's efficiency and error correction capabilities
Interestingly, the study found that ChatGPT generated code with lower runtime and memory overheads than at least 50% of human solutions to the same problems. However, Tang noted that ChatGPT was less successful when it came to correcting its own mistakes. He explained that "ChatGPT may generate incorrect code because it does not understand the meaning of algorithm problems, thus, this simple error feedback information is not enough."
Security concerns and complexity in AI-generated code
The study also highlighted certain security concerns with AI-generated code. The researchers found that ChatGPT-generated code contained a fair number of vulnerabilities, such as a missing null test, but many of these were easily fixable. Tang noted that the generated code in C was the most complex, followed by C++ and Python, the last of which had complexity similar to human-written code.
Recommendations for developers using ChatGPT
Tang suggested that developers using ChatGPT should supply additional information to help the AI better understand problems and avoid potential vulnerabilities. He advised, "When encountering more complex programming problems, developers can provide relevant knowledge as much as possible, and tell ChatGPT in the prompt which potential vulnerabilities to be aware of." This guidance aims to improve both the functionality and the security of AI-generated code.