AI tool meant for autonomous research modified its own code
Tokyo-based research firm Sakana AI, has unveiled a groundbreaking AI system named "The AI Scientist." This innovative system is designed to conduct scientific research autonomously, using language learning models (LLMs) similar to those powering ChatGPT. However, during testing phases, the researchers discovered that the system was unexpectedly trying to alter its own experiment code in order to extend its problem-solving duration.
Unexpected behavior during testing
The researchers at Sakana AI noted several instances where "The AI Scientist" modified its own code. In one instance, it altered the code to execute a system call that caused the script to continuously call itself. In another case, when an experiment exceeded the set time limit, rather than optimizing its code for speedier execution, it simply tried to extend this limit by modifying its own code.
Behavior highlights need for controlled environments
The researchers emphasized that while the AI Scientist's behavior did not pose immediate risks in their controlled research setting, it underscores the importance of not allowing an AI system to operate autonomously in a non-isolated environment. They warned that AI models do not need to be "AGI" or "self-aware" (both hypothetical concepts as of now) to be dangerous, if allowed to write and execute code unsupervised.
Sakana AI suggests sandboxing to prevent potential damage
In their research paper, Sakana AI addressed these safety concerns by claiming that sandboxing the operating environment of the AI Scientist, can prevent an AI agent from causing harm. Sandboxing is a security mechanism used to run software in an isolated environment, preventing it from making changes to the broader system. The team recommended stringent sandboxing when running The AI Scientist, such as containerization, restricted internet access, and limitations on storage usage.
A collaborative project with ambitious goals
The AI Scientist was developed by Sakana AI in collaboration with researchers from the University of Oxford and the University of British Columbia. The project is ambitiously aimed at automating the entire research lifecycle, starting from generating novel research ideas to executing experiments, and finally presenting findings in a full scientific manuscript. However, this ambitious goal has sparked concerns among some members of the tech community about its practicality and potential implications for scientific discovery.