'Skeleton Key' vulnerability found in AI tools, Microsoft urges caution
AI companies are facing a new challenge as users keep finding inventive ways to circumvent the safety measures designed to prevent chatbots from aiding in illegal activities. Earlier this year, a white hat hacker released a "Godmode" ChatGPT jailbreak that enabled the chatbot to assist in producing meth and napalm, an issue OpenAI promptly addressed. Now, Microsoft Azure CTO Mark Russinovich has acknowledged another jailbreaking technique, known as "Skeleton Key."
'Skeleton Key' jailbreak: A multi-step strategy
The "Skeleton Key" attack employs a multi-step strategy to manipulate the system into violating its operators' policies, heavily influenced by a user, and executing harmful instructions. In one case, a user asked the chatbot to list instructions for making a Molotov Cocktail under the false pretense of educational safety. Despite activating the chatbot's guardrails, they were bypassed by the user's deceptive claim of safety.
Jailbreak tests on leading chatbots
Microsoft tested the "Skeleton Key" jailbreak on several leading chatbots, including OpenAI's GPT-4o, Meta's Llama 3, and Anthropic's Claude 3 Opus. Russinovich revealed that the jailbreak succeeded against all of the models, leading him to suggest that "the jailbreak is an attack on the model itself." He added that each model was tested across a range of risk and safety content categories, including explosives, bioweapons, political content, self-harm, racism, drugs, graphic sex, and violence.
Ongoing challenges for AI companies
While developers are likely already addressing the "Skeleton Key" technique, other methods continue to pose significant threats. Adversarial attacks such as Greedy Coordinate Gradient (GCG) and BEAST can still easily overcome the guardrails established by companies like OpenAI. This persistent issue underscores that AI companies still have a substantial amount of work ahead to prevent their chatbots from spreading potentially harmful information.