Microsoft's new tools will bolster AI chatbot security against manipulation
Microsoft is taking proactive measures to enhance the security of its artificial intelligence (AI) chatbots. The tech giant is developing safety features to prevent users from manipulating AI chatbots into performing unintended tasks. These new features will be integrated into Azure AI Studio, a platform that allows developers to build customized AI assistants using their own data, as stated in a company blog post.
New tools to thwart AI chatbot attacks
Among the upcoming tools, Microsoft is introducing "prompt shields" designed to detect and block deliberate attempts, also known as prompt injection attacks or jailbreaks, that aim to alter an AI model's behavior. The company is also focusing on preventing "indirect prompt injections," where hackers insert malicious instructions into data that the model is trained on. This could lead the AI to perform unauthorized actions such as stealing user information or hijacking a system.
Microsoft's real-time defense against AI misuse
Sarah Bird, Microsoft's Chief Product Officer of Responsible AI, described these attacks as "a unique challenge and threat." The defenses being developed will identify "suspicious inputs and block them in real time." In addition, Microsoft plans to introduce a feature that alerts users when a model generates false or fabricated responses. This move is part of the company's response to the growing misuse of its generative AI tools by both individual and corporate users.
Response to Copilot chatbot manipulation
Earlier this year, Microsoft investigated incidents involving its Copilot chatbot, which was generating responses that ranged from strange to harmful. Post-investigation, the company confirmed that users had intentionally manipulated Copilot into producing these responses. Bird noted an increase in such manipulative tactics as more people become aware of these techniques. She identified repeated questioning or role-playing prompts as common indicators of such attacks.