Summarize

Microsoft's new tools will bolster AI chatbot security against manipulation

By Pratyaksh Srivastava

Mar 29, 2024

01:50 pm

What's the story

Microsoft is taking proactive measures to enhance the security of its artificial intelligence (AI) chatbots. The tech giant is developing safety features to prevent users from manipulating AI chatbots into performing unintended tasks. These new features will be integrated into Azure AI Studio, a platform that allows developers to build customized AI assistants using their own data, as stated in a company blog post.

Prompt shields

New tools to thwart AI chatbot attacks

Among the upcoming tools, Microsoft is introducing "prompt shields" designed to detect and block deliberate attempts, also known as prompt injection attacks or jailbreaks, that aim to alter an AI model's behavior. The company is also focusing on preventing "indirect prompt injections," where hackers insert malicious instructions into data that the model is trained on. This could lead the AI to perform unauthorized actions such as stealing user information or hijacking a system.

Suspicious inputs

Microsoft's real-time defense against AI misuse

Sarah Bird, Microsoft's Chief Product Officer of Responsible AI, described these attacks as "a unique challenge and threat." The defenses being developed will identify "suspicious inputs and block them in real time." In addition, Microsoft plans to introduce a feature that alerts users when a model generates false or fabricated responses. This move is part of the company's response to the growing misuse of its generative AI tools by both individual and corporate users.

CoPilot investigation

Response to Copilot chatbot manipulation

Earlier this year, Microsoft investigated incidents involving its Copilot chatbot, which was generating responses that ranged from strange to harmful. Post-investigation, the company confirmed that users had intentionally manipulated Copilot into producing these responses. Bird noted an increase in such manipulative tactics as more people become aware of these techniques. She identified repeated questioning or role-playing prompts as common indicators of such attacks.