Summarize

Microsoft's latest LLM-powered tools detect vulnerabilities, hallucinations within AI apps

By Akash Pandey

Mar 29, 2024

10:14 am

What's the story

Microsoft's Chief Product Officer of Responsible AI, Sarah Bird, has announced the launch of several safety features for Azure users.

In an interview with The Verge, Bird states that these innovative features are designed to identify potential risks, monitor for unsupported "hallucinations," and prevent harmful prompts in real-time.

The new measures cater to Azure customers who may not have dedicated red teamers to test their AI services.

Risk mitigation

Safety features aim to prevent controversial AI responses

Bird explained that the evaluation system generates prompts mimicking potential attacks such as prompt injection or offensive content.

This process allows customers to receive a score and view the results, helping them avoid controversies caused by undesirable or unintended responses from generative AI.

The move comes in response to recent controversies involving explicit celebrity fakes by Microsoft's Designer image generator, historically inaccurate images by Google Gemini, and inappropriate content on Bing.

Feature breakdown

Microsoft unveils three key safety features for Azure AI

The new safety features introduced by Microsoft include:

Prompt Shields: Designed to prevent harmful prompts or prompt injections from external documents that could cause models to deviate from their training.

Groundedness Detection: Aims to identify and prevent hallucinations.

Safety evaluations: Used to examine model vulnerabilities.

These three features are currently available in preview on Azure AI.

Microsoft plans to introduce two more features soon, that will guide models toward safe outputs and monitor prompts to identify potentially problematic users.

Information

How does the monitoring system work?

The monitoring system checks if a user's input or third-party data triggers any banned words or hidden prompts before it is sent to the model for a response. It then verifies if the model has hallucinated information not present in the document or prompt.

User control

Microsoft allows Azure customers to control AI model filtering

Addressing concerns about companies determining what is suitable for AI models, Bird stated that a feature has been incorporated allowing Azure customers to control the filtering of hate speech or violence identified by the model.

This feature empowers users to decide what content the model prevents, offering them more control over their AI services.

Future Azure users will also have access to reports on users who attempt to generate unsafe outputs, aiding system administrators in identifying potentially harmful users.