Mistral unveils AI-powered tool to tackle online toxicity
Mistral, an artificial intelligence (AI) start-up, has launched a new application programming interface (API) for content moderation. It is the same API that powers moderation on Le Chat, Mistral's chatbot platform, and the company says it can be customized for specific applications and safety standards. The API is powered by a fine-tuned version of Ministral 8B, a model trained to process text in several languages, including English, French, and German.
It can classify text into nine categories
The Mistral API can classify text into nine different categories. These include sexual content, hate and discrimination, violence and threats, dangerous and criminal content, self-harm, health-related issues, financial matters, legal topics, and personally identifiable information (PII).
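A classifier like this typically returns a per-category flag for each input. The sketch below shows how a client might collect the flagged categories from such a response; the JSON field names are assumptions chosen to mirror the nine categories above, not Mistral's documented schema.

```python
# Hypothetical sketch: collecting flagged categories from a moderation
# response. The category keys below are illustrative assumptions that
# mirror the nine policy categories described in the article.

CATEGORIES = [
    "sexual",
    "hate_and_discrimination",
    "violence_and_threats",
    "dangerous_and_criminal_content",
    "selfharm",
    "health",
    "financial",
    "law",
    "pii",
]

def flagged_categories(result: dict) -> list[str]:
    """Return the names of the categories the classifier flagged."""
    return [name for name in CATEGORIES if result.get(name, False)]

# Fabricated example response fragment, for illustration only:
sample = {"pii": True, "financial": True, "violence_and_threats": False}
print(flagged_categories(sample))  # ['financial', 'pii']
```

In practice a client would call the moderation endpoint over HTTPS and feed the decoded JSON into a helper like this before applying application-specific guardrails.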
Versatility and industry reception
Notably, Mistral's moderation API can be applied to both raw and conversational text. In its blog post, the company notes that "over the past few months, we've seen growing enthusiasm across the industry and research community for new AI-based moderation systems, which can help make moderation more scalable and robust across applications." It adds: "Our content moderation classifier leverages the most relevant policy categories for effective guardrails."
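The two input modes described above imply two different request shapes: a bare string for raw text, and a list of role-tagged messages for conversational text. The payloads below are a minimal sketch of that distinction; the model identifier and field names are assumptions for illustration, not the documented request format.

```python
# Hypothetical request payloads contrasting the two input modes the
# article describes. Field names and the model identifier are assumed.
import json

# Raw text: classify a standalone string.
raw_request = {
    "model": "mistral-moderation-latest",  # assumed identifier
    "input": ["Transfer the money to this account number."],
}

# Conversational text: classify a message within its dialogue context,
# represented as a list of role-tagged turns.
conversational_request = {
    "model": "mistral-moderation-latest",
    "input": [
        [
            {"role": "user", "content": "Can you help me with my invoice?"},
            {"role": "assistant", "content": "Sure, what do you need?"},
        ]
    ],
}

print(json.dumps(raw_request, indent=2))
```

Supporting the conversational shape lets the classifier use surrounding turns as context, which is what distinguishes it from a plain text classifier.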
Addressing biases and technical flaws in AI systems
Despite their potential benefits, AI-powered moderation systems can also be susceptible to the same biases and technical flaws that plague other AI systems. For example, some models trained to detect toxicity may read phrases in African-American Vernacular English (AAVE) as disproportionately "toxic." Social media posts about people with disabilities are frequently flagged as more negative or toxic by commonly used public sentiment and toxicity detection models.
Commitment to improving its moderation model
Mistral notes that its moderation model is extremely accurate but still a work in progress. The company has not compared the performance of its API with other popular moderation APIs such as Jigsaw's Perspective API and OpenAI's moderation API. "We're working with our customers to build and share scalable, lightweight, and customizable moderation tooling," the company said, promising to continue working with the research community for safety advancements in the wider field.