Meta's AI assistant trained on public Instagram, Facebook posts

By Sanjana Shankar

Sep 29, 2023

12:47 pm

What's the story

A top executive at Meta has revealed that the company's new Meta AI assistant was trained using public Facebook and Instagram posts while steering clear of private posts and chats to respect user privacy. Nick Clegg, Meta's President of Global Affairs, mentioned that the "vast majority" of the data used for training was already publicly accessible. Meta made sure to filter out private information from public datasets and avoided using websites with substantial personal information, like LinkedIn, he said.

What Next?

Meta's new AI assistant can generate text, images, audio

Introduced at Meta's recent annual 'Connect' conference, the Meta AI assistant marks a major advancement in the company's consumer-facing AI tools. The assistant can generate text, audio, and imagery. It is based on a custom model built on the Llama 2 large language model and a new image model called Emu, which creates images in response to text prompts. The AI assistant will also have access to real-time information through Microsoft's Bing search engine.

Details

Meta, OpenAI are being criticized for using data without permission

Meta didn't use private chats from its social media apps for training the model, said Clegg. As tech giants like Meta, OpenAI, and Google face backlash for using internet-scraped data without permission to train their AI models, these companies are now rethinking how to handle private or copyrighted materials. Clegg expects legal disputes over the matter of "whether creative content is covered or not by existing fair use doctrine," which permits "limited use of protected works for purposes" like research.

Insights

Social media posts used for training contained both text, photos

The public Facebook and Instagram posts used to train Meta AI featured both text and photos. These posts were used to train the product's image generation features, which run on the Emu model. Meanwhile, the chat functions relied on Llama 2 with some publicly available and annotated datasets added, a Meta spokesperson told Reuters. The spokesperson confirmed that interactions with Meta AI may be used to improve its features in the future.

Facts

Meta's new terms of service prohibit content generation that violates privacy

Meta has set safety restrictions on the content generated by the Meta AI tool, such as banning the creation of photo-realistic images of public figures. Regarding copyrighted materials, a Meta spokesperson pointed to new terms of service that prohibit users from generating content that violates privacy and intellectual property rights. Separately, Google has also launched a new tool that allows website publishers to opt out of having their data used for training the company's AI models.