Meta is being sued over Llama AI models: Here's why

By Akash Pandey

Jan 11, 2025

04:19 pm

What's the story

Mark Zuckerberg, the CEO of Meta Platforms, has been accused of approving the use of copyrighted material to train the company's Llama artificial intelligence (AI) models. The allegations are part of a copyright infringement lawsuit, Kadrey v. Meta, currently being heard in the US District Court for the Northern District of California. The plaintiffs, including authors Sarah Silverman and Ta-Nehisi Coates, allege Zuckerberg approved Meta's use of a dataset called LibGen for Llama-related training.

Dataset details

LibGen: A controversial dataset

LibGen, a self-proclaimed "links aggregator," offers access to copyrighted works from publishers like Cengage Learning, Macmillan Learning, McGraw Hill, and Pearson Education. The platform has been sued multiple times and ordered to shut down on several occasions for copyright infringement. Despite LibGen's legal troubles and concerns raised within Meta's AI executive team about the regulatory implications of using it, Zuckerberg reportedly approved its use for training at least one of Meta's Llama models.

Infringement concealment

Meta's alleged attempts to hide copyright infringement

The lawsuit also claims that Meta tried to conceal its copyright infringement by stripping attribution from the LibGen data. According to the plaintiffs' counsel, Nikolay Bashlykov, a Meta engineer working on the Llama research team, wrote a script to delete copyright information from e-books in LibGen. The company is also accused of stripping copyright markers from scientific journal articles and "source metadata" in Llama's training data.

Data acquisition

Controversial data acquisition methods

The lawsuit alleges that, during depositions, Meta admitted to torrenting LibGen. Torrenting is a file distribution method that requires users to upload the files they download, and some view it as another form of copyright infringement. The plaintiffs' counsel alleges that Meta attempted to conceal its activity by restricting the number of files it uploaded, with Ahmad Al-Dahle, head of generative AI at Meta, reportedly brushing off concerns about the legality of this method.