
NVIDIA faces lawsuit for using copyrighted books for training AI

By Pratyaksh Srivastava

Mar 11, 2024

10:12 am

What's the story

NVIDIA is facing a lawsuit from three authors who claim that the chip-manufacturing giant used their copyrighted books to train its NeMo AI platform. Brian Keene, Abdi Nazemian, and Stewart O'Nan say their works were part of a dataset of about 196,640 books that was used to train NeMo to mimic everyday written language. The dataset was removed in October last year due to alleged infringement. The authors are demanding compensation.

Compensation sought

Authors seek damages for infringement

A class action lawsuit has been filed in a San Francisco court. The authors argue that NVIDIA's removal of the dataset amounts to an admission that the company infringed their copyrights by using their works to train NeMo. They are seeking unspecified damages on behalf of individuals whose copyrighted works were used to train NeMo's large language models in the past three years. The lawsuit covers literary works such as Keene's Ghost Walk, Nazemian's Like a Love Story, and O'Nan's Last Night at the Lobster.

Larger issue

Growing litigation over generative AI

This legal battle adds NVIDIA to the growing list of companies being sued by writers and publishers, such as The New York Times, over generative AI, which produces new content based on inputs like text, images, and sounds. NVIDIA markets NeMo as a quick and affordable way to adopt generative AI. Other companies sued over this technology include OpenAI, the creator of the AI platform ChatGPT, and its partner Microsoft.

Tapping online traffic

NYT claims diversion of web traffic by AI bots

The New York Times argues that ChatGPT and Microsoft's Copilot not only infringe on its work but also compete with it for online readers. The publication says that income from online readership is a vital source of funding for its journalism. NYT has also cited cases in which misinformation was wrongly attributed to it.

Legal angle

The legal loophole for generative AI

The AI industry has been arguing that using digital content freely available on the internet is permitted under the fair use doctrine of US copyright law. The doctrine allows limited use of copyrighted content for activities like research or teaching. However, publications hold that using copyrighted content to train AI bots is not fair use, as it endangers their sources of revenue and long-term sustainability.