Open-source AI must disclose data used for training, says OSI
The Open Source Initiative (OSI) has released an official definition for "open" artificial intelligence (AI), potentially setting up a clash with tech giants like Meta. Under the new definition, an AI system must provide access to the data it was trained on to be considered truly open source. It must also make available the complete code used to build and run the AI, along with the training settings that help it produce results.
Meta's AI model falls short of new OSI standards
Meta's Llama, marketed as the largest open-source AI model, fails to meet OSI's newly established standards. Although the model is publicly available for download and use, it restricts commercial use for apps with more than 700 million users and does not provide access to its training data. These restrictions, combined with the lack of transparency, keep Llama from meeting OSI's criteria for unrestricted freedom to use, modify, and share.
Meta's response to OSI's new definition
In response to the new definition, Meta spokesperson Faith Eischen told The Verge that while the company agrees with OSI on many things, it disagrees with this particular interpretation. Eischen pointed to the difficulty of defining open-source AI given the complexity of today's rapidly evolving models, and said Meta would continue working with OSI and other industry groups to increase AI accessibility responsibly, regardless of technical definitions.
OSI's definition hailed as crucial in AI openness debate
The Linux Foundation also recently attempted to define "open-source AI," highlighting a growing debate over how traditional open-source values will evolve in the AI era. Simon Willison, an independent researcher and creator of the open-source multi-tool Datasette, believes OSI's firm definition could help push back against companies that falsely claim their work is open source. Hugging Face CEO Clement Delangue has praised OSI's definition as instrumental in shaping discussions around openness in AI, particularly the critical role of training data.
Challenging Meta's stance on training data
Meta has cited safety concerns as its reason for limiting access to its training data. Critics, however, argue that the company is mainly trying to reduce its legal liability and protect its competitive edge. OSI Executive Director Stefano Maffulli compared Meta's stance to Microsoft's resistance to open source in the 1990s, suggesting that, like Microsoft, Meta is citing cost and complexity as reasons for keeping its technology proprietary.