OpenAI suspends TikTok parent ByteDance's account: Here's why
ByteDance, TikTok's parent company, is in hot water for allegedly using GPT-generated data, accessed through Microsoft Azure, to train its own AI model in China, in violation of both OpenAI's and Microsoft's developer terms. OpenAI has responded by suspending ByteDance's account while it investigates further. Microsoft, on the other hand, has not said whether it will do the same and cut off ByteDance's access to its Azure platform.
OpenAI's move follows recent report
OpenAI suspended ByteDance's account after The Verge reported that the Chinese tech firm was using GPT to train its own AI model. Niko Felix, an OpenAI spokesperson, confirmed the suspension, saying, "All API customers must adhere to our usage policies... While ByteDance's use of our API was minimal, we have suspended their account while we further investigate." If ByteDance's usage is found to violate these policies, it will have to make the necessary changes or risk losing its account for good, per Felix.
ByteDance used OpenAI API to develop its foundational LLM
As reported by Alex Heath of The Verge, an internal ByteDance document shared with him showed extensive use of the OpenAI API in developing ByteDance's foundational large language model (LLM), codenamed Project Seed. ByteDance relied on OpenAI's technology at various stages, including training and model evaluation. ByteDance employees involved in the work were reportedly aware of the potential repercussions and discussed ways to "whitewash" the evidence through "data desensitization."
OpenAI's platform used in early days of development
ByteDance leaned on OpenAI's platform most heavily in Project Seed's initial stages. Later, the company directed the team to stop using GPT-generated text in "any stage of model development." Around the same time, ByteDance received regulatory approval in China to launch Project Seed through its chatbot platform "Doubao." Even so, ByteDance still uses the API in ways that contravene OpenAI's and Microsoft's terms of service, including to assess the performance of its model within Doubao, an insider told The Verge.
'We use our self-developed model to power Doubao'
Responding to The Verge's story, ByteDance spokesperson Jodi Seth initially stated that GPT-generated data was used to "annotate" Project Seed in the early stages of its development, and said it was removed from ByteDance's training data around mid-2023. "ByteDance is licensed by Microsoft to use the GPT APIs," Seth said, adding that the company uses GPT to power products and features in markets outside China. For the China-exclusive Doubao, however, ByteDance relies on its self-developed model, she said.
Use of GPT to create competing products prohibited
TikTok's captivating "For You" feed propelled ByteDance onto the global stage. Despite its AI leadership in previous years, the company has fallen behind in the generative AI race, so much so that it allegedly leveraged OpenAI's technology in secret to build its own rival LLM. This approach is widely frowned upon in the AI community and goes against OpenAI's terms of service, which explicitly prohibit using its model output "to develop any artificial intelligence models that compete with our products and services."
ByteDance employed GPT via Microsoft Azure
Notably, most of ByteDance's GPT usage happened through Microsoft's Azure platform rather than directly through OpenAI. This raises the question of whether Microsoft will follow OpenAI's lead and suspend ByteDance's access to its services. So far, Microsoft has stayed silent and has not taken any action in response to the allegations against ByteDance.