OpenAI's GPT-4o copied a user's voice during testing, raising safety concerns
OpenAI's latest artificial intelligence (AI) model, GPT-4o, has demonstrated an unexpected ability to mimic a user's voice. The surprising behavior was disclosed in the model's "system card," a document detailing its limitations and safety measures. The incident occurred during testing of ChatGPT's Advanced Voice Mode, a feature designed to let users hold spoken conversations with the AI assistant.
Unintentional voice imitation: A rare occurrence
GPT-4o's system card includes a section titled "Unauthorized voice generation," in which OpenAI disclosed an incident of unintentional voice imitation. The company stated, "During testing, we also observed rare instances where the model would unintentionally generate an output emulating the user's voice." According to OpenAI, the behavior was triggered by noisy input, which led the model to continue the conversation in a voice resembling the tester's.
Voice imitation: A result of GPT-4o's sound synthesis capability?
The system card suggests that GPT-4o's unexpected voice imitation may stem from its ability to synthesize almost any type of sound found in its training data, including not only voices but also sound effects and music. To manage this capability safely, OpenAI provides an authorized voice sample, recorded by a hired actor, for the model to imitate at the start of each conversation.
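Purely as an illustration, the Python sketch below shows how a conversation might be seeded with such an authorized sample. The field names and message structure are assumptions loosely modeled on common chat-completion APIs, not OpenAI's documented format.

```python
# Illustrative sketch only: seeding each conversation with an authorized
# voice sample. Field names and structure are assumptions, not OpenAI's API.

def start_voice_session(authorized_voice_sample: bytes) -> list[dict]:
    """Build the hidden system context that opens each voice conversation."""
    return [
        {
            "role": "system",  # never shown to the user
            "content": [
                {"type": "text",
                 "text": "Speak only in the authorized voice provided below."},
                # The hired actor's recording serves as the one voice the
                # model is permitted to imitate (hypothetical encoding).
                {"type": "audio", "data": authorized_voice_sample},
            ],
        }
    ]

# Usage: user turns are appended after the hidden system message.
actor_sample = b"<raw audio bytes of the authorized actor's recording>"
conversation = start_voice_session(actor_sample)
conversation.append({"role": "user", "content": "Hello!"})
```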
OpenAI's measures to prevent unauthorized audio generation
For text-only large language models (LLMs), system prompt instructions are silently added to the conversation history just before a chat session begins. Because GPT-4o can process tokenized audio, OpenAI can include audio inputs as part of the model's system prompt in the same way. OpenAI says it currently has safeguards in place to prevent unauthorized audio generation, but the episode highlights the growing complexity of designing AI chatbots capable of imitating any voice from a brief clip.
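OpenAI has not detailed how these safeguards work. Purely as an assumption, one plausible output-side check, sketched below, would compare a speaker embedding of the generated audio against the authorized sample and reject output that drifts toward another voice. The embedding source and threshold here are hypothetical, not OpenAI's actual method.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two speaker embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_authorized_voice(output_embedding: np.ndarray,
                        authorized_embedding: np.ndarray,
                        threshold: float = 0.85) -> bool:
    """Return True if generated audio still sounds like the authorized voice.

    Both embeddings would come from a speaker-identification model
    (hypothetical here), and the threshold is an illustrative value.
    """
    return cosine_similarity(output_embedding, authorized_embedding) >= threshold

# If the check fails mid-response, the system could cut the audio stream
# before an imitated voice ever reaches the user.
```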