ChatGPT's emotionally aware and human-like voice mode is now available
OpenAI has introduced Advanced Voice Mode, a new feature for ChatGPT Plus users that uses GPT-4o to produce hyperrealistic audio responses. Currently in alpha, the feature is available to a small group of users, with a wider rollout planned for later this year. OpenAI first showcased GPT-4o's voice capabilities in May, where they drew widespread attention for their quick responses and human-like quality.
ChatGPT's advanced voice mode: A leap in AI communication
Advanced Voice Mode is a significant improvement over ChatGPT's existing Voice Mode. Previously, three separate models were required: one for voice-to-text conversion, one for prompt processing, and one for text-to-voice conversion. GPT-4o, however, is a multimodal model that can perform all three tasks without auxiliary models, which leads to significantly lower-latency conversations. OpenAI also claims that GPT-4o can detect emotional intonation in a speaker's voice, such as sadness, excitement, or singing.
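For context, the older cascaded approach can be sketched with OpenAI's public API. The sketch below is an illustration only, not a description of ChatGPT's internal voice stack: the model names (whisper-1, gpt-4o, tts-1), the "alloy" voice, and the file paths are assumptions chosen to show why chaining three separate models adds latency and discards vocal cues.

```python
# Illustrative sketch of a cascaded (three-model) voice pipeline.
# Not how Advanced Voice Mode works internally; it only shows why chaining
# separate speech-to-text, text, and text-to-speech calls is slower than
# handling audio in a single multimodal model.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def cascaded_voice_reply(input_audio_path: str, output_audio_path: str) -> str:
    """Reply to a spoken prompt by chaining three separate model calls."""
    # 1) Speech-to-text: transcribe the user's audio.
    with open(input_audio_path, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
        )

    # 2) Prompt processing: generate a text reply from the transcript.
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": transcript.text}],
    )
    reply_text = completion.choices[0].message.content

    # 3) Text-to-speech: synthesize the reply as audio.
    speech = client.audio.speech.create(
        model="tts-1",
        voice="alloy",
        input=reply_text,
    )
    with open(output_audio_path, "wb") as out_file:
        out_file.write(speech.content)

    # Total latency is roughly the sum of three round trips, and any emotion
    # in the speaker's voice is lost at the transcription step.
    return reply_text
```

A single multimodal model collapses these round trips into one and works with the audio directly, which is where the lower latency and emotion detection the article attributes to GPT-4o come from.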
OpenAI's gradual rollout and safety measures for advanced voice mode
OpenAI plans a gradual release of Advanced Voice Mode so it can closely monitor how the feature is used. Users selected for the alpha group will receive an alert in the ChatGPT app and an email with instructions. OpenAI postponed the launch, originally planned for late June, to improve its safety measures. The video and screen-sharing capabilities previewed during OpenAI's Spring Update will not be part of this alpha release and are set to launch at a later date.
OpenAI's advanced voice mode to offer four preset voices
Advanced Voice Mode will offer four preset voices created in collaboration with paid voice actors: Juniper, Breeze, Cove, and Ember. The Sky voice from OpenAI's May demo will no longer be available in ChatGPT; eerily reminiscent of Scarlett Johansson's role in Her, it sparked immediate controversy. OpenAI spokesperson Lindsay McCallum said that "ChatGPT cannot impersonate other people's voices, both individuals and public figures, and will block outputs that differ from one of these preset voices."
OpenAI introduces filters to prevent copyright infringement
To avoid controversies like the one AI startup ElevenLabs faced earlier this year, OpenAI has implemented new filters. In January, ElevenLabs' voice-cloning technology was used to impersonate President Joe Biden. OpenAI's filters in Advanced Voice Mode are designed to block requests to generate music or other copyrighted audio. The move comes as AI companies face mounting legal challenges over copyright infringement, including from record labels that have sued the AI song generators Suno and Udio.