ChatGPT unveils voice, image prompt features: Here's how they work

By Dwaipayan Roy

Sep 25, 2023

07:10 pm

What's the story

OpenAI has unveiled a new version of ChatGPT that accepts voice and image prompts, changing how users interact with the AI chatbot. The update will be available to paying customers in the next two weeks, followed by a wider release. The voice chat feature uses OpenAI's Whisper speech recognition model to convert spoken questions into text, while a new text-to-speech model generates "human-like audio from just text and a few seconds of sample speech."

Details

Voice commands enhance user experience

The new version of ChatGPT lets users prompt the AI bot by voice, making conversations with it more natural and user-friendly. OpenAI's text-to-speech model generates "human-like audio from just text and a few seconds of sample speech," which could change the way audio content is produced and consumed. OpenAI is also collaborating with Spotify to translate podcasts into other languages while retaining the podcaster's original voice.

Problems

Addressing synthetic voice security concerns

While synthetic voices offer numerous benefits, they also pose risks, such as impersonation and fraud. OpenAI's text-to-speech model could be misused by malicious actors. To mitigate these risks, OpenAI is limiting the model's use to specific cases and partnerships.

Insights

ChatGPT's image search takes on Google Lens

Similar to Google Lens, ChatGPT's image search feature lets users snap a photo, and the AI bot will attempt to identify what it shows and respond accordingly. Users can also use the app's drawing tool to focus on a specific part of the image, or speak or type questions alongside the image. However, the feature is limited in its ability to analyze and make direct statements about people, due to accuracy and privacy concerns.