OpenAI's ChatGPT can now 'see' and hear you in real time
As part of its "12 Days of OpenAI" event, OpenAI has unveiled a new vision capability for ChatGPT's Advanced Voice feature. The AI-powered chatbot can now identify objects shown through a smartphone camera or shared from a device's screen, and respond to them in Advanced Voice Mode. The functionality was first hinted at with the introduction of the GPT-4o model in May.
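For developers curious about the underlying capability, the same GPT-4o model family is reachable through OpenAI's public Chat Completions API, which accepts images alongside text. The snippet below is a minimal sketch using the official openai Python library with a placeholder image URL; it illustrates the kind of multimodal request the model can handle, not how the ChatGPT app itself streams live video.

```python
# Minimal sketch: send an image plus a text question to GPT-4o via
# OpenAI's public Chat Completions API. The image URL is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What object is shown in this photo?"},
                {
                    "type": "image_url",
                    # Hypothetical URL for illustration only
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

The consumer feature goes further by handling a continuous camera feed in a live voice conversation, but the request shape above captures the basic image-understanding building block.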
Feature rollout and accessibility
The video and screen-sharing features will soon be available to all ChatGPT Team users and to most Plus and Pro subscribers through the ChatGPT mobile app. Users in the European Union, Switzerland, Iceland, Norway, and Liechtenstein are expected to gain access soon after. Enterprise and Edu users will have to wait until January for these features.
Advanced Voice Mode powered by the multimodal GPT-4o model
The Advanced Voice feature is powered by OpenAI's multimodal GPT-4o model, which lets it handle audio input in a natural, conversational way. A Santa voice preset has also been added to Advanced Voice Mode and can be activated by tapping the snowflake icon in ChatGPT. This festive offering will be available to all users on the mobile, web, and desktop apps until early January.
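GPT-4o's audio handling is also exposed to developers through OpenAI's API via the gpt-4o-audio-preview model, which can return a spoken reply alongside a text transcript. The sketch below, again using the official openai Python library, shows that request shape; Advanced Voice Mode is a separate consumer product built on the same model family, and the voice name here is just one of the API's presets.

```python
# Minimal sketch: ask GPT-4o (audio-preview variant) for a spoken answer
# through OpenAI's public Chat Completions API.
import base64

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)

# The spoken reply arrives base64-encoded alongside a text transcript.
wav_bytes = base64.b64decode(response.choices[0].message.audio.data)
with open("reply.wav", "wb") as f:
    f.write(wav_bytes)
```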
OpenAI introduces new products and features
OpenAI has unveiled a string of new products and features over the past week. These include the full release of its o1 model out of preview and a $200-per-month 'Pro' subscription tier for ChatGPT. The company has also released Sora Turbo, its AI video generator, and rolled out Canvas, an editing workspace for writing and coding alongside the chatbot, to all ChatGPT users. ChatGPT is also now accessible through Apple's Siri voice assistant, extending its reach to a wider user base.