OpenAI supercharges ChatGPT with emotions, natural voice, and incredible capabilities

By Dwaipayan Roy

May 13, 2024

11:41 pm

What's the story

OpenAI has unveiled its latest generative AI model, GPT-4o. The model will be rolled out gradually across the company's developer and consumer-facing products in the coming weeks. According to OpenAI CTO Mira Murati, GPT-4o offers "GPT-4-level" intelligence with enhanced capabilities in text, vision, and audio processing. "GPT-4o reasons across voice, text and vision," Murati stated during the OpenAI 'Spring Update' event.

Enhanced features

GPT-4o builds on predecessor's capabilities, adds speech

GPT-4o is an advancement over its predecessor, GPT-4 Turbo, which was trained on a combination of images and text. While GPT-4 Turbo could analyze pictures and text for tasks such as extracting text from images or describing their content, GPT-4o adds speech to the mix. This new capability allows users to interact with OpenAI's AI-powered chatbot, ChatGPT, in a more assistant-like manner.

User interaction

GPT-4o enhances user experience with ChatGPT

One of the key improvements brought by GPT-4o is in the ChatGPT experience. Users can now interrupt ChatGPT while it is answering and ask questions. The model delivers "real time" responsiveness, can detect the emotion in a user's voice, and can generate speech in "a range of different emotive styles." Furthermore, GPT-4o enhances ChatGPT's vision capabilities, allowing it to quickly answer questions related to a given photo or desktop screen.

Convenience

GPT-4o now offered in free tier of ChatGPT

OpenAI has made GPT-4o available in the free tier of ChatGPT, starting today. Subscribers to OpenAI's paid plans, ChatGPT Plus and Team, will get "5x higher" message limits than free users, with Enterprise options "coming soon." The improved voice experience powered by GPT-4o will be rolled out in alpha to Plus users within the next month. Additionally, GPT-4o offers improved multilingual capabilities, with enhanced performance in 50 different languages.

Performance

GPT-4o's enhanced performance and limited voice API

In OpenAI's API, GPT-4o is twice as fast as its predecessor, GPT-4 Turbo, costs half as much, and has higher rate limits. However, voice is not yet part of the GPT-4o API for all customers because of the risk of misuse. OpenAI plans to first launch support for GPT-4o's new audio capabilities to "a small group of trusted partners" in the coming weeks. This staged approach is intended to keep the rollout secure while still making the model's benefits available.

Updates

Changed ChatGPT UI and new desktop app

OpenAI is introducing a tweaked ChatGPT UI on the web, with a new, "more conversational" home screen and message layout. The company has also launched a desktop version of ChatGPT for macOS, which lets users ask ChatGPT questions using a keyboard shortcut, and take screenshots and discuss them by typing or speaking. Plus users get access first, starting today, and a Windows app will debut later this year.

Availability

Free access to the GPT Store

Finally, access to OpenAI's GPT Store, a library of third-party chatbots built on the company's AI models, is now offered to users of ChatGPT's free tier. Free users will also be able to take advantage of formerly paywalled features, including a memory capability that allows ChatGPT to "remember" preferences for future interactions.