OpenAI announces DALL-E 3 API, Audio API, and Whisper large-v3

By Akash Pandey

Nov 07, 2023

02:04 am

What's the story

OpenAI has announced several new APIs, including one for DALL-E 3, its text-to-image model, and a new text-to-speech API.

DALL-E 3, now available via an API, allows developers to integrate it into their own applications.

Meanwhile, the new text-to-speech API, known as Audio API, offers a variety of pricing options.

Whisper large-v3, an open-source automatic speech recognition model accessible on GitHub, has also made its debut.

Details

DALL-E 3 offers a variety of format, quality, and resolution options

DALL-E 3 was initially available in ChatGPT and Bing Chat. Like the API for its predecessor, DALL-E 2, the DALL-E 3 API comes with built-in moderation features to prevent misuse.

The DALL-E 3 API provides a variety of format, quality, and resolution options, spanning from 1024×1024 to 1792×1024. The pricing starts at $0.04 per generated image.
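As a rough illustration of the options and pricing above, the sketch below builds a request payload and a lower-bound cost estimate. It makes no network call. The field names (`model`, `prompt`, `size`) and the model identifier `dall-e-3` are assumptions based on OpenAI's public image-generation API, and the helper names are hypothetical. Only the two resolutions and the $0.04 starting price come from this article.

```python
# Sketch only: field names are assumptions based on OpenAI's public
# image-generation API; helper names are hypothetical.
ALLOWED_SIZES = {"1024x1024", "1792x1024"}  # the two resolutions cited in the article
STARTING_PRICE_CENTS = 4  # $0.04 "starting" price per image; other options may cost more


def build_image_request(prompt, size="1024x1024"):
    """Return a JSON-ready payload for a DALL-E 3 generation request."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if size not in ALLOWED_SIZES:
        raise ValueError(f"unsupported size: {size}")
    return {"model": "dall-e-3", "prompt": prompt, "size": size}


def minimum_cost_usd(image_count):
    """Lower bound on cost in dollars, using the starting price per image."""
    return image_count * STARTING_PRICE_CENTS / 100
```

Under these assumptions, 25 images would cost at least $1.00. Higher quality or resolution settings may be priced above the starting rate.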

For now, the DALL-E 3 API has some limitations compared to the DALL-E 2 API.

The API can't be used to create edited image versions

In contrast to the DALL-E 2 API, the DALL-E 3 API doesn't support the creation of modified image versions by replacing sections of an existing image or generating variations.

Additionally, OpenAI states that when a generation request is submitted to DALL-E 3, it will automatically rewrite the prompt "for safety reasons" and to "enhance detail." This rewriting may produce results that match the original prompt less precisely.
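Developers who want to see how far a prompt was changed can compare it with the rewritten version. The sketch below assumes the API response carries the rewritten text in a `revised_prompt` field on each image entry. That field name and the response shape are assumptions based on OpenAI's public API, and the helper name is hypothetical.

```python
# Sketch only: the response shape and the "revised_prompt" field are
# assumptions; the helper name is hypothetical.
def prompt_was_revised(original_prompt, response):
    """Return (changed, revised_prompt) for the first image in a response dict."""
    revised = response["data"][0].get("revised_prompt")
    changed = revised is not None and revised.strip() != original_prompt.strip()
    return changed, revised


# Hypothetical response, written by hand for illustration (not real API output).
sample_response = {
    "data": [{"url": "https://example.com/image.png",
              "revised_prompt": "A detailed photo of a red fox in snow"}]
}
changed, revised = prompt_was_revised("a red fox", sample_response)
```

Logging both versions makes it easier to tell whether an unexpected image came from the model or from the rewritten prompt.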

Audio

Audio API is a text-to-speech solution

OpenAI has also introduced the Audio API, a text-to-speech solution, offering a selection of six predefined voices: Alloy, Echo, Fable, Onyx, Nova, and Shimmer.

The service is available immediately, with pricing starting at $0.015 per 1,000 characters of input.
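To put the per-character pricing in perspective, here is a minimal cost estimate. It assumes the $0.015 starting rate applies to every character of input and that billing is strictly proportional to length. Both are simplifications, and the function name is hypothetical.

```python
from decimal import Decimal

# Starting rate quoted in the article: $0.015 per 1,000 input characters.
PRICE_PER_1K_CHARS_USD = Decimal("0.015")


def estimate_tts_cost_usd(text):
    """Estimate the cost in dollars of converting `text` to speech."""
    return PRICE_PER_1K_CHARS_USD * len(text) / 1000
```

At this rate, a 5,000-character script would cost about $0.075, or roughly seven and a half cents.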

Unlike certain speech synthesis platforms and tools, OpenAI doesn't offer a method to regulate the emotional tone of the generated audio.

Information

OpenAI also brings an automatic speech recognition model

In a related update, OpenAI introduced the latest iteration of its open-source automatic speech recognition model, known as Whisper large-v3. OpenAI asserts that this version delivers improved performance across various languages. Whisper large-v3 is now accessible on GitHub under a permissive license.

GPTs

OpenAI allowing individuals to create custom versions of ChatGPT

OpenAI will now allow individuals to create their own personalized chatbots, called GPTs, even if they have no coding expertise.

The company envisions that people might want to develop custom chatbots to address particular challenges or interests.

To create a GPT, users simply describe their bot's purpose to ChatGPT. Behind the scenes, ChatGPT automatically generates and executes the required code.

Scenario

ChatGPT remains among the most rapidly expanding services to date

ChatGPT is widely regarded as the fastest-growing consumer app ever.

At the developer conference, OpenAI CEO Sam Altman announced that ChatGPT reached 100 million monthly users within just two months of its launch, and that over two million developers are building on its API.

This impressive growth surpassed the early user adoption rates of platforms like Facebook, Twitter, and Instagram.