TECHNOLOGY

Meta introduces Voicebox: Check details

By Akash Pandey

1

Details

Meta's Voicebox can generate high-quality speech clips and edit pre-recorded audio while preserving its original style. It is a multilingual generative AI model for speech that can produce speech in six languages.

2

Use cases

Voicebox can either generate output from scratch or modify a sample it is given. Its uses include speech synthesis, audio editing, noise removal, diverse sample generation, and style conversion.

3

Approach

Voicebox uses a novel training approach that relies only on raw audio and its accompanying transcription. It is built on a technique called Flow Matching, which Meta says has been shown to outperform diffusion models.
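To make the Flow Matching idea a little more concrete, here is a minimal sketch of its training objective using the straight-line ("optimal transport") path commonly paired with the technique: a model learns a velocity that carries a noise sample toward a data sample, and new samples are produced by integrating that velocity. This is not Meta's code. The short vectors stand in for audio features, `model` is a placeholder for a neural network, the conditioning on surrounding audio and transcript that Voicebox relies on is omitted, and `SIGMA_MIN` and all function names are hypothetical.

```python
# Hypothetical small noise level left at t = 1 (an assumption for illustration).
SIGMA_MIN = 1e-5


def ot_path(x0, x1, t, sigma_min=SIGMA_MIN):
    """Point at time t on the straight-line path from noise x0 to data x1."""
    return [(1 - (1 - sigma_min) * t) * a + t * b for a, b in zip(x0, x1)]


def ot_target_velocity(x0, x1, sigma_min=SIGMA_MIN):
    """Time derivative of ot_path; constant because the path is a straight line."""
    return [b - (1 - sigma_min) * a for a, b in zip(x0, x1)]


def fm_loss(model, x0, x1, t):
    """Mean squared error between the model's velocity and the target velocity."""
    xt = ot_path(x0, x1, t)
    pred = model(xt, t)
    target = ot_target_velocity(x0, x1)
    return sum((p - u) ** 2 for p, u in zip(pred, target)) / len(target)


def euler_sample(velocity, x0, steps=32):
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    x0 = [0.5, -1.0, 2.0]   # "noise" sample
    x1 = [1.0, 0.0, -0.5]   # "data" sample (stand-in for audio features)

    # An oracle that returns the exact target velocity for this pair.
    def oracle(xt, t):
        return ot_target_velocity(x0, x1)

    print(fm_loss(oracle, x0, x1, t=0.3))   # zero loss for the exact target
    print(euler_sample(oracle, x0))         # lands on x1 (plus a tiny sigma_min * x0)
```

In real training the oracle is unavailable: a network is fitted over many random (noise, data, t) triples so that its velocity approximates the target on average, and generation starts from fresh noise.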

4

Training

According to Meta, Voicebox was trained on more than 50,000 hours of recorded speech and transcripts from public-domain audiobooks in English, French, Spanish, German, Polish, and Portuguese.

5

Applications

Given an audio sample, Voicebox can replicate its style in text-to-speech output. It can also restore a section of speech interrupted by noise or replace mispronounced words.
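Restoring a noisy stretch or replacing a word can be framed as "infilling": mark the damaged frames, let the model regenerate only those, and keep the rest of the recording as context. The sketch below shows just that bookkeeping step under this assumption; the frame rate and helper names are hypothetical, and the generative model itself is not shown.

```python
# Hypothetical frame rate for the audio features (e.g. one frame per 10 ms).
FRAME_RATE_HZ = 100


def infill_mask(num_frames, start_s, end_s, frame_rate=FRAME_RATE_HZ):
    """Return True for frames to regenerate and False for context frames to keep."""
    start = max(0, int(round(start_s * frame_rate)))
    end = min(num_frames, int(round(end_s * frame_rate)))
    return [start <= i < end for i in range(num_frames)]


def merge_frames(original, generated, mask):
    """Keep original frames outside the mask and take generated ones inside it."""
    return [g if m else o for o, g, m in zip(original, generated, mask)]


if __name__ == "__main__":
    # A 5-second clip (500 frames) with noise between 1.0 s and 1.5 s.
    mask = infill_mask(num_frames=500, start_s=1.0, end_s=1.5)
    print(sum(mask))  # 50 frames to regenerate
```

Keeping the untouched frames verbatim is what lets an edit preserve the speaker's original voice and timing around the repaired section.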

6

Availability

Despite its many promising applications, Meta has not publicly released the Voicebox model or code, citing the potential risks of misuse.