NVIDIA's new AI model generates unique audio from text prompts
NVIDIA has unveiled an experimental generative artificial intelligence (AI) model named Foundational Generative Audio Transformer Opus 1, or Fugatto. The model can generate and modify audio based on text prompts supplied by users. NVIDIA credits the globally distributed team of AI researchers behind the project with enhancing the model's multi-accent and multilingual capabilities.
A human-like sound generator
Rafael Valle, a key researcher behind Fugatto and NVIDIA's Manager of Applied Audio Research, said the team's goal was to create an AI model that understands and generates sound the way humans do. The company has outlined a few potential real-world applications for the technology: music producers could use it to quickly generate song prototypes, and people could create language-learning resources narrated in their favorite voice.
Potential in gaming and beyond
Fugatto's capabilities also extend to gaming, where it could create variations of pre-recorded assets to match how gameplay changes with player choices. With some fine-tuning, the model has also shown it can perform tasks beyond its initial training; for example, it can generate angry-sounding speech in a specific accent or produce the sound of birds singing during a thunderstorm.
Fugatto's unique sound generation abilities
Fugatto can also simulate sounds that evolve over time, such as a rainstorm moving across a landscape. However, NVIDIA has not yet said whether it will make Fugatto available to the public. The model joins similar text-to-audio technologies from Meta and Google.