Meta's AI translates an oral language for the first time
Languages without a written script are the final frontier for machine learning-based translation systems, and Meta appears to have found a way to cross it. The company has developed an AI-based speech-to-speech translator that works with languages that lack a script. The work is part of Meta's Universal Speech Translator (UST) program, and Meta has open-sourced it.
Why does this story matter?
Nearly half of the world's languages don't have a script. If you're a fan of Star Trek and have dreamt of owning a universal translator, primarily oral languages are a deal-breaker for any such device. Meta's speech-to-speech translator could be the solution. Although the company has a patchy track record with AI-based language systems, this might be a game changer.
Speech-to-text translation requires a written script
AI-based speech translation systems typically rely on large amounts of transcribed speech. This is where primarily oral languages become a problem: since they don't have a script, there is little transcribed data to train on, and producing text as the translated output doesn't work. This is why Meta used speech-to-speech translation. The company picked Hokkien, a primarily oral language spoken by the Chinese diaspora, to develop the system.
How does Meta's speech-to-speech system work?
To make speech-to-speech translation of Hokkien work, Meta first translated the input speech into a sequence of discrete acoustic units using speech-to-unit (S2U) translation. It then generated waveforms from those units. To train the AI, Meta used Mandarin as an intermediate language between English and Hokkien: Hokkien or English speech was first translated to Mandarin text, and that text was then translated into the other language, English or Hokkien.
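The two stages described above, along with the pivot through an intermediate language, can be sketched in a few lines of Python. This is only an illustrative toy, not Meta's code: every function name, the codebook, and the tone frequencies are hypothetical. A nearest-neighbor lookup stands in for the trained S2U model, and a sine-tone generator stands in for the vocoder that turns units into audio.

```python
import math
from typing import Callable, List, Sequence

Frame = Sequence[float]  # one feature vector per short slice of audio
Units = List[int]        # discrete acoustic unit IDs


def toy_speech_to_units(frames: Sequence[Frame], codebook: Sequence[Frame]) -> Units:
    """Hypothetical stand-in for an S2U model.

    Maps each frame to its nearest codebook entry (squared Euclidean
    distance) and merges consecutive repeats. A real S2U model would be a
    trained network that emits units for the target language.
    """
    units: Units = []
    for frame in frames:
        best = min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(frame, codebook[k])),
        )
        if not units or units[-1] != best:
            units.append(best)
    return units


def toy_units_to_waveform(
    units: Units,
    unit_hz: Sequence[float],
    sample_rate: int = 8000,
    samples_per_unit: int = 80,
) -> List[float]:
    """Hypothetical stand-in for a vocoder: one short sine tone per unit."""
    wave: List[float] = []
    for unit in units:
        step = 2 * math.pi * unit_hz[unit] / sample_rate
        wave.extend(math.sin(step * n) for n in range(samples_per_unit))
    return wave


def translate_speech(
    frames: Sequence[Frame],
    speech_to_units: Callable[[Sequence[Frame]], Units],
    units_to_waveform: Callable[[Units], List[float]],
) -> List[float]:
    """Two-stage pipeline: speech -> discrete units -> waveform."""
    return units_to_waveform(speech_to_units(frames))


def pivot_translate(speech, to_pivot_text: Callable, from_pivot_text: Callable):
    """Pivoting: source speech -> intermediate-language text -> target output."""
    return from_pivot_text(to_pivot_text(speech))


# Made-up data: three codebook entries and five input frames.
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
frames = [(0.1, 0.0), (0.0, 0.1), (0.9, 1.1), (0.1, 0.9), (0.0, 1.0)]
unit_hz = [220.0, 330.0, 440.0]

units = toy_speech_to_units(frames, codebook)  # units == [0, 1, 2]
wave = translate_speech(
    frames,
    lambda f: toy_speech_to_units(f, codebook),
    lambda u: toy_units_to_waveform(u, unit_hz),
)
```

The point of the sketch is the shape of the pipeline: no text appears anywhere between the input audio and the output audio, which is what lets the approach work for a language without a script.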
It can translate only one sentence at a time
Meta's new speech-to-speech translation system is far from perfect. For now, it can translate only one sentence at a time. Mark Zuckerberg, the company's CEO, believes that the system can be extended to translate more languages in real time.
Meta is open-sourcing the translation system
Meta is open-sourcing the Hokkien translation models, evaluation, and datasets so that other researchers can build on the work. It is also introducing a speech-to-speech translation benchmark based on a Hokkien speech corpus called 'Taiwanese Across Taiwan'. In addition, the company is releasing SpeechMatrix, a collection of speech-to-speech translations mined with LASER, Meta's toolkit for multilingual sentence representations.