
The AI magic behind Modi-Lex podcast's speedy translations, revealed
What's the story
Prime Minister Narendra Modi's latest podcast with American computer scientist and presenter Lex Fridman has taken the world by storm.
The most interesting part of this podcast is its quick translation into several languages, a job that usually takes weeks but was done in just a few hours.
This was made possible by ElevenLabs, a London-based AI voice synthesis company.
Advanced tech
ElevenLabs' AI model: More than just a translator
Siddharth Srinivasan, ElevenLabs' India head, said the company's AI model is more than a translator because it incorporates emotional understanding.
"The beauty of our model is that it is natively multilingual and understands context. It has emotion built into it," he said.
This sophisticated tech enables the AI to understand conversations, add pauses, and adjust tone slightly so that the translations sound authentic.
Tech secret
Proprietary audio models drive ElevenLabs' rapid translations
The rapid translation of the Modi-Fridman podcast was done using ElevenLabs' proprietary audio models.
These highly trained multilingual AI models understand different languages, accents, and contexts.
"You literally have infinite voice possibilities," Srinivasan explained. "Because of the core part of the technology, what you're able to do is have these experiences so that it doesn't seem like it's a translator but it comes across authentically in that person's voice."
Editorial process
Human oversight complements ElevenLabs' AI technology
Despite the advanced capabilities of its AI models, ElevenLabs stresses the importance of human oversight in its dubbing process.
"We run this with a human-in-the-loop process where technology enables things to happen... and then you have a really meticulous, strong editorial process," Srinivasan explained.
This combination is intended to ensure that the dubbed version sounds authentic and meets high quality standards.
Innovative strategy
ElevenLabs' unique approach to voice synthesis
Srinivasan said ElevenLabs' approach to voice synthesis differs from that of others in the industry.
"Our models are actually audio models that are our own. So that's our IP," he said.
The company has created specialized audio models for speech synthesis and voice cloning, instead of relying solely on text-based models such as large language models (LLMs).