Samsung's Galaxy AI now speaks Hindi like a native
Samsung's research and development center in Bengaluru, India, has developed a Hindi large language model for its Galaxy AI platform. The Samsung R&D Institute India-Bengaluru (SRI-B), the company's largest R&D center outside Korea, announced the achievement. The institute has also enhanced the technology for several other languages, including Thai, Vietnamese, and Indonesian.
Complex task of developing Hindi AI model unveiled
Building the Hindi AI model was not simple, Samsung stated. The model had to cover more than 20 regional dialects, along with tonal inflections, punctuation and colloquialisms. The team also had to address the common practice among Hindi speakers of mixing English words into their conversations, which required multiple rounds of AI model training using translated and transliterated data.
Unique phonetic challenges in Hindi AI model development
"Hindi has a complex phonetic structure that includes retroflex sounds - sounds made by curling the tongue back in the mouth - which are not present in many other languages," explained Giridhar Jakki, SRI-B Head of Language AI. To build the speech synthesis element, the team consulted native linguists to understand all the unique sounds and create a special set of phonemes to support specific dialects. The Vellore Institute of Technology provided almost a million lines of segmented audio data.
Galaxy AI expands language support beyond Hindi
In addition to its work on Hindi, SRI-B collaborated with global teams to develop AI language models for British, Indian and Australian English as well as Thai, Vietnamese and Indonesian. With this expansion, the Galaxy AI platform now supports 16 languages. The enhanced language capabilities enable more users worldwide to expand their language skills even when offline, marking a significant milestone in Samsung's commitment to linguistic diversity.