Here's how tech giants handle data captured by voice assistants
From phones to speakers, everything is turning smart thanks to voice recognition and artificial intelligence. Most tech giants, including Google and Apple, offer AI-powered voice assistants to help users automate day-to-day activities. But the same companies also collect our voice recordings (what we say to these assistants) to improve their products. Let's see how they handle this data.
Amazon annotates recordings, gets them reviewed
Speaking to VentureBeat, an Amazon spokesperson confirmed that the company annotates a small number of recordings to improve Alexa's speech recognition and make the assistant better at understanding and answering queries. The company has contractors review the recordings for feature development, but emphasizes that this happens in 'high confidentiality', with strict safeguards in place and without revealing personally identifying information (only the first name and device serial number).
Apple also gets recordings reviewed
Apple also has a small number of Siri recordings reviewed by graders. The data is encrypted and anonymized with random identifiers, which allows the graders to rate the quality of Siri's responses and mark them with labels that contribute to the assistant's development. The labeled recordings are fed into Apple's recognition system, which supports quality assurance and helps Siri better understand the user's voice.
It may use recordings for years
Apple says "voice recordings are saved for a six-month period so that the recognition system can utilize them to better understand user's voice." However, it notes that a small subset of this data, without identifiers, may be used to improve and develop Siri beyond six months.
How Google treats Assistant data
A Google spokesperson told VentureBeat that the company employs "a wide range of techniques to protect user privacy" while transcribing Google Assistant data to train its speech recognition system. The representative emphasized that the transcription process is automated and that the analyzed audio data isn't associated with any personal information. If the company chooses to use a third-party reviewer, only text data is shared.
Samsung's and Microsoft's data handling processes are not clear
As Microsoft's privacy page notes, the Redmond giant collects voice interactions with Cortana to enhance the voice assistant's ability to recognize individual speech patterns and respond. However, the page doesn't detail how the company protects or anonymizes this data, or whether it is labeled by employees or third-party reviewers for development purposes. The same is true of Samsung, which collects data from its Bixby voice assistant.
Samsung's data goes to a third-party service
Notably, some of the voice data collected by Samsung may go to an unspecified third-party speech recognition service. It is not clear how that service stores and handles this data.