Google AI Can Identify Voices Just by Looking at People's Faces

Researchers have explained how a new deep learning system from Google can pick out individual voices just by looking at people’s faces as they speak. The work is described in a paper titled “Looking to Listen at the Cocktail Party”.

“People are remarkably good at focusing their attention on a particular person in a noisy environment, mentally ‘muting’ all other voices and sounds,” Inbar Mosseri and Oran Lang, software engineers at Google Research, noted in a blog post. That ability comes naturally to human beings, but automatic speech separation, the task of separating an audio signal into its individual speech sources, “while a well-studied problem, remains a significant challenge for computers.”

Mosseri and Lang, however, have created a deep learning audio-visual model capable of isolating speech signals from a variety of other auditory inputs, such as additional voices and background noise. “We believe this capability can have a wide range of applications, from speech enhancement and recognition in videos, through video conferencing, to improved hearing aids, especially in situations where there are multiple people speaking,” the duo said.

The proof is in the demonstration video. Even when people are clearly trying to compete with each other (such as comedians Jon Dore and Rory Scovel in the Team Coco clip), the Google AI can generate a clean audio track for one person just by focusing on their face. That holds even if the person partially obscures their face with hand gestures or a microphone.

Engadget reported: “Google is currently ‘exploring opportunities’ to use this feature in its products, but there are more than a few prime candidates. It’s potentially ideal for video chat services like Hangouts or Duo, where it could help you understand someone talking in a crowded room. It could also be helpful for speech enhancement in video recording. And there are big implications for accessibility: it could lead to camera-linked hearing aids that boost the sound of whoever’s in front of you, and more effective closed captioning.”