Revolutionary Silent-Speech Recognition Interface Developed
Cornell University researchers have developed a silent-speech recognition interface that lets users issue commands to their devices without making a sound. The interface, known as EchoSpeech, uses acoustic sensing and artificial intelligence to recognize up to 31 unvocalized commands based on lip and mouth movements. The wearable device requires only a few minutes of user training data and can run on a smartphone, making it accessible to a wide range of people.
For people who cannot vocalize sound, EchoSpeech could serve as an input for a voice synthesizer, enabling them to communicate with others. According to Ruidong Zhang, the lead author of the study, “this silent speech technology could be an excellent input for a voice synthesizer. It could give patients their voices back.”
Moreover, EchoSpeech could be used to communicate with others via smartphone in places where speech is inconvenient or inappropriate, like a noisy restaurant or quiet library. The silent speech interface can also be paired with a stylus and used with design software like CAD, eliminating the need for a keyboard and a mouse.
EchoSpeech is outfitted with a pair of microphones and speakers smaller than pencil erasers, which turn it into a wearable AI-powered sonar system: the speakers send soundwaves across the face, and the microphones receive the echoes, which change as the mouth moves. A deep learning algorithm then analyzes these echo profiles in real time and recognizes the commands with about 95% accuracy.
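The sonar idea can be illustrated with a small sketch. This is not the researchers' implementation; it is a generic example of active acoustic sensing, assuming a near-ultrasonic chirp as the transmitted signal and cross-correlation as the way to build an echo profile. The function names, the 48 kHz sample rate, the 16 to 20 kHz sweep, and the simulated echo are all hypothetical choices made for illustration.

```python
import numpy as np


def make_chirp(n_samples, f_start, f_end, sample_rate):
    """Linear frequency sweep used as the transmitted signal (assumed)."""
    t = np.arange(n_samples) / sample_rate
    sweep_rate = (f_end - f_start) / (n_samples / sample_rate)
    return np.sin(2 * np.pi * (f_start * t + 0.5 * sweep_rate * t ** 2))


def echo_profile(received, transmitted):
    """Echo strength at each non-negative delay, via cross-correlation."""
    corr = np.correlate(received, transmitted, mode="full")
    # In "full" mode, zero delay sits at index len(transmitted) - 1.
    return np.abs(corr[len(transmitted) - 1:])


# Hypothetical parameters, not taken from the article.
sample_rate = 48_000
chirp = make_chirp(480, 16_000, 20_000, sample_rate)

# Simulate one attenuated echo arriving 25 samples after transmission.
delay = 25
received = np.zeros(1024)
received[delay:delay + len(chirp)] += 0.3 * chirp

profile = echo_profile(received, chirp)
strongest_lag = int(np.argmax(profile))
print(f"strongest echo at a delay of {strongest_lag} samples")
```

In a real system, a sequence of such profiles over time would form the input to the recognition model; here a single synthetic echo only shows how a delay becomes visible as a peak.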
Cheng Zhang, assistant professor of information science and director of Cornell’s Smart Computer Interfaces for Future Interactions (SciFi) Lab, said, “We’re moving sonar onto the body. We’re very excited about this system because it really pushes the field forward on performance and privacy. It’s small, low-power, and privacy-sensitive, which are all important features for deploying new, wearable technologies in the real world.”
Unlike other silent-speech recognition technology, which requires the user to face or wear a camera, EchoSpeech needs no wearable video cameras, making it more practical. Additionally, because audio data is much smaller than image or video data, it requires less bandwidth to process and can be relayed to a smartphone via Bluetooth in real time. According to François Guimbretière, professor of information science, “the data is processed locally on your smartphone instead of uploaded to the cloud. Privacy-sensitive information never leaves your control.”
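A rough back-of-the-envelope calculation shows why audio is so much lighter than video. The figures below are illustrative assumptions, not specifications from the study: two microphone channels of uncompressed 16-bit audio at 48 kHz, compared with uncompressed 640x480 RGB video at 30 frames per second.

```python
# Assumed raw audio stream: 2 channels, 48 kHz, 16-bit (2-byte) samples.
audio_channels = 2
audio_sample_rate = 48_000
audio_bytes_per_sample = 2
audio_bytes_per_sec = audio_channels * audio_sample_rate * audio_bytes_per_sample

# Assumed raw video stream: 640x480 pixels, 3 bytes per pixel, 30 frames/s.
video_width, video_height = 640, 480
video_bytes_per_pixel = 3
video_fps = 30
video_bytes_per_sec = video_width * video_height * video_bytes_per_pixel * video_fps

ratio = video_bytes_per_sec / audio_bytes_per_sec
print(f"audio: {audio_bytes_per_sec / 1e6:.3f} MB/s")
print(f"video: {video_bytes_per_sec / 1e6:.3f} MB/s")
print(f"video is about {ratio:.0f}x larger")
```

Under these assumptions the raw video stream is roughly two orders of magnitude larger than the audio stream. Compression changes the absolute numbers for both, so the ratio is indicative only.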
EchoSpeech is an exciting advancement in the field of silent-speech recognition technology, with the potential to revolutionize the way we interact with technology and improve the lives of individuals who cannot vocalize sound.