Two new papers discuss how Alexa recognizes sounds

Last year, Amazon announced the beta release of Alexa Guard, a new service that lets customers who are leaving the house instruct their Echo devices to listen for glass breaking or smoke and carbon dioxide alarms going off. At this year’s International Conference on Acoustics, Speech, and Signal Processing, our team is presenting several papers on sound detection. I wrote about one of them a few weeks ago, a new method for doing machine learning with unbalanced data sets.Read More

New speech recognition experiments demonstrate how machine learning can scale

Customer interactions with Alexa are constantly growing more complex, and on the Alexa science team, we strive to stay ahead of the curve by continuously improving Alexa’s speech recognition system. Increasingly, keeping pace with Alexa’s expanding capabilities will require automating the learning process, through techniques such as semi-supervised learning, which leverages a small amount of annotated data to extract information from a much larger store of unannotated data.Read More

Joint training on speech signal isolation and speech recognition improves performance

The idea of using arrays of microphones to improve automatic speech recognition (ASR) is decades old. The acoustic signal generated by a sound source reaches multiple microphones with different time delays. This information can be used to create virtual directivity, emphasizing a sound arriving from a direction of interest and diminishing signals coming from other directions. In voice recognition, one of the more popular methods for doing this is known as “beamforming”.Read More