Should Alexa read “2/3” as “two-thirds” or “February Third”?: The science of text normalization

Text normalization is an important process in conversational AI. If an Alexa customer says, “book me a table at 5:00 p.m.”, the automatic speech recognizer will transcribe the time as “five p m”. Before a skill can handle this request, “five p m” will need to be converted to “5:00PM”. Once Alexa has processed the request, it needs to synthesize the response — say, “Is 6:30 p.m. okay?” Here, 6:30PM will be converted to “six thirty p m” for the text-to-speech synthesizer. We call the process of converting “5:00PM” to “five p m” text normalization and its counterpart — converting “five p m” to “5:00PM” — inverse text normalization.Read More

Using wake word acoustics to filter out background speech improves speech recognition by 15%

One of the ways that we’re always trying to improve Alexa’s performance is by teaching her to ignore speech that isn’t intended for her. At this year’s International Conference on Acoustics, Speech, and Signal Processing, my colleagues and I will present a new technique for doing this, which could complement the techniques that Alexa already uses.Read More

Two new papers discuss how Alexa recognizes sounds

Last year, Amazon announced the beta release of Alexa Guard, a new service that lets customers who are leaving the house instruct their Echo devices to listen for glass breaking or smoke and carbon dioxide alarms going off. At this year’s International Conference on Acoustics, Speech, and Signal Processing, our team is presenting several papers on sound detection. I wrote about one of them a few weeks ago, a new method for doing machine learning with unbalanced data sets.Read More