Streaming On-Device Detection of Device Directed Speech from Voice and Touch-Based Invocation

When interacting with smart devices such as mobile phones or wearables, the user typically invokes a virtual assistant (VA) by saying a keyword or by pressing a button on the device. However, in many cases the VA can be invoked accidentally, either by speech that resembles the keyword or by an accidental button press, which can affect both user experience and privacy. To this end, we propose an acoustic false-trigger mitigation (FTM) approach for on-device detection of device-directed speech that handles both voice-trigger and touch-based invocation in a single model. To facilitate model deployment…

Apple Machine Learning Research