Bilingual End-to-End ASR with Byte-Level Subwords

In this paper, we investigate how the output representation of an end-to-end neural network affects multilingual automatic speech recognition (ASR). We study different representations, including character-level, byte-level, byte pair encoding (BPE), and byte-level byte pair encoding (BBPE) representations, and analyze their strengths and weaknesses. We focus on developing a single end-to-end model to support utterance-based bilingual ASR, where speakers do not alternate between two languages in a single utterance but may change languages across utterances. We conduct our experiments on…

Apple Machine Learning Research
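
To make the four output representations concrete, the following is a minimal sketch, not the authors' implementation. It assumes UTF-8 as the byte encoding and uses a plain greedy pair-merging loop as a stand-in for BPE training. The function names (`char_tokens`, `byte_tokens`, `learn_bbpe`) and the sample strings are hypothetical and chosen only for illustration; they are not data or code from the paper.

```python
from collections import Counter


def char_tokens(text):
    """Character-level representation: one token per Unicode character."""
    return list(text)


def byte_tokens(text):
    """Byte-level representation: one token per UTF-8 byte (values 0..255)."""
    return list(text.encode("utf-8"))


def most_frequent_pair(sequences):
    """Return the most frequent adjacent token pair, or None if there is none."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    return counts.most_common(1)[0][0] if counts else None


def merge_pair(seq, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `seq` with `new_id`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_bbpe(texts, num_merges):
    """Greedy BPE merges learned over UTF-8 byte sequences (a BBPE-style sketch).

    New subword ids start at 256 so they never collide with raw byte values.
    """
    seqs = [byte_tokens(t) for t in texts]
    merges = []
    next_id = 256
    for _ in range(num_merges):
        pair = most_frequent_pair(seqs)
        if pair is None:
            break
        seqs = [merge_pair(s, pair, next_id) for s in seqs]
        merges.append((pair, next_id))
        next_id += 1
    return merges, seqs


if __name__ == "__main__":
    sample = "hi 你好"  # arbitrary mixed-script string, for illustration only
    print(char_tokens(sample))
    print(byte_tokens(sample))
    print(learn_bbpe(["abab", "abc"], num_merges=2))
```

The sketch highlights the trade-off the abstract alludes to: a character vocabulary grows with every script it must cover, whereas a byte vocabulary is fixed at 256 symbols but produces longer sequences for characters that need several UTF-8 bytes. Merging frequent byte pairs shortens those sequences while keeping the base vocabulary shared across languages.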