Master-ASR: Achieving Multilingual Scalability and Low-Resource Adaptation in ASR with Modular Learning

Authors: Zhongzhi Yu, Yang Zhang, Kaizhi Qian, Cheng Wan, Yonggan Fu, Yongan Zhang, Yingyan Celine Lin

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments and visualizations demonstrate that Master-ASR can effectively discover language similarity and improve multilingual and low-resource ASR performance over state-of-the-art (SOTA) methods, e.g., under multilingual-ASR, our framework achieves a 0.13~2.41 lower character error rate (CER) with 30% smaller inference overhead over SOTA solutions on multilingual ASR and a comparable CER, with nearly 50 times fewer trainable parameters over SOTA solutions on low-resource tuning, respectively. (A reference CER computation sketch follows the table.)
Researcher Affiliation | Collaboration | (1) School of Computer Science, Georgia Institute of Technology, Atlanta, USA; (2) MIT-IBM Watson AI Lab, Boston, USA.
Pseudocode | No | The paper includes figures illustrating the model architecture and training process, but no formal pseudocode blocks or algorithms.
Open Source Code | No | The paper does not contain any explicit statements about releasing source code or provide any links to a code repository.
Open Datasets | Yes | We evaluate Master-ASR using a subset of the widely used large-scale Common Voice dataset (Ardila et al., 2019). Specifically, this subset comprises 51 languages, each of which contains one hour of training data and one hour of validation data, to train our multilingual ASR model as described in Sec. 3.5.
Dataset Splits | Yes | Specifically, this subset comprises 51 languages, each of which contains one hour of training data and one hour of validation data, to train our multilingual ASR model as described in Sec. 3.5. Furthermore, we collect an additional dataset consisting of six languages, with 10 minutes of training data and 10 minutes of validation data for each language, to evaluate the performance of low-resource tuning as discussed in Sec. 3.6. (A hedged split-construction sketch follows the table.)
Hardware Specification | No | Specifically, we train models for 100k iterations on 36 GPUs using an Adam optimizer with an initial learning rate of 5e-5 and a tri-stage schedule for all modules except T. While the number of GPUs is specified, the specific model or type of GPU is not mentioned.
Software Dependencies | No | The paper mentions using an “Adam optimizer” and a “pretrained XLSR-53 model” but does not specify any software libraries (e.g., PyTorch, TensorFlow) or their version numbers.
Experiment Setup | Yes | Specifically, we train models for 100k iterations on 36 GPUs using an Adam optimizer with an initial learning rate of 5e-5 and a tri-stage schedule for all modules except T. Unless stated otherwise, we set t = 0.3, α = 10, β = 5, and γ = 5,000. (A configuration sketch based on these settings follows the table.)
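The headline results above are reported in character error rate (CER). Since the paper releases no code, the following is a minimal, generic sketch of CER as character-level edit distance divided by reference length, intended only as a reference definition for anyone reproducing the evaluation; the function names are ours, not the authors'.

```python
# Minimal sketch: character error rate (CER) as commonly defined,
# i.e., character-level edit distance divided by reference length.
# Generic reference implementation, not the authors' code.

def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    # prev[j] holds the distance between ref[:i-1] and hyp[:j]; curr is the row for ref[:i].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,              # deletion
                curr[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),   # substitution (0 if characters match)
            )
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edits needed to turn hyp into ref, per reference character."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


# Example: one substitution over a 5-character reference -> CER = 0.2
print(cer("hello", "hallo"))
```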
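The Dataset Splits row describes fixed-duration subsets (one hour per language for multilingual training, ten minutes per language for low-resource tuning), but the paper does not say how clips were sampled from Common Voice. The sketch below shows one plausible way to draw a duration-budgeted subset from a Common Voice language release; the directory layout, the `sample_fixed_duration` helper, the sampling strategy, and the use of torchaudio to measure clip durations are assumptions for illustration only.

```python
# Hypothetical sketch: draw a fixed-duration subset (e.g., ~1 hour) of Common Voice
# clips for one language. The TSV column names ('path', 'sentence') follow the public
# Common Voice release format; the sampling strategy itself is an assumption, since
# the paper does not describe how its subsets were selected.
import csv
import random

import torchaudio  # assumed dependency, used here only to measure clip durations


def sample_fixed_duration(tsv_path, clips_dir, budget_sec, seed=0):
    """Return (audio_path, transcript) pairs whose total duration is close to budget_sec."""
    with open(tsv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f, delimiter="\t"))
    random.Random(seed).shuffle(rows)

    subset, total = [], 0.0
    for row in rows:
        path = f"{clips_dir}/{row['path']}"
        waveform, sample_rate = torchaudio.load(path)  # load fully for simplicity
        duration = waveform.shape[1] / sample_rate
        if total + duration > budget_sec:
            continue  # would overshoot the budget; keep looking for shorter clips
        subset.append((path, row["sentence"]))
        total += duration
        if total >= 0.99 * budget_sec:
            break
    return subset


# Example usage with placeholder paths: ~1 hour of training audio for one language.
train_subset = sample_fixed_duration("cv/fr/train.tsv", "cv/fr/clips", budget_sec=3600)
```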
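The Experiment Setup row names an Adam optimizer, a peak learning rate of 5e-5, a tri-stage schedule, and 100k iterations, which is enough to sketch the optimizer state in PyTorch. The stage proportions, the initial and final learning-rate scales, and the placeholder model below are assumptions, since the paper does not report them.

```python
# Minimal sketch of the reported optimizer setup: Adam with lr 5e-5 and a
# tri-stage schedule (linear warmup, constant hold, exponential decay) over
# 100k updates. Stage proportions and the init/final scales are assumptions;
# the paper only names the optimizer, peak lr, schedule type, and iteration count.
import math

import torch


def tri_stage_lr(step, total=100_000, warmup_frac=0.1, hold_frac=0.4,
                 init_scale=0.01, final_scale=0.05):
    """Return a multiplier on the peak learning rate for the given step."""
    warmup = int(total * warmup_frac)
    hold = int(total * hold_frac)
    if step < warmup:             # stage 1: linear warmup from init_scale to 1.0
        return init_scale + (1.0 - init_scale) * step / max(warmup, 1)
    if step < warmup + hold:      # stage 2: hold at the peak learning rate
        return 1.0
    # stage 3: exponential decay down to final_scale at the last step
    decay_steps = max(total - warmup - hold, 1)
    progress = min(step - warmup - hold, decay_steps) / decay_steps
    return math.exp(progress * math.log(final_scale))


model = torch.nn.Linear(10, 10)  # placeholder for the modules actually being trained
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=tri_stage_lr)

for step in range(100_000):
    # ... forward pass, loss, and backward pass would go here ...
    optimizer.step()
    scheduler.step()
```

For context, fairseq ships a tri-stage scheduler with the same three phases, which is a plausible reference point given the XLSR-53 backbone, though the paper does not name its training framework and its phase proportions remain unreported.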