Master-ASR: Achieving Multilingual Scalability and Low-Resource Adaptation in ASR with Modular Learning
Authors: Zhongzhi Yu, Yang Zhang, Kaizhi Qian, Cheng Wan, Yonggan Fu, Yongan Zhang, Yingyan Celine Lin
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments and visualizations demonstrate that Master-ASR can effectively discover language similarity and improve multilingual and low-resource ASR performance over state-of-the-art (SOTA) methods, e.g., under multilingual-ASR, our framework achieves a 0.13∼2.41 lower character error rate (CER) with 30% smaller inference overhead over SOTA solutions on multilingual ASR, and a comparable CER with nearly 50 times fewer trainable parameters over SOTA solutions on low-resource tuning, respectively. |
| Researcher Affiliation | Collaboration | (1) School of Computer Science, Georgia Institute of Technology, Atlanta, USA; (2) MIT-IBM Watson AI Lab, Boston, USA. |
| Pseudocode | No | The paper includes figures illustrating the model architecture and training process, but no formal pseudocode blocks or algorithms. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code or provide any links to a code repository. |
| Open Datasets | Yes | We evaluate Master-ASR using a subset of the widely used large-scale Common Voice dataset (Ardila et al., 2019). Specifically, this subset comprises 51 languages, each of which contains one hour of training data and one hour of validation data, to train our multilingual ASR model as described in Sec. 3.5. |
| Dataset Splits | Yes | Specifically, this subset comprises 51 languages, each of which contains one hour of training data and one hour of validation data, to train our multilingual ASR model as described in Sec. 3.5. Furthermore, we collect an additional dataset consisting of six languages, with 10 minutes of training data and 10 minutes of validation data for each language, to evaluate the performance of low-resource tuning as discussed in Sec. 3.6. (A hedged data-subsetting sketch follows this table.) |
| Hardware Specification | No | Specifically, we train models for 100k iterations on 36 GPUs using an Adam optimizer with an initial learning rate of 5e-5 and a tri-stage schedule for all modules except T. While the number of GPUs is specified, the specific model or type of GPU is not mentioned. |
| Software Dependencies | No | The paper mentions using an “Adam optimizer” and a “pretrained XLSR-53 model” but does not specify any software libraries (e.g., PyTorch, TensorFlow) or their version numbers. |
| Experiment Setup | Yes | Specifically, we train models for 100k iterations on 36 GPUs using an Adam optimizer with an initial learning rate of 5e-5 and a tri-stage schedule for all modules except T. Unless stated otherwise, we set t = 0.3, α = 10, β = 5, and γ = 5,000. (A hedged optimizer/schedule sketch follows this table.) |
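
To make the reported splits concrete, here is a minimal sketch (not the authors' code) of how a fixed-duration, per-language subset such as the one described above could be drawn: for each language, clips are shuffled and kept until roughly one hour (or 10 minutes for the low-resource set) of audio is accumulated. The function name `sample_subset`, the utterance ids, and the greedy sampling procedure are all assumptions; the paper does not describe how its Common Voice subset was selected.

```python
# Hedged sketch: per-language fixed-duration subsets (e.g., 1 hour train / 1 hour
# validation per language, 10 minutes per language for low-resource tuning).
# Clip lists and durations are assumed inputs.
import random
from typing import Dict, List, Tuple

Clip = Tuple[str, float]  # (utterance id, duration in seconds)

def sample_subset(clips_by_lang: Dict[str, List[Clip]],
                  target_seconds: float = 3600.0,
                  seed: int = 0) -> Dict[str, List[str]]:
    """Return, per language, utterance ids totalling roughly target_seconds of audio."""
    rng = random.Random(seed)
    subset: Dict[str, List[str]] = {}
    for lang, clips in clips_by_lang.items():
        shuffled = clips[:]
        rng.shuffle(shuffled)
        chosen, total = [], 0.0
        for utt_id, dur in shuffled:
            if total >= target_seconds:
                break
            chosen.append(utt_id)
            total += dur
        subset[lang] = chosen
    return subset

# Usage (hypothetical utterance ids): 1 hour per language for multilingual
# training, 10 minutes per language for the low-resource tuning set.
train_subset = sample_subset({"tr": [("tr_0001", 4.2)]}, target_seconds=3600)
low_resource = sample_subset({"ky": [("ky_0001", 3.8)]}, target_seconds=600)
```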
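
The training recipe quoted in the table (Adam, initial learning rate 5e-5, tri-stage schedule, 100k iterations) can be illustrated with the hedged PyTorch sketch below. The stage proportions, the final learning-rate scale, and the placeholder model are assumptions; the paper reports only the optimizer, the peak learning rate, and the iteration count.

```python
# Hedged sketch of the described setup: Adam at 5e-5 with a tri-stage schedule
# (linear warmup -> hold -> exponential decay) over 100k iterations.
# Stage boundaries and the final LR scale are assumptions.
import math
import torch

TOTAL_STEPS = 100_000
WARMUP_STEPS = int(0.1 * TOTAL_STEPS)   # assumed 10% warmup
HOLD_STEPS = int(0.4 * TOTAL_STEPS)     # assumed 40% hold at peak LR
DECAY_STEPS = TOTAL_STEPS - WARMUP_STEPS - HOLD_STEPS
FINAL_LR_SCALE = 0.05                   # assumed final LR = 5% of peak

def tri_stage_scale(step: int) -> float:
    """Multiplicative LR factor for the given optimizer step."""
    if step < WARMUP_STEPS:                        # stage 1: linear warmup
        return step / max(1, WARMUP_STEPS)
    if step < WARMUP_STEPS + HOLD_STEPS:           # stage 2: hold at peak
        return 1.0
    decay_step = step - WARMUP_STEPS - HOLD_STEPS  # stage 3: exponential decay
    return math.exp(math.log(FINAL_LR_SCALE) * decay_step / max(1, DECAY_STEPS))

model = torch.nn.Linear(1024, 512)  # placeholder for the actual ASR model
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=tri_stage_scale)

for step in range(TOTAL_STEPS):
    # ... forward/backward pass on a batch would go here ...
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()
```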