Adversarial Meta Sampling for Multilingual Low-Resource Speech Recognition
Authors: Yubei Xiao, Ke Gong, Pan Zhou, Guolin Zheng, Xiaodan Liang, Liang Lin
AAAI 2021, pp. 14112-14120 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, experiment results on two multilingual datasets show significant performance improvement when applying our AMS on MML-ASR, and also demonstrate the applicability of AMS to other low-resource speech tasks and transfer learning ASR approaches. |
| Researcher Affiliation | Collaboration | 1) School of Computer Science and Engineering, Sun Yat-sen University, China; 2) School of Intelligent Systems Engineering, Sun Yat-sen University, China; 3) Dark Matter AI Research; 4) Salesforce |
| Pseudocode | Yes | Algorithm 1 Adversarial Meta Sampling |
| Open Source Code | No | The paper does not provide concrete access to source code (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described in this paper. |
| Open Datasets | Yes | Common Voice (Ardila et al. 2020) is an open-source multilingual voice dataset and contains about 40 kinds of languages. ... In addition, we also conducted experiments on the IARPA BABEL dataset (Gales et al. 2014) with 6 source languages (Bengali, Tagalog, Zulu, Turkish, Lithuanian, Guarani) and 3 target languages (Vietnamese, Swahili, Tamil). |
| Dataset Splits | Yes | Table 1: Multilingual dataset statistics in hours (h). ... Target: Kyrgyz-train 10 h, Kyrgyz-test 1 h; Estonian-train 9 h, Estonian-test 1 h; ... Spanish-train 10 h, Spanish-test 1.5 h; Dutch-train 10 h, Dutch-test 1.5 h; Kabyle-train 10 h, Kabyle-test 1.5 h |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions several algorithms, models, and techniques (e.g., VGG, BLSTM, Adam, BPE) but does not provide specific version numbers for software libraries or environments (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | The encoder contains a 6-layered VGG (Simonyan and Zisserman 2015) extractor and 5 BLSTM (Graves, Jaitly, and Mohamed 2013; Graves and Jaitly 2014) layers, each with 320-dimensional units per direction. Location-aware attention (Chorowski et al. 2015) with 300 dimensions is used in our attention layer and the decoder is a single LSTM (Hochreiter and Schmidhuber 1997) layer with 320 dimensions. We set λ_ctc to 0.5. ... The policy network contains a feed-forward attention and a one-layer LSTM with hidden size 100 and input size 32. We use Adam with an initial learning rate γ = 0.035 and an entropy penalty weight of 10^-5 to train the policy network. We set M to 3 after searching the range M ∈ {2, 3, 4, 5, 7, 9} and set w to 48, of which 24 examples are divided into the support set and 24 examples into the query set. (See the configuration sketch after this table.) |
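
The reported setup can be collected into a short configuration sketch. This is not the authors' code (none is released) and the paper names no framework; module names such as `PolicyNetwork`, the exact form of the feed-forward attention, and the feature dimensions are assumptions. Only the numbers quoted above (layer counts, unit sizes, λ_ctc = 0.5, learning rate 0.035, entropy penalty 10^-5, M = 3, w = 48 split 24/24) come from the paper; PyTorch is used purely for illustration.

```python
# Hedged sketch of the reported experiment configuration.
# All names below are illustrative, not taken from released code.

import torch
import torch.nn as nn

# ASR model hyperparameters as reported in the paper.
asr_config = {
    "vgg_layers": 6,                    # 6-layer VGG feature extractor
    "encoder_blstm_layers": 5,          # 5 BLSTM layers
    "encoder_units_per_direction": 320,
    "attention_dim": 300,               # location-aware attention
    "decoder_lstm_layers": 1,
    "decoder_units": 320,
    "lambda_ctc": 0.5,                  # joint CTC/attention weight
}

# Meta-sampling settings as reported.
meta_config = {
    "meta_batch_tasks_M": 3,            # chosen from {2, 3, 4, 5, 7, 9}
    "examples_per_task_w": 48,          # 24 support + 24 query
    "support_size": 24,
    "query_size": 24,
}


class PolicyNetwork(nn.Module):
    """Task-sampling policy: feed-forward attention over task embeddings
    followed by a one-layer LSTM (hidden size 100, input size 32).
    The attention form and embedding shape are assumptions."""

    def __init__(self, input_size=32, hidden_size=100):
        super().__init__()
        # Feed-forward attention scoring each task embedding.
        self.attn = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.Tanh(),
            nn.Linear(hidden_size, 1),
        )
        self.lstm = nn.LSTM(input_size, hidden_size,
                            num_layers=1, batch_first=True)

    def forward(self, task_embeddings):
        # task_embeddings: (batch, num_tasks, input_size)
        scores = self.attn(task_embeddings).squeeze(-1)        # (batch, num_tasks)
        weights = torch.softmax(scores, dim=-1).unsqueeze(-1)  # attention weights
        pooled = (weights * task_embeddings).sum(dim=1, keepdim=True)
        out, _ = self.lstm(pooled)                              # (batch, 1, hidden)
        return out.squeeze(1)                                   # (batch, hidden)


policy = PolicyNetwork()
# Adam with initial learning rate 0.035; the 1e-5 entropy penalty would be
# added to the policy-gradient loss during training (not shown here).
optimizer = torch.optim.Adam(policy.parameters(), lr=0.035)
```

A usage note: in the adversarial meta-sampling loop described by the paper, the policy output would parameterize a distribution over source-language tasks, from which M = 3 tasks of w = 48 examples each are drawn per meta-update; the full ASR encoder-decoder architecture (VGG + BLSTM encoder, location-aware attention, LSTM decoder with joint CTC loss) is omitted above for brevity.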