Adversarial Meta Sampling for Multilingual Low-Resource Speech Recognition

Authors: Yubei Xiao, Ke Gong, Pan Zhou, Guolin Zheng, Xiaodan Liang, Liang Lin

AAAI 2021, pp. 14112-14120

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, experiment results on two multilingual datasets show significant performance improvement when applying our AMS on MML-ASR, and also demonstrate the applicability of AMS to other low-resource speech tasks and transfer learning ASR approaches.
Researcher Affiliation | Collaboration | 1 School of Computer Science and Engineering, Sun Yat-sen University, China; 2 School of Intelligent Systems Engineering, Sun Yat-sen University, China; 3 Dark Matter AI Research; 4 Salesforce
Pseudocode | Yes | Algorithm 1 Adversarial Meta Sampling
Open Source Code | No | The paper does not provide concrete access to source code (a specific repository link, an explicit code-release statement, or code in supplementary materials) for the described methodology.
Open Datasets | Yes | Common Voice (Ardila et al. 2020) is an open-source multilingual voice dataset and contains about 40 kinds of languages. ... In addition, we also conducted experiments on the IARPA BABEL dataset (Gales et al. 2014) with 6 source languages (Bengali, Tagalog, Zulu, Turkish, Lithuanian, Guarani) and 3 target languages (Vietnamese, Swahili, Tamil).
Dataset Splits | Yes | Table 1: Multilingual dataset statistics in terms of hours (h). ... Target: Kyrgyz-train 10 h, Kyrgyz-test 1 h; Estonian-train 9 h, Estonian-test 1 h; ... Spanish-train 10 h, Spanish-test 1.5 h; Dutch-train 10 h, Dutch-test 1.5 h; Kabyle-train 10 h, Kabyle-test 1.5 h.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types and speeds, memory amounts, or other machine specifications) used to run its experiments.
Software Dependencies | No | The paper mentions several algorithms, models, and techniques (e.g., VGG, BLSTM, Adam, BPE) but does not provide specific version numbers for software libraries or environments (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | The encoder contains a 6-layered VGG (Simonyan and Zisserman 2015) extractor and 5 BLSTM (Graves, Jaitly, and Mohamed 2013; Graves and Jaitly 2014) layers, each with 320-dimensional units per direction. Location-aware attention (Chorowski et al. 2015) with 300 dimensions is used in our attention layer and the decoder is a single LSTM (Hochreiter and Schmidhuber 1997) layer with 320 dimensions. We set λ_ctc to 0.5. ... The policy network contains a feed-forward attention and a one-layer LSTM with hidden size 100 and input size 32. We use Adam with an initial learning rate γ = 0.035 and an entropy penalty weight 10^-5 to train the policy network. We set M as 3 after searching the range M ∈ {2, 3, 4, 5, 7, 9} and set w as 48, of which 24 examples are divided into the support set and 24 examples into the query set.
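
To make the quoted hyperparameters easier to scan, here is a minimal Python summary of the reported setup. The dictionary keys are illustrative assumptions rather than the authors' configuration schema; the values are taken directly from the quoted Experiment Setup row.

```python
# Illustrative hyperparameter summary assembled from the quoted "Experiment Setup" row.
# Key names are assumptions for readability; values come from the paper's quoted text.
experiment_config = {
    "asr_model": {
        "encoder": {
            "frontend": "VGG (6 layers)",                  # Simonyan and Zisserman 2015
            "rnn": "BLSTM x 5",                            # Graves et al. 2013/2014
            "units_per_direction": 320,
        },
        "attention": {"type": "location-aware", "dim": 300},  # Chorowski et al. 2015
        "decoder": {"type": "LSTM x 1", "dim": 320},
        "lambda_ctc": 0.5,                                 # CTC/attention weighting
    },
    "policy_network": {
        "layers": ["feed-forward attention", "LSTM x 1"],
        "hidden_size": 100,
        "input_size": 32,
        "optimizer": {"name": "Adam", "initial_lr": 0.035},
        "entropy_penalty_weight": 1e-5,
    },
    "meta_sampling": {
        "num_sampled_tasks_M": 3,                          # chosen from {2, 3, 4, 5, 7, 9}
        "examples_per_task_w": 48,
        "support_set_size": 24,
        "query_set_size": 24,
    },
}
```

Per the quoted setup, each sampled task would thus contain w = 48 utterances, split evenly into 24 support and 24 query examples.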
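
The pseudocode itself (Algorithm 1) is not reproduced in this report, so the block below is only an illustrative sketch, assuming a REINFORCE-style sampler: a policy network with the quoted architecture (feed-forward attention plus a one-layer LSTM, input size 32, hidden size 100) emits a distribution over source-language tasks and is trained with Adam (lr 0.035) and the 10^-5 entropy penalty. The input features, reward signal, and all class/function names are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TaskSamplingPolicy(nn.Module):
    """Hedged sketch of the task-sampling policy network from the quoted setup:
    feed-forward attention followed by a one-layer LSTM (input 32, hidden 100).
    Input features, output parameterisation, and reward are assumptions."""
    def __init__(self, num_languages: int, input_size: int = 32, hidden_size: int = 100):
        super().__init__()
        # Feed-forward attention: scores each candidate-task feature vector.
        self.attn = nn.Sequential(
            nn.Linear(input_size, input_size), nn.Tanh(), nn.Linear(input_size, 1)
        )
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers=1, batch_first=True)
        self.head = nn.Linear(hidden_size, num_languages)  # logits over source languages

    def forward(self, task_feats: torch.Tensor) -> torch.Tensor:
        # task_feats: (batch, num_candidates, input_size), assumed task summary features.
        weights = torch.softmax(self.attn(task_feats), dim=1)
        attended = weights * task_feats                     # attention-weighted features
        out, _ = self.lstm(attended)
        return self.head(out[:, -1])                        # sampling logits

policy = TaskSamplingPolicy(num_languages=6)                # e.g. 6 BABEL source languages
optimizer = torch.optim.Adam(policy.parameters(), lr=0.035)  # initial learning rate from the paper

logits = policy(torch.randn(1, 5, 32))
dist = torch.distributions.Categorical(logits=logits)
task = dist.sample()                                        # pick a source-language task
reward = torch.tensor(0.0)                                  # placeholder: the paper's reward signal is not quoted here
loss = (-dist.log_prob(task) * reward).mean() \
       - 1e-5 * dist.entropy().mean()                       # REINFORCE-style loss with entropy penalty 10^-5
optimizer.zero_grad()
loss.backward()
optimizer.step()
```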