Audio Lottery: Speech Recognition Made Ultra-Lightweight, Noise-Robust, and Transferable

Authors: Shaojin Ding, Tianlong Chen, Zhangyang Wang

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conducted extensive experiments on CNN-LSTM, RNN-Transducer, and Transformer models, and verified the existence of highly sparse winning tickets that can match the full model performance across those backbones.
Researcher Affiliation | Academia | Shaojin Ding¹*, Tianlong Chen²*, Zhangyang Wang² (¹Texas A&M University, ²University of Texas at Austin); shjd@tamu.edu, {tianlong.chen,atlaswang}@utexas.edu
Pseudocode | Yes | Algorithm 1 (Lottery Ticket Hypothesis Pruning): 1: Set the initial mask m, with the weight initialization θ. 2: repeat 3: Rewind the weights to θ. 4: Train f(x; m ⊙ θ) for t epochs with algorithm A_t^D, i.e., A_t^D(f(x; m ⊙ θ)). 5: Prune 20% of the remaining weights in A_t^D(f(x; m ⊙ θ)) and update m accordingly. 6: until the sparsity of m reaches the desired sparsity level s. 7: Return f(x; m ⊙ θ). (A runnable sketch of this pruning loop is given after the table.)
Open Source Code | Yes | Codes are available at https://github.com/VITA-Group/Audio-Lottery.
Open Datasets | Yes | We conducted experiments on three commonly used ASR corpora: TED-LIUM (Rousseau et al., 2012), Common Voice (Ardila et al., 2020), and LibriSpeech (Panayotov et al., 2015).
Dataset Splits | No | The paper mentions test sets but does not provide specific percentages or counts for the training, validation, and test splits, nor does it explicitly reference a standard split that defines these proportions.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments; it discusses computational complexity only in terms of MACs (multiply-accumulate operations).
Software Dependencies | No | The paper mentions using PyTorch and lists several GitHub repositories as implementation bases and pruning libraries, but does not provide specific version numbers for these software components.
Experiment Setup | Yes | During training, we set the batch size to 32 and the initial learning rate to 0.0003, which is annealed down by a factor of 1.1 at the end of each epoch. All the models were trained for 16 epochs. (A sketch of this training schedule follows the pruning example below.)
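
The Pseudocode row reproduces the paper's Algorithm 1, i.e., iterative magnitude pruning with weight rewinding. Below is a minimal PyTorch-style sketch of that loop, assuming hypothetical `build_model`, `train_for_epochs`, and `train_loader` helpers and pruning only weight tensors (dim > 1); it is an illustration of the procedure under those assumptions, not the authors' implementation from the linked repository.

```python
import copy
import torch

def iterative_magnitude_pruning(build_model, train_for_epochs, train_loader,
                                target_sparsity=0.9, prune_rate=0.2,
                                epochs_per_round=16):
    """Sketch of Algorithm 1: prune 20% of remaining weights per round,
    rewinding the survivors to their initial values (hypothetical helpers)."""
    model = build_model()
    theta0 = copy.deepcopy(model.state_dict())               # weight initialization θ
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters()
             if p.dim() > 1}                                  # initial mask m

    def current_sparsity():
        kept = sum(m.sum().item() for m in masks.values())
        total = sum(m.numel() for m in masks.values())
        return 1.0 - kept / total

    while current_sparsity() < target_sparsity:
        model.load_state_dict(theta0)                         # rewind the weights to θ
        with torch.no_grad():
            for name, p in model.named_parameters():          # apply the mask: m ⊙ θ
                if name in masks:
                    p.mul_(masks[name])
        # Train the masked network for t epochs, i.e., A_t^D(f(x; m ⊙ θ)).
        train_for_epochs(model, train_loader, masks, epochs_per_round)

        # Prune 20% of the remaining weights by global magnitude and update m.
        surviving = torch.cat([p.detach().abs()[masks[n].bool()]
                               for n, p in model.named_parameters() if n in masks])
        k = max(1, int(prune_rate * surviving.numel()))
        threshold = torch.kthvalue(surviving, k).values
        with torch.no_grad():
            for name, p in model.named_parameters():
                if name in masks:
                    masks[name] = ((p.abs() > threshold) & masks[name].bool()).float()

    model.load_state_dict(theta0)                             # return f(x; m ⊙ θ)
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])
    return model, masks
```

As in Algorithm 1, each round rewinds the surviving weights to the original initialization θ, retrains for t epochs, and prunes 20% of the remaining weights, repeating until the desired sparsity level s is reached.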
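The Experiment Setup row reports a batch size of 32, an initial learning rate of 0.0003 annealed by a factor of 1.1 after every epoch, and 16 training epochs. The sketch below encodes that schedule with `ExponentialLR(gamma=1/1.1)`; the optimizer choice and the `model`, `train_dataset`, and `loss_fn` objects are assumptions for illustration, not details taken from the paper.

```python
import torch
from torch.utils.data import DataLoader

def train_with_reported_schedule(model, train_dataset, loss_fn,
                                 num_epochs=16, batch_size=32, init_lr=3e-4):
    loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=init_lr)  # optimizer is an assumption
    # Annealing "by a factor of 1.1" at the end of each epoch: lr_e = init_lr / 1.1**e.
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=1.0 / 1.1)

    for epoch in range(num_epochs):
        model.train()
        for inputs, targets in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()
            optimizer.step()
        scheduler.step()  # end-of-epoch learning-rate anneal
    return model
```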