Audio Lottery: Speech Recognition Made Ultra-Lightweight, Noise-Robust, and Transferable
Authors: Shaojin Ding, Tianlong Chen, Zhangyang Wang
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted extensive experiments on CNN-LSTM, RNN-Transducer, and Transformer models, and verified the existence of highly sparse winning tickets that can match the full model performance across those backbones. |
| Researcher Affiliation | Academia | Shaojin Ding¹*, Tianlong Chen²*, Zhangyang Wang²; ¹Texas A&M University, ²University of Texas at Austin; shjd@tamu.edu, {tianlong.chen,atlaswang}@utexas.edu |
| Pseudocode | Yes | Algorithm 1 (Lottery Ticket Hypothesis Pruning): (1) Set the initial mask m and the weight initialization θ. (2) Repeat: (3) rewind the weights to θ; (4) train f(x; m ⊙ θ) for t epochs with algorithm A_D^t, i.e., A_D^t(f(x; m ⊙ θ)); (5) prune 20% of the remaining weights in A_D^t(f(x; m ⊙ θ)) and update m accordingly; (6) until the sparsity of m reaches the desired sparsity level s. (7) Return f(x; m ⊙ θ). (A minimal code sketch of this loop follows the table.) |
| Open Source Code | Yes | Codes are available at https://github.com/VITA-Group/Audio-Lottery. |
| Open Datasets | Yes | We conducted experiments on three commonly used ASR corpora: TED-LIUM (Rousseau et al., 2012), Common Voice (Ardila et al., 2020), and LibriSpeech (Panayotov et al., 2015). |
| Dataset Splits | No | The paper mentions test sets but does not provide specific percentages or counts for training, validation, and test splits, nor does it explicitly reference a standard split that defines these proportions. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments, only discussing computational complexity in terms of MACs. |
| Software Dependencies | No | The paper mentions using PyTorch and lists several GitHub repositories for implementation bases and pruning libraries, but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | During training, we set the batch size to 32 and the initial learning rate to 0.0003, which is annealed by a factor of 1.1 at the end of each epoch. All the models were trained for 16 epochs. (A sketch of this schedule also follows the table.) |
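
The Algorithm 1 pseudocode quoted above is standard iterative magnitude pruning (IMP) with weight rewinding. The following PyTorch-style sketch is an illustration of that loop, not the authors' implementation: the `train_fn` helper, the 90% target sparsity, and the choice to prune only weight matrices are assumptions; only the 20% per-round pruning ratio and the rewind-to-θ behaviour come from the quoted algorithm.

```python
import copy
import torch


def apply_masks(model, masks):
    """Zero out pruned weights in-place, i.e. evaluate f(x; m ⊙ θ).

    Caveat: a full implementation must also re-apply the masks after every
    optimizer step (or block gradients on pruned entries) so they stay zero.
    """
    params = [p for p in model.parameters() if p.dim() > 1]
    with torch.no_grad():
        for p, m in zip(params, masks):
            p.mul_(m)


def magnitude_prune(masks, weights, ratio=0.2):
    """Globally prune the smallest `ratio` fraction of the still-active weights."""
    alive = torch.cat([w[m.bool()].abs() for w, m in zip(weights, masks)])
    k = max(1, int(ratio * alive.numel()))
    threshold = torch.kthvalue(alive, k).values
    return [m * (w.abs() > threshold).float() for w, m in zip(weights, masks)]


def find_winning_ticket(model, train_fn, target_sparsity=0.9, ratio=0.2):
    """Iterative magnitude pruning with rewinding, mirroring the quoted Algorithm 1.

    `train_fn(model)` is a hypothetical stand-in for A_D^t: it trains the
    masked model for t epochs on dataset D.
    """
    init_state = copy.deepcopy(model.state_dict())            # θ, the rewind point
    params = [p for p in model.parameters() if p.dim() > 1]   # prune weight matrices only
    masks = [torch.ones_like(p) for p in params]              # initial mask m (all ones)

    def sparsity():
        return 1.0 - sum(m.sum().item() for m in masks) / sum(m.numel() for m in masks)

    while sparsity() < target_sparsity:
        model.load_state_dict(init_state)                     # rewind the weights to θ
        apply_masks(model, masks)                             # f(x; m ⊙ θ)
        train_fn(model)                                       # A_D^t(f(x; m ⊙ θ))
        with torch.no_grad():                                 # prune 20% of remaining weights
            masks = magnitude_prune(masks, [p.detach() for p in params], ratio)

    model.load_state_dict(init_state)                         # return the rewound ticket
    apply_masks(model, masks)
    return model, masks
```

Each round removes 20% of the weights that survived the previous rounds, so reaching roughly 90% sparsity takes about ten prune-retrain cycles.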
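
The experiment-setup row maps onto a conventional per-epoch learning-rate annealing schedule. The sketch below only encodes the quoted hyperparameters (batch size 32, initial learning rate 3e-4, learning rate divided by 1.1 at the end of each epoch, 16 epochs); the optimizer choice and the `training_step` helper are hypothetical and not specified in the paper.

```python
import torch


def train_ticket(model, train_loader, epochs=16, lr=3e-4, anneal_factor=1.1):
    """Train with the quoted schedule; `train_loader` is assumed to use batch_size=32."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)   # optimizer choice is an assumption
    # Dividing the LR by `anneal_factor` each epoch is ExponentialLR with gamma = 1 / 1.1.
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=1.0 / anneal_factor)
    for _ in range(epochs):
        for batch in train_loader:
            loss = model.training_step(batch)   # hypothetical helper returning the ASR loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        scheduler.step()                        # anneal at the end of each epoch
    return model
```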