PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition

Authors: Cheng-I Jeff Lai, Yang Zhang, Alexander H. Liu, Shiyu Chang, Yi-Lun Liao, Yung-Sung Chuang, Kaizhi Qian, Sameer Khurana, David Cox, Jim Glass

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on low-resource ASR verify (1) sparse subnetworks exist in mono-lingual/multi-lingual pre-trained speech SSL, and (2) the computational advantage and performance gain of PARP over baseline pruning methods. |
| Researcher Affiliation | Collaboration | MIT CSAIL, MIT-IBM Watson AI Lab, National Taiwan University, UC Santa Barbara |
| Pseudocode | Yes | Algorithm 1: Prune-Adjust-Re-Prune (PARP) to target sparsity s (a hypothetical sketch of the loop follows the table). |
| Open Source Code | Yes | Project webpage: https://people.csail.mit.edu/clai24/parp/ |
| Open Datasets | Yes | We took wav2vec 2.0 base (wav2vec2-base) and large (wav2vec2-large) pre-trained on Librispeech 960 hours [6]. |
| Dataset Splits | Yes | Our experimental setup can be found in Appendix 9. For Librispeech, we use the splits provided by wav2vec 2.0 [6]: 10min, 1h, 10h for low-resource finetuning, and dev-other, dev-clean, test-other, test-clean for evaluation. |
| Hardware Specification | Yes | We thank IBM for the donation to MIT of the Satori GPU cluster, and John Cohn for maintaining the cluster. |
| Software Dependencies | No | The paper mentions fairseq and wav2vec 2.0, but does not provide version numbers for these or for other software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | All models are finetuned with the Adam optimizer with learning rate 3e-5, weight decay 0.01, and 50k warm-up steps, then linearly decayed to 0. Batch size is 32. (A sketch of this recipe follows the table.) |
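
The table only names Algorithm 1, so the following is a minimal, hypothetical PyTorch sketch of a prune-adjust-re-prune loop, based solely on the algorithm name and the evidence quoted above. The magnitude-pruning criterion, the `reprune_every` interval, and the `compute_loss` helper are assumptions for illustration, not the paper's exact procedure.

```python
# Hypothetical sketch of a prune-adjust-re-prune loop (not the paper's
# reference implementation). Magnitude pruning, the re-prune interval,
# and compute_loss are assumed for illustration.
import torch


def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Return a 0/1 mask that zeroes the smallest-magnitude `sparsity` fraction."""
    k = int(sparsity * weight.numel())
    if k == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()


def parp_finetune(model, optimizer, data_loader, compute_loss,
                  target_sparsity: float, reprune_every: int = 1000):
    """Prune to `target_sparsity`, finetune (adjust), and periodically re-prune."""
    prunable = {n: p for n, p in model.named_parameters() if p.dim() > 1}

    # Initial prune: zero out the smallest-magnitude weights.
    masks = {n: magnitude_mask(p.data, target_sparsity) for n, p in prunable.items()}
    for n, p in prunable.items():
        p.data.mul_(masks[n])

    for step, batch in enumerate(data_loader):
        # Adjust: all weights, including currently-zeroed ones, receive updates.
        loss = compute_loss(model, batch)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        # Re-prune: re-apply the target sparsity so revived weights can compete
        # with previously kept ones before the mask is refreshed.
        if (step + 1) % reprune_every == 0:
            for n, p in prunable.items():
                masks[n] = magnitude_mask(p.data, target_sparsity)
                p.data.mul_(masks[n])

    return model, masks
```

Under this reading of the name, the distinguishing choice is that the mask is not frozen after the initial prune: pruned weights may recover during the adjust phase, and the subnetwork is revised at each re-prune step.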
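
The quoted experiment setup translates roughly into the optimizer and learning-rate schedule below. This is a sketch under assumptions: the total number of training updates (`total_steps`) is not stated in the quote, and plain `torch.optim.Adam` with L2-style weight decay is used here, whereas the paper likely relies on fairseq's own optimizer configuration.

```python
import torch


def build_optimizer_and_scheduler(model, warmup_steps: int = 50_000,
                                  total_steps: int = 250_000):
    # Hyperparameters from the quoted setup: Adam, lr 3e-5, weight decay 0.01.
    # total_steps is an assumed placeholder; the quote does not state it.
    optimizer = torch.optim.Adam(model.parameters(), lr=3e-5, weight_decay=0.01)

    def lr_lambda(step: int) -> float:
        # Linear warm-up to the peak learning rate, then linear decay to 0.
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler  # batch size 32 is set in the data loader, not here
```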