PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition
Authors: Cheng-I Jeff Lai, Yang Zhang, Alexander H. Liu, Shiyu Chang, Yi-Lun Liao, Yung-Sung Chuang, Kaizhi Qian, Sameer Khurana, David Cox, Jim Glass
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on low-resource ASR verify (1) sparse subnetworks exist in mono-lingual/multi-lingual pre-trained speech SSL, and (2) the computational advantage and performance gain of PARP over baseline pruning methods. |
| Researcher Affiliation | Collaboration | MIT CSAIL, MIT-IBM Watson AI Lab, National Taiwan University, UC Santa Barbara |
| Pseudocode | Yes | Algorithm 1 Prune-Adjust-Re-Prune (PARP) to target sparsity s (see the sketch after the table) |
| Open Source Code | Yes | Project webpage: https://people.csail.mit.edu/clai24/parp/ |
| Open Datasets | Yes | We took wav2vec 2.0 base (wav2vec2-base) and large (wav2vec2-large) pre-trained on Librispeech 960 hours [6]. |
| Dataset Splits | Yes | Our experimental setup can be found in Appendix 9. For Librispeech, we use the splits provided by wav2vec 2.0 [6]: 10min, 1h, 10h for low-resource finetuning, and dev-other, dev-clean, test-other, test-clean for evaluation. |
| Hardware Specification | Yes | We thank IBM for the donation to MIT of the Satori GPU cluster, and John Cohn for maintaining the cluster. |
| Software Dependencies | No | The paper mentions using fairseq and wav2vec 2.0, which implies specific software dependencies, but it does not provide version numbers for these or for other dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | All models are finetuned with Adam optimizer with learning rate 3e-5, weight decay 0.01, and 50k warm-up steps and then linearly decayed to 0. Batch size is 32. |
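Since the table only names Algorithm 1 (Prune-Adjust-Re-Prune), the following is a minimal, hypothetical sketch of that loop for readers unfamiliar with it. It is not the authors' implementation: the real runs prune wav2vec 2.0 / XLSR inside fairseq, whereas here a toy `torch.nn` model, a global-magnitude initial mask, an MSE stand-in loss, and the `reprune_every` interval are all assumptions made for illustration.

```python
# Hypothetical sketch of a prune -> adjust -> re-prune loop (not the paper's code).
import torch
import torch.nn as nn


def magnitude_mask(model: nn.Module, sparsity: float) -> dict:
    """Return a {name: 0/1 mask} that zeroes the smallest-magnitude weights globally."""
    weights = torch.cat([p.detach().abs().flatten()
                         for n, p in model.named_parameters() if p.dim() > 1])
    threshold = torch.quantile(weights, sparsity)
    return {n: (p.detach().abs() > threshold).float()
            for n, p in model.named_parameters() if p.dim() > 1}


def apply_mask(model: nn.Module, mask: dict) -> None:
    """Zero out the currently pruned weights in place."""
    with torch.no_grad():
        for n, p in model.named_parameters():
            if n in mask:
                p.mul_(mask[n])


def parp(model, batches, sparsity, steps, reprune_every=50, lr=3e-5):
    # 1) Prune: obtain an initial mask at the target sparsity
    #    (assumed here: global magnitude pruning of the starting weights).
    mask = magnitude_mask(model, sparsity)
    apply_mask(model, mask)

    opt = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=0.01)
    loss_fn = nn.MSELoss()  # stand-in for the CTC loss used in ASR finetuning
    step = 0
    while step < steps:
        for x, y in batches:
            # 2) Adjust: ordinary finetuning updates; pruned weights receive
            #    gradients and may grow back because the mask is not re-applied
            #    after every update.
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
            step += 1
            # 3) Re-prune: periodically recompute the mask at the same target
            #    sparsity and zero out the newly selected weights.
            if step % reprune_every == 0:
                mask = magnitude_mask(model, sparsity)
                apply_mask(model, mask)
            if step >= steps:
                break
    return model, mask


if __name__ == "__main__":
    toy = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
    data = [(torch.randn(8, 16), torch.randn(8, 4)) for _ in range(10)]
    parp(toy, data, sparsity=0.5, steps=200)
```

The toy `__main__` block only demonstrates the call signature; sparsity, step count, and re-pruning interval would come from the paper's Algorithm 1 settings in practice.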
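The quoted experiment setup (Adam, learning rate 3e-5, weight decay 0.01, 50k warm-up steps, linear decay to 0, batch size 32) can be approximated with a standard PyTorch optimizer and schedule, as in the sketch below. The total number of updates and the placeholder model are assumptions not stated in the quote, and the actual experiments use fairseq's wav2vec 2.0 finetuning recipe rather than this code.

```python
# Hedged sketch of the reported finetuning schedule, assuming a total step budget.
import torch

model = torch.nn.Linear(768, 32)   # placeholder for the wav2vec 2.0 encoder + CTC head
total_steps = 200_000              # assumption; the quote does not state total updates
warmup_steps = 50_000

optimizer = torch.optim.Adam(model.parameters(), lr=3e-5, weight_decay=0.01)


def lr_lambda(step: int) -> float:
    """Linear warm-up to the peak learning rate, then linear decay to 0."""
    if step < warmup_steps:
        return step / warmup_steps
    return max(0.0, (total_steps - step) / (total_steps - warmup_steps))


scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Per update (batch size 32 in the paper):
#   loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()
```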