Movement Pruning: Adaptive Sparsity by Fine-Tuning
Authors: Victor Sanh, Thomas Wolf, Alexander Rush
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that when pruning large pretrained language models, movement pruning shows significant improvements in high-sparsity regimes. |
| Researcher Affiliation | Collaboration | Hugging Face and Cornell University. Contact: {victor,thomas}@huggingface.co; arush@cornell.edu |
| Pseudocode | No | The paper describes its methods through textual descriptions and mathematical equations, but it does not include a structured pseudocode block or an algorithm labeled as such. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing the source code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We perform experiments on three monolingual (English) tasks...: question answering (SQuAD v1.1) [Rajpurkar et al., 2016], natural language inference (MNLI) [Williams et al., 2018], and sentence similarity (QQP) [Iyer et al., 2017]. |
| Dataset Splits | No | The paper states that 'the datasets respectively contain 8K, 393K, and 364K training examples' and reports 'Dev acc/MM acc' in tables, but it does not give explicit training/validation/test splits (e.g., percentages or sample counts per split) or citations to the exact predefined splits needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, or server configurations) used to conduct the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies or their version numbers (e.g., specific libraries, frameworks, or programming language versions) used for the experiments. |
| Experiment Setup | Yes | For a given task, we fine-tune the pre-trained model for the same number of updates (between 6 and 10 epochs) across pruning methods. We follow Zhu and Gupta [2018] and use a cubic sparsity scheduling for Magnitude Pruning (MaP), Movement Pruning (MvP), and Soft Movement Pruning (SMvP). (Sketches of the cubic schedule and of movement pruning follow the table.) |
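
The Experiment Setup row cites the cubic sparsity schedule of Zhu and Gupta [2018], under which the sparsity at step t ramps from an initial value s_i to a final value s_f over T pruning steps as s_t = s_f + (s_i - s_f)(1 - t/T)^3. Below is a minimal sketch of that schedule, assuming a plain Python helper; the function name and the warmup handling are our own additions, not details taken from the paper.

```python
def cubic_sparsity(step: int, total_steps: int,
                   initial_sparsity: float = 0.0,
                   final_sparsity: float = 0.9,
                   warmup_steps: int = 0) -> float:
    """Cubic sparsity schedule (Zhu and Gupta, 2018).

    Ramps sparsity from `initial_sparsity` to `final_sparsity` following
    s_t = s_f + (s_i - s_f) * (1 - t/T)^3, then holds at `final_sparsity`.
    The `warmup_steps` delay before pruning starts is an assumption of
    this sketch, not a detail from the paper.
    """
    if step < warmup_steps:
        return initial_sparsity
    span = max(1, total_steps - warmup_steps)
    t = min(step - warmup_steps, span)
    remaining = 1.0 - t / span
    return final_sparsity + (initial_sparsity - final_sparsity) * remaining ** 3
```

For example, with `final_sparsity=0.9` and `total_steps=1000`, the schedule reaches roughly 24% sparsity after 100 steps but only inches from 79% to 90% over the last 500, pruning aggressively early and slowly near the end.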
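
The mechanism the Research Type row alludes to learns an importance score S_ij for every weight W_ij, masks the weights with Top_v(S) in the forward pass, and routes gradients to S through a straight-through estimator, so each score accumulates the "movement" signal (dL/dW_ij) * W_ij. The following is a minimal PyTorch sketch of that idea under stated assumptions: the class names, the score initialization, and the per-layer top-v selection are ours, and this is not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopVStraightThrough(torch.autograd.Function):
    """Binary top-v mask whose backward pass is a straight-through estimator."""

    @staticmethod
    def forward(ctx, scores, keep_ratio):
        k = max(1, min(scores.numel(), int(keep_ratio * scores.numel())))
        # The k-th largest score is the (numel - k + 1)-th smallest.
        threshold = scores.flatten().kthvalue(scores.numel() - k + 1).values
        return (scores >= threshold).to(scores.dtype)

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: the gradient w.r.t. the mask,
        # (dL/dW'_ij) * W_ij for W' = W * mask, flows to the scores unchanged.
        return grad_output, None

class MovementPrunedLinear(nn.Module):
    """Linear layer masked by learned importance scores (movement pruning sketch)."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))
        # Importance scores S, trained jointly with the weights; the small
        # random initialization (to break ties) is an assumption of this sketch.
        self.scores = nn.Parameter(torch.randn(out_features, in_features) * 1e-3)

    def forward(self, x, keep_ratio):
        mask = TopVStraightThrough.apply(self.scores, keep_ratio)
        return F.linear(x, self.weight * mask, self.bias)
```

During fine-tuning, one would set `keep_ratio = 1.0 - cubic_sparsity(step, total_steps)` at each step, so the mask tightens over training while the scores keep ranking which weights move away from zero.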