A Fast Post-Training Pruning Framework for Transformers

Authors: Woosuk Kwon, Sehoon Kim, Michael W. Mahoney, Joseph Hassoun, Kurt Keutzer, Amir Gholami

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply our method to BERTBASE and DistilBERT, and we evaluate its effectiveness on GLUE and SQuAD benchmarks. Our framework achieves up to a 2.0x reduction in FLOPs and a 1.56x speedup in inference latency, while maintaining < 1% loss in accuracy. We extensively test our framework by applying it to BERTBASE and DistilBERT on GLUE and SQuAD tasks (Section 5.2).
Researcher Affiliation | Collaboration | Woosuk Kwon, UC Berkeley, woosuk.kwon@berkeley.edu; Joseph Hassoun, Samsung Semiconductor, Inc., j.hassoun@samsung.com
Pseudocode | Yes | Algorithm 1: Mask Search with a FLOPs Constraint (a generic sketch of such a search appears after this table)
Open Source Code | Yes | Our code is publicly available at https://github.com/WoosukKwon/retraining-free-pruning
Open Datasets | Yes | We evaluate the effectiveness of our approach using BERTBASE [12] and DistilBERT [63] on GLUE [78] and SQuAD [60, 61] benchmarks. We use 2K examples from the training sets for pruning, and we evaluate the resulting models on the development sets.
Dataset Splits | Yes | We use 2K examples from the training sets for pruning, and we evaluate the resulting models on the development sets. (A sketch of this sampling step appears after this table.)
Hardware Specification | Yes | With a batch size of 256, we achieve a speedup of 1.47x on average and up to 1.56x on an NVIDIA V100 GPU. For all experiments, we used an AWS p3.2xlarge instance, which has 1 NVIDIA V100 GPU.
Software Dependencies | No | The paper states that the framework is implemented on "PyTorch [57] and the Hugging Face Transformers [86] library," but it does not specify version numbers for these software dependencies.
Experiment Setup | Yes | We use 2K examples from the training sets for pruning, and we evaluate the resulting models on the development sets. All of the results are averaged over the runs with 10 different seeds. Our method has only two hyperparameters, which were fixed in all of our experiments (see Section 4.3). ... Concretely, we re-parameterize the least squares problem as $\arg\min_{r_l} \|A r_l + A\mathbf{1} - b\|_2^2$ where $m_l = \mathbf{1} + r_l$, and solve it with the damp value fixed to 1. ... In all of our experiments, we fixed the two hyperparameter values as we mentioned here. (A sketch of this damped least-squares step appears after this table.)
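
The 2K-example calibration setup in the Dataset Splits and Experiment Setup entries is straightforward to act on. Below is a minimal sketch using the Hugging Face datasets library; the GLUE task ("mrpc") and the seed are illustrative assumptions, not choices stated by the paper.

```python
# Minimal sketch: draw the 2K pruning/calibration examples described above
# from a GLUE training set, and keep the development set for evaluation.
# The task name ("mrpc") and the seed are assumptions, not the paper's choices.
from datasets import load_dataset

train = load_dataset("glue", "mrpc", split="train")
calib = train.shuffle(seed=0).select(range(2000))        # 2K examples for pruning
dev = load_dataset("glue", "mrpc", split="validation")   # evaluate on the dev set
print(len(calib), len(dev))
```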
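
The Pseudocode entry points to the paper's Algorithm 1 (Mask Search with a FLOPs Constraint). The sketch below is not that algorithm: it is a generic greedy, knapsack-style selection under a FLOPs budget, with the importance scores, FLOPs costs, and all names assumed, meant only to illustrate the kind of constrained mask search the paper formalizes.

```python
# Generic greedy sketch of mask search under a FLOPs budget (NOT the paper's
# Algorithm 1, which uses a Fisher-based objective and a more refined search).
import numpy as np

def greedy_mask_search(importance, flops_cost, flops_budget):
    """Keep units (heads/neurons) with the best importance-per-FLOP ratio
    until the FLOPs budget is exhausted; return a 0/1 mask."""
    mask = np.zeros_like(importance)
    order = np.argsort(-importance / flops_cost)  # best ratio first
    spent = 0.0
    for i in order:
        if spent + flops_cost[i] <= flops_budget:
            mask[i] = 1.0
            spent += flops_cost[i]
    return mask

# Toy usage: 12 attention heads with equal cost, keep ~50% of the FLOPs.
imp = np.random.rand(12)
cost = np.full(12, 1.0)
print(greedy_mask_search(imp, cost, flops_budget=6.0))
```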
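
The Experiment Setup entry states the re-parameterized least squares problem $\arg\min_{r_l} \|A r_l + A\mathbf{1} - b\|_2^2$ with $m_l = \mathbf{1} + r_l$ and a damp value fixed to 1. A damped least-squares solve of exactly this form can be sketched as follows; the use of SciPy's LSMR solver and the matrix shapes are assumptions, since the quoted text only specifies the problem and the damp value.

```python
# Sketch of the damped least-squares step: solve
#   argmin_r ||A r - (b - A @ 1)||_2^2   with damping 1,
# which is equivalent to argmin_r ||A r + A*1 - b||_2^2, then set m = 1 + r.
# The shapes and the LSMR solver are assumptions.
import numpy as np
from scipy.sparse.linalg import lsmr

A = np.random.randn(2000, 64)        # e.g., 2K calibration examples x 64 units
b = np.random.randn(2000)            # layer outputs to reconstruct
ones = np.ones(A.shape[1])

residual = b - A @ ones              # move the m = 1 baseline to the right-hand side
r = lsmr(A, residual, damp=1.0)[0]   # damped least squares, damp fixed to 1
m = ones + r                         # recovered mask values m_l = 1 + r_l
```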