The Need for Speed: Pruning Transformers with One Recipe
Authors: Samir Khaki, Konstantinos N. Plataniotis
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | ...produce state-of-the-art results on natural language, image classification, transfer learning, and semantic segmentation tasks. Our motivation stems from the need for a generalizable model compression framework that scales well across different transformer architectures and applications. Given a FLOP constraint, the OPTIN framework will compress the network while maintaining competitive accuracy performance and improved throughput. Particularly, we show a 2% accuracy degradation from NLP baselines and a 0.5% improvement from state-of-the-art methods on image classification at competitive FLOPs reductions. We further demonstrate the generalization of tasks and architecture with comparative performance on Mask2Former for semantic segmentation and CNN-style networks. |
| Researcher Affiliation | Academia | Samir Khaki, Konstantinos N. Plataniotis, Department of Electrical and Computer Engineering, University of Toronto, Toronto, Canada, samir.khaki@mail.utoronto.ca |
| Pseudocode | Yes | Algorithm 1 OPTIN Framework for Model Compression (a generic, hypothetical sketch of FLOP-constrained pruning follows the table). |
| Open Source Code | Yes | Code is available at: https://github.com/Skhaki18/optin-transformer-pruning. |
| Open Datasets | Yes | For Natural Language Processing, OPTIN is evaluated on the GLUE Benchmark (Wang et al., 2019)... For Image Classification, both ImageNet-1K (Deng et al., 2009) and CIFAR-10 (Krizhevsky et al., 2009)... For Semantic Segmentation, the Cityscapes Dataset (Cordts et al., 2016)... |
| Dataset Splits | No | The paper mentions training and validation images/data for some datasets (e.g., ImageNet-1K, CIFAR-10) and refers to validation error, but does not explicitly state the specific train/validation/test splits (e.g., percentages or exact counts for all splits) needed to reproduce the experiment's data partitioning. |
| Hardware Specification | Yes | All time measurements are captured over 300 iterations on an Nvidia RTX 2080 using a 100-iteration warmup. |
| Software Dependencies | No | We implement our method using transformers from the Hugging Face Library (Wolf et al., 2020) and infrastructure from PyTorch (Paszke et al., 2019). The paper names the software used but does not provide specific version numbers for reproducibility. |
| Experiment Setup | Yes | All time measurements are captured over 300 iterations on an Nvidia RTX 2080 using a 100-iteration warmup. The amount (batch) of data used to compute the scores is ablated in Appendix A.6. In Tab 1d, we use the λ sweep to express relative magnitude differences between L_MD and L_KD. (A minimal latency-measurement sketch follows the table.) |
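
The paper's Algorithm 1 is not reproduced here. As a rough illustration only, the sketch below shows a generic one-shot, FLOP-constrained pruning loop of the kind the abstract describes: rank prunable units by importance and remove the lowest-scoring ones until the FLOP budget is met. The unit list, importance scores, and FLOP costs are placeholders, not the paper's OPTIN metric.

```python
# Hypothetical sketch of generic FLOP-constrained one-shot pruning; the importance
# scores and FLOP costs are placeholders, not the OPTIN metric from Algorithm 1.
def prune_to_flop_budget(units, flop_budget):
    """units: list of (unit_id, importance_score, flop_cost); returns ids to keep."""
    kept = sorted(units, key=lambda u: u[1], reverse=True)  # most important first
    total_flops = sum(u[2] for u in kept)
    while kept and total_flops > flop_budget:
        removed = kept.pop()          # drop the least important remaining unit
        total_flops -= removed[2]
    return [u[0] for u in kept]

# Example: keep the units whose combined cost fits within 60% of the dense FLOPs.
# keep_ids = prune_to_flop_budget(scored_units, flop_budget=0.6 * dense_flops)
```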
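
For the reported latency protocol (a 100-iteration warmup followed by 300 timed iterations on an Nvidia RTX 2080), a minimal PyTorch timing sketch is given below. It assumes a CUDA device; the model and input shape are hypothetical placeholders, not taken from the paper.

```python
import time
import torch

@torch.no_grad()
def measure_latency(model, example_input, warmup=100, iters=300):
    """Average per-iteration forward latency after a warmup phase."""
    model.eval()
    for _ in range(warmup):          # warmup: stabilize kernels/caches before timing
        model(example_input)
    torch.cuda.synchronize()         # flush queued GPU work before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        model(example_input)
    torch.cuda.synchronize()         # wait for all GPU work before stopping the clock
    return (time.perf_counter() - start) / iters

# Example usage with a hypothetical model and input:
# model = MyTransformer().cuda()
# latency_s = measure_latency(model, torch.randn(64, 3, 224, 224, device="cuda"))
```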