Accurate Retraining-free Pruning for Pretrained Encoder-based Language Models

Authors: Seungcheol Park, Hojun Choi, U Kang

ICLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental We perform extensive experiments on GLUE and SQuAD benchmarks to demonstrate the performance of K-prune.
Researcher Affiliation Academia Seoul National University, Seoul, South Korea; Kim Jaechul Graduate School of AI, KAIST, Seoul, South Korea
Pseudocode Yes Algorithm 1 Knowledge-Preserving Mask Search (KPMS)
Open Source Code Yes Our source code is available at https://github.com/snudm-starlab/K-prune
Open Datasets Yes We evaluate the performance of compressing the pretrained BERT (Devlin et al., 2019) and DistilBERT (Sanh et al., 2019) models on GLUE (Wang et al., 2019), SQuAD v1.1 (Rajpurkar et al., 2016), and v2 (Rajpurkar et al., 2018) under diverse compression rates.
Dataset Splits No The paper uses well-known benchmark datasets such as GLUE and SQuAD, which have predefined splits, but it does not explicitly state which train/validation/test splits were used in the experiments. It mentions using '100K tokens from the training dataset as a sample dataset' for the K-prune process, but this is not presented as a general validation split.
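The "100K tokens from the training dataset" detail can be made concrete with a minimal sketch. This is an assumption about how such a token-budget sample might be drawn; the whitespace tokenizer below is a placeholder, not the pretrained tokenizer the paper uses.

```python
# Hedged sketch: collect training sentences until a token budget
# (100K tokens in the paper) is reached. The whitespace split is a
# placeholder for the paper's pretrained tokenizer.
def sample_by_token_budget(train_sentences, budget=100_000):
    sample, n_tokens = [], 0
    for sentence in train_sentences:
        tokens = sentence.split()  # placeholder tokenizer
        if n_tokens + len(tokens) > budget:
            break  # stop before exceeding the token budget
        sample.append(sentence)
        n_tokens += len(tokens)
    return sample, n_tokens
```

Note that such a sample is drawn from the training split only, which is why it does not constitute a validation split.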
Hardware Specification Yes We use NVIDIA 1080 Ti for all experiments.
Software Dependencies No The paper states 'We use PyTorch (Paszke et al., 2019), and the weights of the pretrained models in Transformers (Wolf et al., 2020)' and 'We use a linear solver in PyTorch (Paszke et al., 2019) to solve Equations (10) and (11)', but does not provide specific version numbers for these software components.
Experiment Setup Yes We use 100K tokens from the training dataset as a sample dataset, and exploit the pretrained tokenizers in Transformers (Wolf et al., 2020) for counting. The size of the sample dataset is small compared to the GLUE and SQuAD datasets, e.g. around 0.64% of the MNLI (Williams et al., 2018) dataset. We fix random seeds from 0 to 4 and report the average performance of the 5 runs. We use two combinations of hyperparameters (γ, λ, µ) ∈ {(2, 0, 64), (2, 0.00025, 64)} for all experiments of K-prune.
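The reporting protocol quoted above (seeds 0 to 4, mean over 5 runs, two (γ, λ, µ) settings) can be sketched as follows. `run_experiment` is a hypothetical placeholder for one prune-and-evaluate run, not the actual K-prune pipeline.

```python
import random
import statistics

# Hypothetical stand-in for one prune-and-evaluate run; the real
# K-prune evaluation is not reproduced here.
def run_experiment(seed, gamma, lam, mu):
    random.seed(seed)              # fix the random seed, as in the paper
    return 80.0 + random.random()  # placeholder metric in [80, 81)

# The paper reports the average over seeds 0..4 for each of two
# hyperparameter combinations (γ, λ, µ).
HYPERPARAMS = [(2, 0.0, 64), (2, 0.00025, 64)]

for gamma, lam, mu in HYPERPARAMS:
    scores = [run_experiment(seed, gamma, lam, mu) for seed in range(5)]
    print(f"(γ, λ, µ)=({gamma}, {lam}, {mu}): "
          f"mean={statistics.mean(scores):.2f}")
```

Fixing the seed at the start of each run makes every run reproducible on its own, while the 5-run average smooths out seed-dependent variance.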