Gradient-Free Structured Pruning with Unlabeled Data

Authors: Azade Nova, Hanjun Dai, Dale Schuurmans

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | An evaluation on the GLUE and SQuAD benchmarks using BERTBASE and DistilBERT illustrates the effectiveness of the proposed approach.
Researcher Affiliation | Collaboration | 1 Google DeepMind, 2 University of Alberta.
Pseudocode | Yes | Algorithm 1: Kernelized Convex Masking (KCM); Algorithm 2: Representative Ranking (R2).
Open Source Code | No | The paper does not provide concrete access to source code (e.g., a specific repository link or an explicit code-release statement) for the described methodology.
Open Datasets | Yes | We fine-tuned the pre-trained checkpoints of the BERTBASE (Devlin et al., 2018) and DistilBERT (Sanh et al., 2019) downloaded from the Hugging Face repository on the GLUE (Wang et al., 2018) and SQuAD (Rajpurkar et al., 2018; 2016) benchmarks.
Dataset Splits | Yes | GLUE (Wang et al., 2018) includes the following tasks: 1) sentence similarity (QQP (Shankar et al., 2017), MRPC (Dolan & Brockett, 2005), STS-B (Cer et al., 2017)) with 364K, 4K, and 6K training examples. [...] SQuAD 1.1 (Rajpurkar et al., 2016) and SQuAD 2.0 (Rajpurkar et al., 2018) are question answering tasks with 88K and 130K training examples, respectively. [...] Table 9 (train-test data discrepancy), unlabeled sample: SQuAD1.1-train, evaluation: SQuAD1.1-val; 60%: 76.92 ± 0.11, 70%: 82.65 ± 0.06.
Hardware Specification | Yes | Table 6. Speedup of KCM on BERTBASE on a single NVIDIA V100 GPU for the 60% FLOPs constraint.
Software Dependencies | No | The paper states: "We implemented our framework with PyTorch (Paszke et al., 2019) using the Hugging Face Transformers (Wolf et al., 2020) library." However, it does not pin version numbers for PyTorch or Hugging Face Transformers, which are needed for a fully reproducible description of the software environment (a minimal version-recording sketch is given after this table).
Experiment Setup | Yes | In our experiments, we set σ = 1.0 and α = 0.01. Moreover, on average it takes less than 20 iterations to converge. All results are averaged over runs with 10 different seeds.
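
The Open Datasets and Software Dependencies rows above name the checkpoints, benchmarks, and libraries, but the paper releases no loading code and pins no versions. The following is a minimal sketch, assuming the standard Hugging Face Hub identifiers (bert-base-uncased, glue/mrpc, squad, squad_v2); the identifiers and the script itself are illustrative assumptions, not artifacts from the paper.

```python
# Illustrative only: the paper does not release code; the Hub identifiers below are assumed.
import torch
import transformers
import datasets
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from datasets import load_dataset

# Recording exact versions addresses the gap noted in the Software Dependencies row.
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("datasets:", datasets.__version__)

# Pre-trained BERTBASE checkpoint from the Hugging Face Hub (assumed identifier).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Benchmarks referenced in the Open Datasets / Dataset Splits rows.
glue_mrpc = load_dataset("glue", "mrpc")   # one GLUE sentence-similarity task
squad_v1 = load_dataset("squad")           # SQuAD 1.1
squad_v2 = load_dataset("squad_v2")        # SQuAD 2.0
print(glue_mrpc)                           # inspect the train/validation/test split sizes
```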
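
The Experiment Setup row reports σ = 1.0, α = 0.01, convergence in fewer than 20 iterations, and averaging over 10 seeds, but no runnable configuration. The sketch below collects those reported values into a hypothetical config; run_kcm is a placeholder for the unreleased KCM implementation.

```python
# Hypothetical configuration mirroring the reported setup; `run_kcm` is a placeholder
# because the paper does not release an implementation.
import random
import numpy as np
import torch

CONFIG = {
    "sigma": 1.0,             # as reported in the paper
    "alpha": 0.01,            # as reported in the paper
    "max_iterations": 20,     # the paper reports convergence in < 20 iterations on average
    "flops_constraint": 0.60, # the 60% FLOPs budget used in the speedup table
    "seeds": list(range(10)), # results are averaged over 10 different seeds
}

def set_seed(seed: int) -> None:
    """Fix the RNGs so each of the 10 runs is repeatable."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

# scores = []
# for seed in CONFIG["seeds"]:
#     set_seed(seed)
#     scores.append(run_kcm(CONFIG, seed=seed))            # placeholder call
# print(f"{np.mean(scores):.2f} ± {np.std(scores):.2f}")   # mean ± std over 10 seeds
```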