Gradient-Free Structured Pruning with Unlabeled Data
Authors: Azade Nova, Hanjun Dai, Dale Schuurmans
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | An evaluation on the GLUE and SQuAD benchmarks using BERT-Base and DistilBERT illustrates the effectiveness of the proposed approach. |
| Researcher Affiliation | Collaboration | Google DeepMind; University of Alberta. |
| Pseudocode | Yes | Algorithm 1: Kernelized Convex Masking (KCM); Algorithm 2: Representative Ranking (R2). (A hedged structured-masking sketch follows the table.) |
| Open Source Code | No | The paper does not provide concrete access to source code (e.g., a specific repository link or an explicit code release statement) for the methodology described. |
| Open Datasets | Yes | We fine-tuned the pre-trained checkpoints of the BERT-Base (Devlin et al., 2018) and DistilBERT (Sanh et al., 2019) downloaded from the Hugging Face repository on GLUE (Wang et al., 2018) and SQuAD (Rajpurkar et al., 2018; 2016) benchmarks. (A loading sketch follows the table.) |
| Dataset Splits | Yes | GLUE (Wang et al., 2018) includes the following tasks: 1) sentence similarity (QQP (Shankar et al., 2017), MRPC (Dolan & Brockett, 2005), STS-B (Cer et al., 2017)) with 364K, 4K, and 6K training examples. [...] SQuAD 1.1 (Rajpurkar et al., 2016) and SQuAD 2.0 (Rajpurkar et al., 2018) are question-answering tasks with 88K and 130K training examples, respectively. [...] Table 9 (train-test data discrepancy): with unlabeled samples from SQuAD1.1-train and evaluation on SQuAD1.1-val, the score is 76.92 ± 0.11 at the 60% FLOPs constraint and 82.65 ± 0.06 at 70%. |
| Hardware Specification | Yes | Table 6. Speedup of KCM on BERT-Base on a single NVIDIA V100 GPU for the 60% FLOPs constraint. (A latency-measurement sketch follows the table.) |
| Software Dependencies | No | The paper mentions 'We implemented our framework with PyTorch (Paszke et al., 2019) using the Hugging Face Transformers (Wolf et al., 2020) library.' However, it does not provide specific version numbers for PyTorch or Transformers, which are necessary for a reproducible description of software dependencies. |
| Experiment Setup | Yes | In our experiments, we set σ = 1.0 and α = 0.01. Moreover, on average it takes less than 20 iterations to converge. All results are averaged over the runs with 10 different seeds. (A multi-seed reporting sketch follows the table.) |
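
The paper's KCM and R2 pseudocode is not reproduced here. As a point of reference for what the resulting structured mask does, the sketch below applies a head-level pruning decision to BERT via the Hugging Face `prune_heads` API; the per-layer head indices are hypothetical placeholders, not output of the paper's method.

```python
# Illustration only: applying a structured (head-level) pruning decision to
# BERT with Hugging Face's `prune_heads`. This is NOT the paper's KCM/R2
# algorithm; it only shows what consuming such a mask looks like in practice.
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# Hypothetical output of a ranking step such as R2: per-layer indices of
# attention heads judged least important.
heads_to_prune = {0: [2, 5], 1: [0], 11: [7, 9, 10]}
model.prune_heads(heads_to_prune)  # physically removes those heads' parameters
```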
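
A minimal sketch of fetching the artifacts named in the Open Datasets row, assuming the standard Hugging Face Hub identifiers for the checkpoints and the `datasets` builders for GLUE and SQuAD (the paper does not specify exact identifiers):

```python
# Load the pre-trained checkpoints and benchmark data from the Hugging Face
# Hub. Checkpoint/dataset names are the usual Hub identifiers, assumed here.
from datasets import load_dataset
from transformers import AutoModel, AutoTokenizer

for ckpt in ("bert-base-uncased", "distilbert-base-uncased"):
    tokenizer = AutoTokenizer.from_pretrained(ckpt)
    model = AutoModel.from_pretrained(ckpt)

mrpc = load_dataset("glue", "mrpc")   # one GLUE task; ~3.7K training examples
squad = load_dataset("squad")         # SQuAD 1.1; ~88K training examples
print(mrpc["train"].num_rows, squad["train"].num_rows)
```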
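
The Table 6 speedup is a wall-clock measurement on a single NVIDIA V100. Below is a sketch of how such a measurement is commonly taken in PyTorch; the batch size, sequence length, and iteration counts are illustrative assumptions, not values from the paper.

```python
# Time a model's forward pass on GPU with CUDA events, then compare the
# dense and pruned models' latencies to get a speedup figure.
import torch

@torch.no_grad()
def latency_ms(model, batch=32, seq_len=128, warmup=10, iters=100):
    device = next(model.parameters()).device
    ids = torch.randint(0, model.config.vocab_size, (batch, seq_len), device=device)
    for _ in range(warmup):  # warm-up runs to amortize CUDA init costs
        model(input_ids=ids)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        model(input_ids=ids)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

# speedup = latency_ms(dense_model) / latency_ms(pruned_model)
```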
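
Finally, the Experiment Setup row states that results are averaged over runs with 10 different seeds, with σ = 1.0 and α = 0.01. A sketch of that reporting protocol follows; `run_pruning_eval` is a hypothetical stand-in for the paper's prune-then-evaluate pipeline.

```python
# Run the full pipeline under 10 different seeds and report mean ± std.
import statistics
import torch

def run_pruning_eval(seed: int, sigma: float = 1.0, alpha: float = 0.01) -> float:
    """Hypothetical stand-in: prune with KCM (sigma, alpha as stated in the
    paper), evaluate on the task's validation split, and return the score."""
    torch.manual_seed(seed)  # seed everything the pipeline touches
    ...                      # pruning + evaluation would go here
    return 0.0               # placeholder score

scores = [run_pruning_eval(seed) for seed in range(10)]
print(f"{statistics.mean(scores):.2f} ± {statistics.stdev(scores):.2f}")
```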