A Three-regime Model of Network Pruning
Authors: Yefan Zhou, Yaoqing Yang, Arin Chang, Michael W. Mahoney
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results validate the effectiveness of the proposed model, demonstrating that adjusting the load and temperature parameters can lead to relatively sharp transitions in model performance, and that making decisions based on these transitions leads to improved test error for pruned models. |
| Researcher Affiliation | Academia | 1International Computer Science Institute, CA, USA 2University of California, Berkeley, CA, USA 3Dartmouth College, NH, USA 4Lawrence Berkeley National Laboratory, CA, USA. |
| Pseudocode | Yes | Algorithm 1: Temperature Tuning; Algorithm 2: Model Selection via LMC and test error; Algorithm 2.1: Model Selection via LMC and CKA. (See the hedged LMC selection sketch below the table.) |
| Open Source Code | Yes | Our code is open-sourced. (Footnote 1: https://github.com/YefanZhou/ThreeRegimePruning) |
| Open Datasets | Yes | For the image classification task, we consider CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), and SVHN (Sermanet et al., 2011). ... For the machine translation task, we consider WMT14 (Bojar et al., 2014) German to English (DE-EN) dataset. |
| Dataset Splits | Yes | CIFAR-10 comprises 50,000 training images and 10,000 testing images with 10 categories. and report the BLEU score on the validation set. (Section B.1, Datasets) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, or cloud instance specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions optimizers like SGD and Adam, and implicitly uses deep learning frameworks, but it does not specify version numbers for any software dependencies (e.g., 'Python 3.x', 'PyTorch 1.x', 'CUDA 11.x'). |
| Experiment Setup | Yes | Model density (Load): we consider the model pruned to 9 densities: {5, 6, 7, 8, 10, 14, 20, 40, 80}%. Number of training epochs (Temperature): we consider training to numbers of epochs that are multiples of 10: {10, 20, 30, ..., 160}. Batch size (Temperature): we consider training the model with varying batch sizes {16, 21, 27, 32, 38, 44, 52, 64, 92, 128, 180, 256, 512} while keeping the same amount of training iterations (Section 3.1). The default hyperparameters include a momentum of 0.9, weight decay of 1e-4, and a training duration of 160 epochs. Learning rate decay is applied with an initial learning rate of 0.1, which decreases by a factor of 10 at epochs 80 and 120 (Section B.1). (A hedged configuration sketch follows the table.) |
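The Experiment Setup row lists the default training hyperparameters and the load/temperature grids. Below is a minimal sketch, assuming PyTorch, that encodes those quoted values; the model and data pipeline are placeholders, and this is not the authors' released implementation.

```python
# Hedged sketch of the quoted training setup; assumes PyTorch. The model, data
# pipeline, and pruning step are placeholders, not the authors' released code.
import torch

# Grids quoted from Section 3.1: "load" is model density, "temperature" is the
# number of training epochs or the batch size.
DENSITIES = [0.05, 0.06, 0.07, 0.08, 0.10, 0.14, 0.20, 0.40, 0.80]
EPOCHS_GRID = list(range(10, 161, 10))
BATCH_SIZES = [16, 21, 27, 32, 38, 44, 52, 64, 92, 128, 180, 256, 512]

def make_optimizer_and_scheduler(model: torch.nn.Module):
    # Default hyperparameters quoted from Section B.1.
    optimizer = torch.optim.SGD(
        model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4
    )
    # Initial learning rate 0.1, dropped by a factor of 10 at epochs 80 and 120
    # over a 160-epoch run.
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[80, 120], gamma=0.1
    )
    return optimizer, scheduler
```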
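For reference against the Open Datasets and Dataset Splits rows, a minimal sketch (assuming torchvision) that loads the CIFAR-10 split quoted above; the `root` path is a placeholder.

```python
# Minimal sketch, assuming torchvision, of the CIFAR-10 split quoted above
# (50,000 training images and 10,000 test images over 10 categories).
from torchvision import datasets, transforms

transform = transforms.ToTensor()
train_set = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
test_set = datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)
print(len(train_set), len(test_set), len(train_set.classes))  # 50000 10000 10
```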
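The Pseudocode row names "Model Selection via LMC" (Algorithm 2), but this summary does not reproduce the algorithm itself. The sketch below is a hypothetical illustration of selection by linear mode connectivity: evaluate accuracy along the linear interpolation path between two retrained checkpoints and keep the candidate setting with the smallest error barrier. The barrier definition, the 11-point path resolution, and all function names are assumptions, not the authors' procedure.

```python
# Hypothetical sketch of model selection via linear mode connectivity (LMC).
# The barrier definition and function names below are assumptions.
import copy
import torch

def interpolate_state_dicts(sd_a, sd_b, alpha):
    # Pointwise (1 - alpha) * A + alpha * B on floating-point tensors; integer
    # buffers (e.g., BatchNorm's num_batches_tracked) are copied from A.
    return {
        k: (1 - alpha) * sd_a[k] + alpha * sd_b[k]
        if torch.is_floating_point(sd_a[k]) else sd_a[k]
        for k in sd_a
    }

@torch.no_grad()
def accuracy(model, loader, device="cpu"):
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total

def lmc_barrier(model_template, sd_a, sd_b, loader, n_points=11):
    # Error barrier: mean endpoint accuracy minus the worst accuracy on the
    # linear path between the two checkpoints.
    accs = []
    for i in range(n_points):
        alpha = i / (n_points - 1)
        model = copy.deepcopy(model_template)
        model.load_state_dict(interpolate_state_dicts(sd_a, sd_b, alpha))
        accs.append(accuracy(model, loader))
    return (accs[0] + accs[-1]) / 2 - min(accs)

def select_by_lmc(model_template, candidates, loader):
    # `candidates` maps a hyperparameter setting (e.g., a batch size) to a pair
    # of retrained state dicts; keep the setting with the smallest barrier.
    return min(
        candidates,
        key=lambda k: lmc_barrier(model_template, candidates[k][0], candidates[k][1], loader),
    )
```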