A Three-regime Model of Network Pruning

Authors: Yefan Zhou, Yaoqing Yang, Arin Chang, Michael W. Mahoney

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Our empirical results validate the effectiveness of the proposed model, demonstrating that adjusting the load and temperature parameters can lead to relatively sharp transitions in model performance, and that making decisions based on this leads to improved test error for pruned models. |
| Researcher Affiliation | Academia | ¹International Computer Science Institute, CA, USA; ²University of California, Berkeley, CA, USA; ³Dartmouth College, NH, USA; ⁴Lawrence Berkeley National Laboratory, CA, USA. |
| Pseudocode | Yes | Algorithm 1 (Temperature Tuning), Algorithm 2 (Model Selection via LMC and test error), Algorithm 2.1 (Model Selection via LMC and CKA). |
| Open Source Code | Yes | "Our code is open-sourced" (footnote 1: https://github.com/YefanZhou/ThreeRegimePruning). |
| Open Datasets | Yes | For the image classification task, we consider CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), and SVHN (Sermanet et al., 2011). ... For the machine translation task, we consider the WMT14 (Bojar et al., 2014) German-to-English (DE-EN) dataset. |
| Dataset Splits | Yes | CIFAR-10 comprises 50,000 training images and 10,000 testing images with 10 categories. ... and report the BLEU score on the validation set. (Section B.1, Datasets) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, or cloud instance specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions optimizers such as SGD and Adam, and implicitly uses deep learning frameworks, but it does not specify version numbers for any software dependencies (e.g., Python 3.x, PyTorch 1.x, CUDA 11.x). |
| Experiment Setup | Yes | Model density (Load): we consider the model pruned to 9 densities: {5, 6, 7, 8, 10, 14, 20, 40, 80}%. Number of training epochs (Temperature): we consider training to numbers of epochs that are multiples of 10 {10, 20, 30, ..., 160}. Batch size (Temperature): we consider training the model with varying batch sizes {16, 21, 27, 32, 38, 44, 52, 64, 92, 128, 180, 256, 512} while keeping the same amount of training iterations. (Section 3.1) The default hyperparameters include a momentum of 0.9, weight decay of 1e-4, and a training duration of 160 epochs. Learning rate decay is applied with an initial learning rate of 0.1, which decreases by a factor of 10 at epochs 80 and 120. (Section B.1) See the configuration sketch below the table. |
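
The values quoted in the Experiment Setup row can be collected into a single configuration object. The sketch below is only an illustrative reconstruction of those quoted load/temperature grids and default SGD settings, not code from the authors' released repository; the names `PruningExperimentConfig` and `make_optimizer_and_scheduler` are hypothetical.

```python
# Minimal sketch of the quoted experiment setup, assuming PyTorch.
from dataclasses import dataclass, field
from typing import List

import torch


@dataclass
class PruningExperimentConfig:
    # Load axis: target model densities (percent of weights kept after pruning).
    densities_pct: List[float] = field(
        default_factory=lambda: [5, 6, 7, 8, 10, 14, 20, 40, 80]
    )
    # Temperature axis 1: training lengths, multiples of 10 epochs up to 160.
    epochs_grid: List[int] = field(
        default_factory=lambda: list(range(10, 170, 10))
    )
    # Temperature axis 2: batch sizes swept with the total number of
    # training iterations held fixed.
    batch_sizes: List[int] = field(
        default_factory=lambda: [16, 21, 27, 32, 38, 44, 52, 64, 92, 128, 180, 256, 512]
    )
    # Default hyperparameters quoted from Section B.1.
    lr: float = 0.1
    momentum: float = 0.9
    weight_decay: float = 1e-4
    total_epochs: int = 160
    lr_decay_epochs: List[int] = field(default_factory=lambda: [80, 120])
    lr_decay_factor: float = 0.1


def make_optimizer_and_scheduler(model: torch.nn.Module, cfg: PruningExperimentConfig):
    """SGD with the quoted defaults; learning rate drops 10x at epochs 80 and 120."""
    optimizer = torch.optim.SGD(
        model.parameters(),
        lr=cfg.lr,
        momentum=cfg.momentum,
        weight_decay=cfg.weight_decay,
    )
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=cfg.lr_decay_epochs, gamma=cfg.lr_decay_factor
    )
    return optimizer, scheduler
```

Usage would be along the lines of `optimizer, scheduler = make_optimizer_and_scheduler(model, PruningExperimentConfig())`, with the density, epoch, and batch-size grids iterated over to populate the load-temperature plane described in the paper.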