Neuron-level Structured Pruning using Polarization Regularizer

Authors: Tao Zhuang, Zhixuan Zhang, Yuheng Huang, Xiaoyi Zeng, Kai Shuang, Xiang Li

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we carried out extensive experiments to evaluate our approach on the image classification task, which is the most widely used task for testing pruning methods. We present our main experiment results in Section 4.3. In Section 4.4, we discuss the effect of hyper-parameters in our approach.
Researcher Affiliation | Collaboration | Alibaba Group and Beijing University of Posts and Telecommunications
Pseudocode | Yes | Algorithm 1: Search for hyper-parameters of Polarization Pruning
Open Source Code | Yes | Our code is available at https://github.com/polarizationpruning/PolarizationPruning.
Open Datasets | Yes | Therefore, we evaluate our method on both small datasets (CIFAR-10/100 [Krizhevsky et al., 2009]) and a large dataset (ImageNet [Russakovsky et al., 2015]).
Dataset Splits | No | The paper mentions using the CIFAR and ImageNet datasets but does not explicitly provide the training, validation, and test splits (e.g., percentages or sample counts); it refers to Appendix B for detailed parameters.
Hardware Specification | No | The paper mentions running experiments on "common GPU/CPU devices" but does not provide specific hardware details such as GPU models, CPU models, or memory specifications.
Software Dependencies | No | The paper states: "Specifically, our code implementation is based on PyTorch and Torchvision [Paszke et al., 2019]." While it names the software, it does not provide version numbers for either PyTorch or Torchvision.
Experiment Setup | Yes | We adjust the hyper-parameters λ and t in our polarization regularizer to control the reduced FLOPs as described in Section 3.3. All scaling factors are initialized to 0.5 in image classification tasks, as in [Liu et al., 2017]. We set the upper bound a of the scaling factors to 1 in this paper. For each group of hyper-parameters, we only need to train 30 epochs on the ImageNet dataset for the FLOPs reduction to stabilize, rather than the 120 epochs of full training. Algorithm 1 (Search for hyper-parameters of Polarization Pruning) also outlines a key part of the experimental setup.
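To make the Experiment Setup row concrete, below is a minimal PyTorch sketch of how the polarization penalty (t * ||γ||₁ − ||γ − mean(γ)·1||₁ over the scaling factors γ) and the constraints named above (λ-weighted penalty, upper bound a = 1, factors initialized to 0.5) could be wired into a training loop. This is an illustrative sketch, not the authors' released implementation: the function names, the choice of BatchNorm weights as the scaling factors (following the [Liu et al., 2017] convention cited above), and any default values are assumptions.

```python
# Hypothetical sketch of a polarization-regularized training objective.
# Assumes the scaling factors are the BatchNorm scale parameters (gamma),
# as in the network-slimming setup the paper builds on.
import torch
import torch.nn as nn


def collect_scaling_factors(model: nn.Module) -> torch.Tensor:
    """Gather all BatchNorm scale parameters into one flat vector."""
    gammas = [m.weight.view(-1) for m in model.modules() if isinstance(m, nn.BatchNorm2d)]
    return torch.cat(gammas)


def polarization_penalty(gamma: torch.Tensor, t: float) -> torch.Tensor:
    """Polarization term: t * ||gamma||_1 - ||gamma - mean(gamma)||_1.

    Unlike a plain L1 penalty, it pushes some factors toward 0 and the
    rest toward the mean, separating pruned from kept neurons.
    """
    return t * gamma.abs().sum() - (gamma - gamma.mean()).abs().sum()


def regularized_loss(task_loss: torch.Tensor, model: nn.Module,
                     lam: float, t: float) -> torch.Tensor:
    """Total objective: task loss plus the lambda-weighted polarization penalty."""
    return task_loss + lam * polarization_penalty(collect_scaling_factors(model), t)


def clamp_scaling_factors(model: nn.Module, a: float = 1.0) -> None:
    """Project every scaling factor back into [0, a] after an optimizer step
    (a = 1 in the paper; factors start at 0.5)."""
    with torch.no_grad():
        for m in model.modules():
            if isinstance(m, nn.BatchNorm2d):
                m.weight.clamp_(0.0, a)
```

In a training loop, `regularized_loss` would replace the plain task loss and `clamp_scaling_factors` would run after each optimizer step. The hyper-parameter search in Algorithm 1 then amounts to trying candidate (λ, t) pairs with short runs (the row above notes that 30 ImageNet epochs suffice for the FLOPs reduction to stabilize, versus 120 epochs of full training) and keeping the pair whose FLOPs reduction matches the target budget.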