Neuron-level Structured Pruning using Polarization Regularizer
Authors: Tao Zhuang, Zhixuan Zhang, Yuheng Huang, Xiaoyi Zeng, Kai Shuang, Xiang Li
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we carried out extensive experiments to evaluate our approach on the image classification task, which is the most widely used task for testing pruning methods. We present our main experiment results in Section 4.3. In Section 4.4, we discuss the effect of hyper-parameters in our approach. |
| Researcher Affiliation | Collaboration | ¹Alibaba Group, ²Beijing University of Posts and Telecommunications |
| Pseudocode | Yes | Algorithm 1: Search for hyper-parameters of Polarization Pruning (a hedged sketch of this search loop follows the table). |
| Open Source Code | Yes | Our code is available at https://github.com/polarizationpruning/PolarizationPruning. |
| Open Datasets | Yes | Therefore, we evaluate our method on both small datasets (CIFAR-10/100 [Krizhevsky et al., 2009]) and a large dataset (ImageNet [Russakovsky et al., 2015]). |
| Dataset Splits | No | The paper mentions using CIFAR and ImageNet datasets but does not explicitly provide the specific training, validation, and test splits (e.g., percentages or sample counts) within the provided text. It refers to Appendix B for detailed parameters. |
| Hardware Specification | No | The paper mentions running experiments on "common GPU/CPU devices" but does not provide any specific hardware details such as GPU models, CPU models, or memory specifications. |
| Software Dependencies | No | The paper states: "Specifically, our code implementation is based on PyTorch and Torchvision [Paszke et al., 2019]." While it names the software, it does not provide version numbers for either PyTorch or Torchvision. |
| Experiment Setup | Yes | We adjust the hyper-parameters λ and t in our polarization regularizer to control the reduced FLOPs as described in Section 3.3. All scaling factors are initialized to 0.5 in image classification tasks as in [Liu et al., 2017]. We set the upper bound a of the scaling factors to 1 in this paper. For each group of hyper-parameters we only need to train 30 epochs on the ImageNet dataset for the FLOPs reduction to stabilize, rather than the 120 epochs of full training. Algorithm 1 (Search for hyper-parameters of Polarization Pruning) also outlines a key part of the experimental setup; a minimal sketch of the regularizer appears below. |
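
The Experiment Setup row quotes the regularizer's hyper-parameters without showing how the penalty enters training. Below is a minimal PyTorch sketch of the polarization regularizer, R(γ) = t·‖γ‖₁ − ‖γ − γ̄·1‖₁, applied to the BatchNorm scaling factors γ, following the paper's definition. The helper name `polarization_regularizer`, the `criterion`/`optimizer` objects, and the clamp-after-step detail are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

def polarization_regularizer(gammas: torch.Tensor, t: float) -> torch.Tensor:
    """R(gamma) = t * ||gamma||_1 - ||gamma - mean(gamma)||_1.

    Minimizing R drives some scaling factors toward 0 (prunable
    neurons) while pushing the rest away from 0 (kept neurons).
    """
    return t * gammas.abs().sum() - (gammas - gammas.mean()).abs().sum()

def training_step(model, x, y, criterion, optimizer, lam, t, a=1.0):
    # Collect the BatchNorm scaling factors of all prunable neurons.
    bn_weights = [m.weight for m in model.modules()
                  if isinstance(m, nn.BatchNorm2d)]
    gammas = torch.cat([w.view(-1) for w in bn_weights])

    # Task loss plus the weighted polarization penalty (lambda in the paper).
    loss = criterion(model(x), y) + lam * polarization_regularizer(gammas, t)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Keep scaling factors within [0, a]; the paper sets a = 1 and
    # initializes all factors to 0.5.
    with torch.no_grad():
        for w in bn_weights:
            w.clamp_(0.0, a)
    return loss
```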
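
The Pseudocode row names Algorithm 1 but the table does not reproduce it. As a hedged sketch of its overall structure, assuming the shortened 30-epoch proxy runs described in the Experiment Setup row: candidate (λ, t) pairs are trained briefly, the resulting FLOPs reduction is measured, and a pair that hits the target is kept. The `train_short` and `measure_flops_reduction` callables, the candidate list, and the tolerance are hypothetical; the exact candidate-update rule is specified in the paper's Algorithm 1.

```python
from typing import Callable, Iterable, Tuple

def search_hyperparameters(
    target_reduction: float,
    candidates: Iterable[Tuple[float, float]],
    train_short: Callable[[float, float], object],   # ~30-epoch run, not the full 120
    measure_flops_reduction: Callable[[object], float],
    tol: float = 0.02,
) -> Tuple[float, float]:
    """Return the first (lambda, t) pair whose stabilized FLOPs
    reduction lands within `tol` of the target reduction."""
    for lam, t in candidates:
        model = train_short(lam, t)
        if abs(measure_flops_reduction(model) - target_reduction) <= tol:
            return lam, t
    raise ValueError("no candidate pair reached the target FLOPs reduction")
```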