Directional Pruning of Deep Neural Networks
Authors: Shih-Kang Chao, Zhanyu Wang, Yue Xing, Guang Cheng
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The empirical results demonstrate the promising results of our solution in highly sparse regime (92% sparsity) among many existing pruning methods on the ResNet50 with the ImageNet, while using only a slightly higher wall time and memory footprint than the SGD. This section presents the empirical performance of (gRDA), and the evidence that (gRDA) performs the directional pruning (Definition 1.1). Section 4.1 considers ResNet50 with ImageNet, and compares with several existing pruning algorithms. |
| Researcher Affiliation | Academia | Shih-Kang Chao, Department of Statistics, University of Missouri, Columbia, MO 65211, chaosh@missouri.edu; Zhanyu Wang, Department of Statistics, Purdue University, West Lafayette, IN 47907, wang4094@purdue.edu; Yue Xing, Department of Statistics, Purdue University, West Lafayette, IN 47907, xing49@purdue.edu; Guang Cheng, Department of Statistics, Purdue University, West Lafayette, IN 47907, chengg@purdue.edu |
| Pseudocode | Yes | See Sections C.1 and C.2 in the appendix for the algorithms in pseudocode. |
| Open Source Code | Yes | The code that reproduces the results of this paper is available at https://github.com/donlan2710/gRDA-Optimizer/tree/master/directional_pruning. |
| Open Datasets | Yes | We use (gRDA) to simultaneously prune and train the ResNet50 [31] on the ImageNet dataset without any post-processing like retraining. We train VGG16 [55] on CIFAR-10 and WRN28x10 on CIFAR-100 until nearly zero training loss using both (SGD) and (gRDA). |
| Dataset Splits | No | The paper does not provide explicit details about training/validation/test dataset splits (e.g., percentages, sample counts, or specific splitting methodology for validation). It refers to 'training data' and 'testing accuracy' but lacks specific validation split information. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, memory specifications, or cloud instance types) used for running the experiments. It only discusses devices in general terms. |
| Software Dependencies | No | The paper mentions deep learning frameworks like 'TensorFlow or PyTorch' but does not provide specific version numbers for these or any other ancillary software components, which are required for a reproducible description. |
| Experiment Setup | Yes | The learning rate schedule usually applied jointly with the SGD with momentum does not work well for (gRDA), so we use either a constant learning rate or dropping the learning rate only once in the later training stage. Please find more implementation details in Section C.1 in the appendix. γ = 0.1 for both SGD and gRDA. Minibatch size is 256. For a given µ, we recommend to search for the greatest c (starting with e.g. 10⁻⁴) such that gRDA yields a comparable test acc. as SGD using 1–5 epochs. |
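
The "Experiment Setup" row above quotes the gRDA hyperparameters (learning rate γ, and the sparsity constants c and µ). Below is a minimal PyTorch-style sketch of a gRDA update, based on the paper's description of soft-thresholding the accumulated gradient path with a tuning function g(n, γ) = c·γ^(1/2)·(nγ)^µ. The class name `GRDASketch` and the argument names `lr`, `c`, `mu` are illustrative assumptions, not the authors' released optimizer (see the repository linked above for the reference implementation).

```python
# Illustrative sketch of a gRDA-style optimizer step in PyTorch.
# Not the authors' released code; names and details are assumptions.
import torch
from torch.optim import Optimizer


class GRDASketch(Optimizer):
    """Soft-threshold the accumulated (dual-averaged) gradient path.

    lr, c, mu correspond to the paper's gamma, c, and mu; the tuning
    function g(n, gamma) = c * gamma**0.5 * (n * gamma)**mu follows the
    paper's description, but treat this class as a sketch only.
    """

    def __init__(self, params, lr=0.1, c=1e-4, mu=0.5):
        super().__init__(params, dict(lr=lr, c=c, mu=mu))

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            lr, c, mu = group["lr"], group["c"], group["mu"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if len(state) == 0:
                    state["step"] = 0
                    state["w0"] = p.detach().clone()         # initial weights
                    state["grad_sum"] = torch.zeros_like(p)  # accumulated gradients
                state["step"] += 1
                state["grad_sum"] += p.grad
                n = state["step"]
                # The soft-thresholding level grows with the iteration count,
                # gradually pushing small coordinates exactly to zero.
                g = c * lr ** 0.5 * (n * lr) ** mu
                v = state["w0"] - lr * state["grad_sum"]
                p.copy_(torch.sign(v) * torch.clamp(v.abs() - g, min=0.0))
        return loss
```

Under the settings quoted in the table (γ = 0.1, minibatch size 256), the recommended tuning loop would try increasing values of c starting near 10⁻⁴ and keep the largest one whose test accuracy after 1–5 epochs remains comparable to SGD.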