Directional Pruning of Deep Neural Networks
Authors: Shih-Kang Chao, Zhanyu Wang, Yue Xing, Guang Cheng
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The empirical results demonstrate the promising results of our solution in highly sparse regime (92% sparsity) among many existing pruning methods on the ResNet50 with the ImageNet, while using only a slightly higher wall time and memory footprint than the SGD. This section presents the empirical performance of (gRDA), and the evidence that (gRDA) performs the directional pruning (Definition 1.1). Section 4.1 considers ResNet50 with ImageNet, and compares with several existing pruning algorithms. |
| Researcher Affiliation | Academia | Shih-Kang Chao, Department of Statistics, University of Missouri, Columbia, MO 65211, chaosh@missouri.edu; Zhanyu Wang, Department of Statistics, Purdue University, West Lafayette, IN 47907, wang4094@purdue.edu; Yue Xing, Department of Statistics, Purdue University, West Lafayette, IN 47907, xing49@purdue.edu; Guang Cheng, Department of Statistics, Purdue University, West Lafayette, IN 47907, chengg@purdue.edu |
| Pseudocode | Yes | See Sections C.1 and C.2 in the appendix for the algorithms in pseudocode. |
| Open Source Code | Yes | The code that reproduces the results of this paper is available at https://github.com/donlan2710/gRDA-Optimizer/tree/master/directional_pruning. |
| Open Datasets | Yes | We use (gRDA) to simultaneously prune and train the ResNet50 [31] on the ImageNet dataset without any post-processing like retraining. We train VGG16 [55] on CIFAR-10 and WRN28x10 on CIFAR-100 until nearly zero training loss using both (SGD) and (gRDA). |
| Dataset Splits | No | The paper does not provide explicit details about training/validation/test dataset splits (e.g., percentages, sample counts, or specific splitting methodology for validation). It refers to 'training data' and 'testing accuracy' but lacks specific validation split information. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, memory specifications, or cloud instance types) used for running the experiments. It only discusses devices in general terms. |
| Software Dependencies | No | The paper mentions deep learning frameworks like 'TensorFlow or PyTorch' but does not provide specific version numbers for these or any other ancillary software components, which are required for a reproducible description. |
| Experiment Setup | Yes | The learning rate schedule usually applied jointly with the SGD with momentum does not work well for (gRDA), so we use either a constant learning rate or dropping the learning rate only once in the later training stage. Please find more implementation details in Section C.1 in the appendix. γ = 0.1 for both SGD and gRDA. Minibatch size is 256. For a given µ, we recommend to search for the greatest c (starting with e.g. 10⁻⁴) such that gRDA yields a comparable test acc. as SGD using 1–5 epochs. |
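
The "Experiment Setup" row above quotes the gRDA hyperparameters (learning rate γ, and the sparsity constants c and µ). Below is a minimal PyTorch-style sketch of a gRDA update, based on the paper's description of soft-thresholding the accumulated gradient path with a tuning function g(n, γ) = c·γ^(1/2)·(nγ)^µ. The class name `GRDASketch` and the argument names `lr`, `c`, `mu` are illustrative assumptions, not the authors' released optimizer (see the repository linked above for the reference implementation).

```python
# Illustrative sketch of a gRDA-style optimizer step in PyTorch.
# Not the authors' released code; names and details are assumptions.
import torch
from torch.optim import Optimizer


class GRDASketch(Optimizer):
    """Soft-threshold the accumulated (dual-averaged) gradient path.

    lr, c, mu correspond to the paper's gamma, c, and mu; the tuning
    function g(n, gamma) = c * gamma**0.5 * (n * gamma)**mu follows the
    paper's description, but treat this class as a sketch only.
    """

    def __init__(self, params, lr=0.1, c=1e-4, mu=0.5):
        super().__init__(params, dict(lr=lr, c=c, mu=mu))

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            lr, c, mu = group["lr"], group["c"], group["mu"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if len(state) == 0:
                    state["step"] = 0
                    state["w0"] = p.detach().clone()         # initial weights
                    state["grad_sum"] = torch.zeros_like(p)  # accumulated gradients
                state["step"] += 1
                state["grad_sum"] += p.grad
                n = state["step"]
                # The soft-thresholding level grows with the iteration count,
                # gradually pushing small coordinates exactly to zero.
                g = c * lr ** 0.5 * (n * lr) ** mu
                v = state["w0"] - lr * state["grad_sum"]
                p.copy_(torch.sign(v) * torch.clamp(v.abs() - g, min=0.0))
        return loss
```

Under the settings quoted in the table (γ = 0.1, minibatch size 256), the recommended tuning loop would try increasing values of c starting near 10⁻⁴ and keep the largest one whose test accuracy after 1–5 epochs remains comparable to SGD.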