DeepHoyer: Learning Sparser Neural Network with Differentiable Scale-Invariant Sparsity Measures

Authors: Huanrui Yang, Wei Wen, Hai Li

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that enforcing DeepHoyer regularizers can produce even sparser neural network models than previous works, under the same accuracy level. We also show that DeepHoyer can be applied to both element-wise and structural pruning.
Researcher Affiliation | Academia | Huanrui Yang, Wei Wen, Hai Li, Department of Electrical and Computer Engineering, Duke University, Durham, NC 27708; {huanrui.yang, wei.wen, hai.li}@duke.edu
Pseudocode | No | The paper describes the proposed regularizers and their gradients using mathematical equations and textual descriptions, but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The codes are available at https://github.com/yanghr/DeepHoyer.
Open Datasets | Yes | The MNIST dataset (LeCun et al., 1998) is a well-known handwritten digit dataset consisting of greyscale images with the size of 28×28 pixels. We use the dataset API provided in the torchvision python package to access the dataset. [...] The ImageNet dataset... (Russakovsky et al., 2015), which can be found at http://www.image-net.org/challenges/LSVRC/2012/nonpub-downloads. [...] We also use the CIFAR-10 dataset (Krizhevsky & Hinton, 2009) to evaluate the structural pruning performance on ResNet-56 and ResNet-110 models. The CIFAR-10 dataset can be directly accessed through the dataset API provided in the torchvision python package.
Dataset Splits | Yes | We use all the data in the provided training set to train our model, and use the provided validation set to evaluate our model and report the testing accuracy.
Hardware Specification | Yes | All the MNIST experiments are done with a single TITAN XP GPU. [...] Two TITAN XP GPUs are used in parallel for the AlexNet training and four are used for the ResNet-50 training.
Software Dependencies | No | The paper mentions software like "PyTorch deep learning framework" and "torchvision python package" and optimizers like "Adam optimizer" and "SGD optimizer", but it does not provide specific version numbers for any of these software components.
Experiment Setup | Yes | Adam optimizer (Kingma & Ba, 2014) with learning rate 0.001 is used throughout the training process. All the MNIST experiments are done with a single TITAN XP GPU. [...] Detailed parameter choices used in achieving the reported results are listed in Table 6. [...] For the ResNet-50 experiments on ImageNet, [...] All the models are optimized with the SGD optimizer (Sutskever et al., 2013), and the batch size is chosen as 256 for all the experiments.
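For context on the regularizers the Research Type row refers to: the paper's Hoyer-Square (HS) regularizer is the squared l1 norm of a weight tensor divided by its squared l2 norm, and the Group-HS variant used for structural pruning replaces the element-wise absolute values with per-group l2 norms. Below is a minimal PyTorch sketch of that idea, assuming per-output-channel grouping for convolutional weights and a small `eps` for numerical stability; both of those choices are our assumptions, not settings taken verbatim from the paper.

```python
import torch

def hoyer_square(w: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Hoyer-Square (HS): squared l1 norm over squared l2 norm, for element-wise pruning."""
    return w.abs().sum() ** 2 / (w.pow(2).sum() + eps)

def group_hs(w: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Group-HS for structural pruning: per-group l2 norms replace the absolute values.
    Here a conv weight of shape (out_channels, in_channels, kH, kW) is grouped by output channel."""
    group_l2 = w.flatten(start_dim=1).norm(p=2, dim=1)  # one l2 norm per filter
    return group_l2.sum() ** 2 / (w.pow(2).sum() + eps)
```

Both quantities are scale-invariant: multiplying `w` by a constant leaves the ratio unchanged, which is the property the paper's title emphasizes.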
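The Open Datasets row notes that MNIST and CIFAR-10 are accessed through the torchvision dataset API; a minimal sketch of that access pattern follows. The `./data` root, the bare `ToTensor` transform, and the batch size are illustrative placeholders rather than settings reported by the paper.

```python
import torch
from torchvision import datasets, transforms

transform = transforms.ToTensor()

# MNIST: 28x28 greyscale digits; standard torchvision train/test split.
mnist_train = datasets.MNIST(root="./data", train=True, download=True, transform=transform)
mnist_test = datasets.MNIST(root="./data", train=False, download=True, transform=transform)

# CIFAR-10: used in the paper for structural pruning of ResNet-56 / ResNet-110.
cifar_train = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
cifar_test = datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)

# Batch size here is a placeholder; the paper reports 256 only for the ImageNet runs.
train_loader = torch.utils.data.DataLoader(mnist_train, batch_size=128, shuffle=True)
```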
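The Experiment Setup row quotes Adam with learning rate 0.001 for the MNIST runs and SGD with batch size 256 for the ImageNet runs; the sketch below shows one way the Hoyer-Square penalty could be folded into a training step, reusing `hoyer_square` from the first snippet. The regularization strength `alpha` and the rule for selecting which parameters to penalize are illustrative assumptions; the paper's actual per-experiment strengths are listed in its Table 6.

```python
import torch
import torch.nn.functional as F

def train_step(model, batch, optimizer, alpha=1e-4):
    """One training step with the Hoyer-Square penalty added to the task loss.
    `alpha` is an illustrative strength, not a value reported by the paper."""
    x, y = batch
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y)
    for param in model.parameters():
        if param.dim() > 1:  # penalize weight matrices / conv kernels, skip biases
            loss = loss + alpha * hoyer_square(param)  # from the sketch above
    loss.backward()
    optimizer.step()
    return loss.item()

# Optimizer as stated for the MNIST experiments: Adam with learning rate 0.001.
# optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
```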