DeepHoyer: Learning Sparser Neural Network with Differentiable Scale-Invariant Sparsity Measures
Authors: Huanrui Yang, Wei Wen, Hai Li
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that enforcing DeepHoyer regularizers can produce even sparser neural network models than previous works, under the same accuracy level. We also show that DeepHoyer can be applied to both element-wise and structural pruning. |
| Researcher Affiliation | Academia | Huanrui Yang, Wei Wen, Hai Li. Department of Electrical and Computer Engineering, Duke University, Durham, NC 27708. {huanrui.yang, wei.wen, hai.li}@duke.edu |
| Pseudocode | No | The paper describes the proposed regularizers and their gradients using mathematical equations and textual descriptions, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The codes are available at https://github.com/yanghr/DeepHoyer. |
| Open Datasets | Yes | The MNIST dataset (LeCun et al., 1998) is a well known handwritten digit dataset consisting of greyscale images with the size of 28×28 pixels. We use the dataset API provided in the torchvision python package to access the dataset. [...] The ImageNet dataset (Russakovsky et al., 2015) can be found at http://www.image-net.org/challenges/LSVRC/2012/nonpub-downloads. [...] We also use the CIFAR-10 dataset (Krizhevsky & Hinton, 2009) to evaluate the structural pruning performance on ResNet-56 and ResNet-110 models. The CIFAR-10 dataset can be directly accessed through the dataset API provided in the torchvision python package. |
| Dataset Splits | Yes | We use all the data in the provided training set to train our model, and use the provided validation set to evaluate our model and report the testing accuracy. |
| Hardware Specification | Yes | All the MNIST experiments are done with a single TITAN XP GPU. [...] Two TITAN XP GPUs are used in parallel for the AlexNet training and four are used for the ResNet-50 training. |
| Software Dependencies | No | The paper mentions software like the "PyTorch deep learning framework" and "torchvision python package" and optimizers like the "Adam optimizer" and "SGD optimizer", but it does not provide specific version numbers for any of these software components. |
| Experiment Setup | Yes | Adam optimizer (Kingma & Ba, 2014) with learning rate 0.001 is used throughout the training process. All the MNIST experiments are done with a single TITAN XP GPU. [...] Detailed parameter choices used in achieving the reported results are listed in Table 6. [...] For the ResNet-50 experiments on ImageNet, [...] All the models are optimized with the SGD optimizer (Sutskever et al., 2013), and the batch size is chosen as 256 for all the experiments. |
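
The regularizers quoted in the Research Type row are the paper's scale-invariant sparsity measures. Below is a minimal PyTorch sketch of the element-wise Hoyer-Square form (the squared l1-to-l2 ratio) and one possible group variant for structural pruning; the function names, epsilon guard, channel grouping, and penalty weight in the usage comment are our own illustrative choices, not taken from the released code.

```python
import torch

def hoyer_square(w: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Element-wise Hoyer-Square regularizer: (sum |w|)^2 / sum(w^2).
    Scale-invariant: rescaling w by a constant leaves the value unchanged."""
    return torch.sum(torch.abs(w)) ** 2 / (torch.sum(w ** 2) + eps)

def group_hoyer_square(w: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Structural variant: apply the same ratio to the l2 norms of groups,
    here taken as output channels (rows after flattening)."""
    flat = w.reshape(w.shape[0], -1)
    group_norms = torch.sqrt(torch.sum(flat ** 2, dim=1) + eps)
    return torch.sum(group_norms) ** 2 / (torch.sum(group_norms ** 2) + eps)

# Usage: add the penalty to the task loss; 1e-4 is an arbitrary example weight.
# loss = criterion(model(x), y)
# for p in model.parameters():
#     if p.dim() > 1:          # regularize weight tensors, skip biases
#         loss = loss + 1e-4 * hoyer_square(p)
```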
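
The Open Datasets row notes that MNIST and CIFAR-10 are accessed through the torchvision dataset API; a minimal sketch of that access path follows. The data root, transforms, and batch size are placeholder choices, and ImageNet (ILSVRC 2012) still has to be downloaded manually from the URL quoted above.

```python
import torch
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

# MNIST: 28x28 greyscale handwritten digits, downloaded on first use.
mnist_train = datasets.MNIST(root="./data", train=True, download=True, transform=to_tensor)
mnist_test = datasets.MNIST(root="./data", train=False, download=True, transform=to_tensor)

# CIFAR-10: also available directly through the torchvision dataset API.
cifar_train = datasets.CIFAR10(root="./data", train=True, download=True, transform=to_tensor)

train_loader = torch.utils.data.DataLoader(mnist_train, batch_size=64, shuffle=True)
```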
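
The Experiment Setup row quotes Adam with learning rate 0.001 for the MNIST runs and SGD with batch size 256 for the ImageNet models; the sketch below simply wires those quoted values up in PyTorch. The stand-in model, momentum value, and SGD learning rate are illustrative placeholders rather than settings reported in the paper, whose full parameter choices are listed in its Table 6.

```python
import torch

# Placeholder model standing in for the LeNet / AlexNet / ResNet architectures.
model = torch.nn.Linear(784, 10)

# MNIST experiments: Adam optimizer with learning rate 0.001 (quoted above).
adam = torch.optim.Adam(model.parameters(), lr=0.001)

# ImageNet experiments: SGD with batch size 256 (quoted above); the momentum
# and learning rate here are our placeholders, not the paper's values.
sgd = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
imagenet_batch_size = 256
```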