Soft Weight-Sharing for Neural Network Compression
Authors: Karen Ullrich, Edward Meeds, Max Welling
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test our compression procedure on two neural network models used in previous work we compare against in our experiments: (a) LeNet-300-100, an MNIST model... (b) LeNet-5-Caffe, a modified version of the LeNet-5 MNIST model... (c) ResNets... (Section 6, EXPERIMENTS) |
| Researcher Affiliation | Academia | Karen Ullrich University of Amsterdam karen.ullrich@uva.nl Edward Meeds University of Amsterdam tmeeds@gmail.com Max Welling University of Amsterdam Canadian Institute for Advanced Research (CIFAR) welling.max@gmail.com |
| Pseudocode | Yes | A summary can be found in Algorithm 1. Algorithm 1: Soft weight-sharing for compression, our proposed algorithm for neural network model compression. It is divided into two main steps: network re-training and post-processing. (A hedged sketch of this two-step procedure follows the table.) |
| Open Source Code | Yes | ACKNOWLEDGEMENTS We would like to thank Louis Smit, Christos Louizos, Thomas Kipf, Rianne van den Berg and Peter O'Connor for helpful discussions on the paper and the public code (https://github.com/KarenUllrich/Tutorial-SoftWeightSharingForNNCompression). |
| Open Datasets | Yes | (a) LeNet-300-100, an MNIST model described in LeCun et al. (1998). ... (b) LeNet-5-Caffe, a modified version of the LeNet-5 MNIST model in LeCun et al. (1998). ... for CIFAR-10 and CIFAR-100 respectively. |
| Dataset Splits | No | The paper mentions using standard datasets like MNIST and CIFAR-10 but does not explicitly provide details on how the training, validation, and test sets were split (e.g., percentages, sample counts, or specific split files). |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for its experiments, such as specific GPU models, CPU models, or cloud computing instance types. |
| Software Dependencies | No | The paper mentions 'Adam (Kingma & Ba, 2014)' and 'Caffe MNIST tutorial page' but does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | Note that, similar to (Nowlan & Hinton, 1992), we weigh the log-prior contribution to the gradient by a factor of τ = 0.005. ... The remaining learning rates are set to 5 × 10⁻⁴. ... Our Gaussian MM prior is initialized with 2⁴ + 1 = 17 components. We initialize the learning rate for the weights and means, log-variances and log-mixing proportions separately. ... For one component, we fix µ_{j=0} = 0 and π_{j=0} = 0.999. ... We distribute the means of the 16 non-fixed components evenly over the range of the pre-trained weights. The variances will be initialized such that each Gaussian has significant probability mass in its region. ... The trainable mixing proportions are initialized evenly, π_j = (1 − π_{j=0})/J. (See the prior-initialization sketch below the table.) |
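
The mixture-prior initialization quoted in the Experiment Setup row translates fairly directly into code. Below is a minimal PyTorch sketch, not the authors' implementation: the class name `GaussianMixturePrior`, the variance heuristic, and the exact parameterization of the mixing proportions are assumptions. Only the 2⁴ + 1 = 17 components, the fixed zero component with π_{j=0} = 0.999, the means spread evenly over the pre-trained weight range, and the even initialization of the free mixing proportions come from the paper.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class GaussianMixturePrior(nn.Module):
    """Hedged sketch of a factorized Gaussian mixture prior over all weights.

    Component 0 is pinned at mean 0 with mixing proportion pi_zero; the
    remaining J components have trainable means, log-variances and
    mixing-proportion logits, following the experiment-setup excerpt.
    """

    def __init__(self, pretrained_weights, n_components=17, pi_zero=0.999):
        super().__init__()
        J = n_components - 1                       # free (non-fixed) components
        w_min = pretrained_weights.min().item()
        w_max = pretrained_weights.max().item()

        # Means of the free components spread evenly over the weight range.
        self.means = nn.Parameter(torch.linspace(w_min, w_max, J))
        # Variance heuristic (assumption): each Gaussian covers roughly its
        # share of the range, so it has "significant probability mass" there.
        init_std = (w_max - w_min) / J
        self.log_vars = nn.Parameter(torch.full((J,), 2.0 * math.log(init_std)))
        # Zero logits -> uniform softmax -> free proportions equal (1 - pi_zero) / J.
        self.logits = nn.Parameter(torch.zeros(J))

        # Fixed zero component; its variance is kept constant here for simplicity.
        self.register_buffer("zero_mean", torch.zeros(1))
        self.register_buffer("zero_log_var",
                             torch.tensor([2.0 * math.log(init_std)]))
        self.pi_zero = pi_zero

    def _all_params(self):
        means = torch.cat([self.zero_mean, self.means])
        log_vars = torch.cat([self.zero_log_var, self.log_vars])
        free_pis = (1.0 - self.pi_zero) * F.softmax(self.logits, dim=0)
        pis = torch.cat([free_pis.new_tensor([self.pi_zero]), free_pis])
        return means, log_vars, pis

    def neg_log_prob(self, weights):
        """-log p(w) summed over a flat weight vector (the complexity loss)."""
        means, log_vars, pis = self._all_params()
        w = weights.view(-1, 1)                    # (n_weights, 1) vs (K,) components
        log_norm = -0.5 * (math.log(2 * math.pi) + log_vars
                           + (w - means) ** 2 / log_vars.exp())
        log_mix = torch.logsumexp(torch.log(pis) + log_norm, dim=1)
        return -log_mix.sum()

    @torch.no_grad()
    def quantise(self, weights):
        """Post-processing: snap each weight to the mean of its most responsible
        component; weights assigned to component 0 become exactly zero."""
        means, log_vars, pis = self._all_params()
        w = weights.view(-1, 1)
        log_norm = -0.5 * (math.log(2 * math.pi) + log_vars
                           + (w - means) ** 2 / log_vars.exp())
        assignment = (torch.log(pis) + log_norm).argmax(dim=1)
        return means[assignment].view_as(weights)
```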
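The Pseudocode row summarizes Algorithm 1 as re-training followed by post-processing. The sketch below shows that two-step structure using the prior class above; it is illustrative, not the paper's code. The function name `compress`, the per-group learning-rate assignment, and the use of `CrossEntropyLoss` and a plain training loop are assumptions; the excerpts only state that the log-prior contribution is weighted by τ = 0.005, that the weights, means, log-variances and log-mixing proportions get separately initialized learning rates, and that 5 × 10⁻⁴ is used for the remaining groups.

```python
import torch


def compress(model, train_loader, prior, tau=0.005, epochs=10):
    """Step 1 of Algorithm 1 (re-training): jointly optimize network weights
    and mixture parameters, with the complexity loss down-weighted by tau."""
    # Separate parameter groups so each can get its own learning rate,
    # mirroring the separate initialization described in the paper.
    opt = torch.optim.Adam([
        {"params": model.parameters(), "lr": 5e-4},
        {"params": [prior.means],      "lr": 5e-4},
        {"params": [prior.log_vars],   "lr": 5e-4},  # illustrative values
        {"params": [prior.logits],     "lr": 5e-4},
    ])
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in train_loader:
            opt.zero_grad()
            task_loss = loss_fn(model(x), y)
            flat_w = torch.cat([p.view(-1) for p in model.parameters()])
            loss = task_loss + tau * prior.neg_log_prob(flat_w)
            loss.backward()
            opt.step()

    # Step 2 of Algorithm 1 (post-processing): snap every weight to the mean
    # of its most responsible mixture component before encoding/storage.
    with torch.no_grad():
        for p in model.parameters():
            p.copy_(prior.quantise(p))
    return model
```

Here `model` and `train_loader` are assumed to be an ordinary PyTorch classifier (e.g., a LeNet-style MNIST model) and its training data loader; the prior would be constructed from the pre-trained weights before calling `compress`.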