Soft Weight-Sharing for Neural Network Compression

Authors: Karen Ullrich, Edward Meeds, Max Welling

ICLR 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test our compression procedure on two neural network models used in previous work we compare against in our experiments: (a) LeNet-300-100, an MNIST model... (b) LeNet-5-Caffe, a modified version of the LeNet-5 MNIST model... (c) ResNets... (Section 6, EXPERIMENTS.) A minimal definition of LeNet-300-100 is sketched below the table.
Researcher Affiliation | Academia | Karen Ullrich, University of Amsterdam, karen.ullrich@uva.nl; Edward Meeds, University of Amsterdam, tmeeds@gmail.com; Max Welling, University of Amsterdam and Canadian Institute for Advanced Research (CIFAR), welling.max@gmail.com
Pseudocode | Yes | A summary can be found in Algorithm 1 ("Soft weight-sharing for compression, our proposed algorithm for neural network model compression"). It is divided into two main steps: network re-training and post-processing. A minimal sketch of both steps is given below the table.
Open Source Code | Yes | ACKNOWLEDGEMENTS: We would like to thank Louis Smit, Christos Louizos, Thomas Kipf, Rianne van den Berg and Peter O'Connor for helpful discussions on the paper and the public code: https://github.com/KarenUllrich/Tutorial-SoftWeightSharingForNNCompression
Open Datasets | Yes | (a) LeNet-300-100, an MNIST model described in LeCun et al. (1998). ... (b) LeNet-5-Caffe, a modified version of the LeNet-5 MNIST model in LeCun et al. (1998). ... for CIFAR-10 and CIFAR-100 respectively.
Dataset Splits | No | The paper mentions using standard datasets like MNIST and CIFAR-10 but does not explicitly provide details on how the training, validation, and test sets were split (e.g., percentages, sample counts, or specific split files).
Hardware Specification | No | The paper does not explicitly describe the hardware used for its experiments, such as specific GPU models, CPU models, or cloud computing instance types.
Software Dependencies | No | The paper mentions 'Adam (Kingma & Ba, 2014)' and the 'Caffe MNIST tutorial page' but does not specify version numbers for any software dependencies.
Experiment Setup | Yes | Note that, similar to (Nowlan & Hinton, 1992), we weigh the log-prior contribution to the gradient by a factor of τ = 0.005. ... The remaining learning rates are set to 5 × 10^-4. ... Our Gaussian MM prior is initialized with 2^4 + 1 = 17 components. We initialize the learning rate for the weights and means, log-variances and log-mixing proportions separately. ... For one component, we fix µ_{j=0} = 0 and π_{j=0} = 0.999. ... We distribute the means of the 16 non-fixed components evenly over the range of the pre-trained weights. The variances will be initialized such that each Gaussian has significant probability mass in its region. ... The trainable mixing proportions are initialized evenly, π_j = (1 − π_{j=0})/J. An initialization sketch follows below the table.
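
The Research Type row names LeNet-300-100, the standard fully-connected MNIST model with two hidden layers of 300 and 100 units. The PyTorch sketch below is a conventional definition of that architecture, not the authors' training code.

```python
# A minimal PyTorch sketch of LeNet-300-100 (784-300-100-10) on MNIST;
# this is the standard definition of the architecture, not the paper's
# exact implementation.
import torch.nn as nn

class LeNet300100(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),               # 28x28 MNIST image -> 784-dim vector
            nn.Linear(784, 300), nn.ReLU(),
            nn.Linear(300, 100), nn.ReLU(),
            nn.Linear(100, 10),         # logits for the 10 digit classes
        )

    def forward(self, x):
        return self.net(x)
```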
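The Pseudocode row points to Algorithm 1: re-training under a mixture-of-Gaussians prior over the weights, then a post-processing step that quantizes each weight to its most responsible component. The sketch below is only an illustration of those two steps under simplifying assumptions: `tau` and the helper names are ours, the paper's hyper-priors on the mixture parameters are omitted, and the paper optimizes weights, means, log-variances and log-mixing proportions with separate learning rates.

```python
# Illustrative sketch of Algorithm 1's two steps (re-training + quantization),
# not the authors' implementation; hyper-priors on the mixture are omitted.
import math
import torch
import torch.nn.functional as F

def log_mixture_prior(w, pi, mu, log_sigma2):
    """log p(w) under a J-component Gaussian mixture, summed over all weights."""
    w = w.reshape(-1, 1)                                   # (num_weights, 1)
    var = log_sigma2.exp()                                 # (J,)
    log_comp = (torch.log(pi)
                - 0.5 * torch.log(2 * math.pi * var)
                - 0.5 * (w - mu) ** 2 / var)               # (num_weights, J)
    return torch.logsumexp(log_comp, dim=1).sum()

def retraining_loss(model, x, y, pi, mu, log_sigma2, tau=0.005):
    """Task loss minus the down-weighted log-prior over all network weights."""
    w = torch.cat([p.reshape(-1) for p in model.parameters()])
    return F.cross_entropy(model(x), y) - tau * log_mixture_prior(w, pi, mu, log_sigma2)

@torch.no_grad()
def quantize(model, pi, mu, log_sigma2):
    """Post-processing: set each weight to the mean of its most responsible component."""
    var = log_sigma2.exp()
    for p in model.parameters():
        w = p.reshape(-1, 1)
        log_resp = (torch.log(pi)
                    - 0.5 * torch.log(2 * math.pi * var)
                    - 0.5 * (w - mu) ** 2 / var)
        p.copy_(mu[log_resp.argmax(dim=1)].reshape(p.shape))
```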
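The Experiment Setup row quotes how the Gaussian mixture prior is initialized (2^4 + 1 = 17 components, a fixed zero component with π = 0.999, the remaining means spread over the pre-trained weight range). The NumPy sketch below follows that quote; the variance heuristic is our assumption, since the paper only states that each Gaussian should keep "significant probability mass in its region".

```python
# Sketch of the prior initialization quoted above.  The standard deviation
# of each component is set to the spacing between neighbouring means, which
# is an assumption, not the paper's exact rule.
import numpy as np

def init_mixture_prior(pretrained_weights, num_components=2**4 + 1, pi_zero=0.999):
    w = np.asarray(pretrained_weights).ravel()
    num_free = num_components - 1                      # 16 trainable components

    # Means: component 0 fixed at zero, the rest evenly over [min(w), max(w)].
    free_means = np.linspace(w.min(), w.max(), num_free)
    means = np.concatenate(([0.0], free_means))

    # Variances: std equal to the spacing between adjacent free means (assumption).
    spacing = (w.max() - w.min()) / (num_free - 1)
    variances = np.full(num_components, spacing ** 2)

    # Mixing proportions: the fixed zero component gets pi_zero,
    # the trainable ones share the remainder evenly.
    mixing = np.full(num_components, (1.0 - pi_zero) / num_free)
    mixing[0] = pi_zero
    return means, variances, mixing
```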