Bayesian Compression for Deep Learning
Authors: Christos Louizos, Karen Ullrich, Max Welling
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validated the compression and speed-up capabilities of our models on the well-known architectures of LeNet-300-100 [39], LeNet-5-Caffe on MNIST [40] and, similarly with [49], VGG [61] on CIFAR 10 [36]. |
| Researcher Affiliation | Collaboration | Christos Louizos (University of Amsterdam, TNO Intelligent Imaging) c.louizos@uva.nl; Karen Ullrich (University of Amsterdam) k.ullrich@uva.nl; Max Welling (University of Amsterdam, CIFAR) m.welling@uva.nl |
| Pseudocode | Yes | We provide the algorithms that describe the forward pass using local reparametrizations for fully connected and convolutional layers with each of the employed approximate posteriors at appendix F. (A sketch of this forward pass for a fully connected layer appears after the table.) |
| Open Source Code | No | The paper does not provide an explicit statement about the release of its source code or a link to a repository. |
| Open Datasets | Yes | We validated the compression and speed-up capabilities of our models on the well-known architectures of LeNet-300-100 [39], LeNet-5-Caffe on MNIST [40] and, similarly with [49], VGG [61] on CIFAR 10 [36]. |
| Dataset Splits | No | The paper uses well-known datasets like MNIST and CIFAR-10, which have standard splits, but it does not explicitly state the training, validation, or test dataset splits (e.g., percentages or counts) within the main body of the paper. |
| Hardware Specification | Yes | all experiments were run with TensorFlow 1.0.1, CUDA 8.0 and respective cuDNN. We apply 16 CPUs run in parallel (CPU) or a Titan X (GPU). |
| Software Dependencies | Yes | all experiments were run with TensorFlow 1.0.1, CUDA 8.0 and respective cuDNN. |
| Experiment Setup | Yes | For the horseshoe prior we set the scale τ0 of the global half-Cauchy prior to a reasonably small value, e.g. τ0 = 1e-5. This further increases the prior mass at zero, which is essential for sparse estimation and compression. We also found that constraining the standard deviations as described at [44] and "warm-up" [62] helps in avoiding bad local optima of the variational objective. Further details about the experimental setup can be found at Appendix A. After initialization we trained the VGG network regularly for 200 epochs using Adam with the default hyperparameters. (A sketch of the prior scale and warm-up settings appears after the table.) |
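
The Pseudocode row refers to forward passes that use local reparametrization (Appendix F of the paper). The snippet below is a minimal NumPy sketch of that trick for a single fully connected layer with a factorized Gaussian posterior over the weights; the function and variable names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def dense_local_reparam(x, w_mu, w_logvar, rng):
    """Fully connected forward pass with a factorized Gaussian posterior over
    the weights, using the local reparametrization trick: rather than sampling
    the weight matrix, sample the pre-activations, whose marginal is Gaussian.

    x:        (batch, in_features) input batch
    w_mu:     (in_features, out_features) posterior means
    w_logvar: (in_features, out_features) posterior log-variances
    """
    act_mu = x @ w_mu                        # mean of the pre-activations, E[x W]
    act_var = (x ** 2) @ np.exp(w_logvar)    # variance of the pre-activations
    eps = rng.standard_normal(act_mu.shape)  # one noise draw per unit, per example
    return act_mu + np.sqrt(act_var + 1e-8) * eps

# Illustrative usage (shapes loosely follow the first LeNet-300-100 layer):
rng = np.random.default_rng(0)
x = rng.standard_normal((32, 784))
w_mu = 0.05 * rng.standard_normal((784, 300))
w_logvar = -9.0 * np.ones((784, 300))
h = dense_local_reparam(x, w_mu, w_logvar, rng)
print(h.shape)  # (32, 300)
```

Sampling the pre-activations rather than the weights keeps the noise per example independent and lowers the variance of the gradient estimator, which is why the paper's Appendix F algorithms are written in this form.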
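The Experiment Setup row quotes the global half-Cauchy scale τ0 = 1e-5 and the use of KL "warm-up". The sketch below only illustrates those two ingredients; it assumes a linear warm-up schedule (the quote does not specify its shape or length), and `WARMUP_EPOCHS` and the helper names are hypothetical.

```python
import numpy as np

TAU_0 = 1e-5        # scale of the global half-Cauchy prior, as quoted above
WARMUP_EPOCHS = 20  # assumed warm-up length; not stated in the quoted setup

def sample_half_cauchy(scale, size, rng):
    """Draw from a half-Cauchy(0, scale): the absolute value of a Cauchy sample."""
    return np.abs(scale * rng.standard_cauchy(size))

def kl_weight(epoch):
    """Assumed linear warm-up: ramp the KL term of the variational objective
    from 0 to 1 over the first WARMUP_EPOCHS epochs, then hold it at 1."""
    return min(1.0, (epoch + 1) / WARMUP_EPOCHS)

rng = np.random.default_rng(0)
print(sample_half_cauchy(TAU_0, size=4, rng=rng))  # tiny global scales: mass near zero
print(kl_weight(0), kl_weight(199))                # 0.05 1.0
```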