Bayesian Compression for Deep Learning
Authors: Christos Louizos, Karen Ullrich, Max Welling
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validated the compression and speed-up capabilities of our models on the well-known architectures of LeNet-300-100 [39], LeNet-5-Caffe on MNIST [40] and, similarly with [49], VGG [61] on CIFAR 10 [36]. |
| Researcher Affiliation | Collaboration | Christos Louizos (University of Amsterdam, TNO Intelligent Imaging) c.louizos@uva.nl; Karen Ullrich (University of Amsterdam) k.ullrich@uva.nl; Max Welling (University of Amsterdam, CIFAR) m.welling@uva.nl |
| Pseudocode | Yes | We provide the algorithms that describe the forward pass using local reparametrizations for fully connected and convolutional layers with each of the employed approximate posteriors at appendix F. (A sketch of this forward pass for a fully connected layer appears after the table.) |
| Open Source Code | No | The paper does not provide an explicit statement about the release of its source code or a link to a repository. |
| Open Datasets | Yes | We validated the compression and speed-up capabilities of our models on the well-known architectures of LeNet-300-100 [39], LeNet-5-Caffe on MNIST [40] and, similarly with [49], VGG [61] on CIFAR 10 [36]. |
| Dataset Splits | No | The paper uses well-known datasets like MNIST and CIFAR-10, which have standard splits, but it does not explicitly state the training, validation, or test dataset splits (e.g., percentages or counts) within the main body of the paper. |
| Hardware Specification | Yes | all experiments were run with TensorFlow 1.0.1, CUDA 8.0 and respective cuDNN. We apply 16 CPUs run in parallel (CPU) or a Titan X (GPU). |
| Software Dependencies | Yes | all experiments were run with TensorFlow 1.0.1, CUDA 8.0 and respective cuDNN. |
| Experiment Setup | Yes | For the horseshoe prior we set the scale τ0 of the global half-Cauchy prior to a reasonably small value, e.g. τ0 = 1e-5. This further increases the prior mass at zero, which is essential for sparse estimation and compression. We also found that constraining the standard deviations as described at [44] and "warm-up" [62] helps in avoiding bad local optima of the variational objective. Further details about the experimental setup can be found at Appendix A. After initialization we trained the VGG network regularly for 200 epochs using Adam with the default hyperparameters. (A sketch of the prior scale and warm-up settings appears after the table.) |
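
The Pseudocode row refers to forward passes that use local reparametrization (Appendix F of the paper). The snippet below is a minimal NumPy sketch of that trick for a single fully connected layer with a factorized Gaussian posterior over the weights; the function and variable names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def dense_local_reparam(x, w_mu, w_logvar, rng):
    """Fully connected forward pass with a factorized Gaussian posterior over
    the weights, using the local reparametrization trick: rather than sampling
    the weight matrix, sample the pre-activations, whose marginal is Gaussian.

    x:        (batch, in_features) input batch
    w_mu:     (in_features, out_features) posterior means
    w_logvar: (in_features, out_features) posterior log-variances
    """
    act_mu = x @ w_mu                        # mean of the pre-activations, E[x W]
    act_var = (x ** 2) @ np.exp(w_logvar)    # variance of the pre-activations
    eps = rng.standard_normal(act_mu.shape)  # one noise draw per unit, per example
    return act_mu + np.sqrt(act_var + 1e-8) * eps

# Illustrative usage (shapes loosely follow the first LeNet-300-100 layer):
rng = np.random.default_rng(0)
x = rng.standard_normal((32, 784))
w_mu = 0.05 * rng.standard_normal((784, 300))
w_logvar = -9.0 * np.ones((784, 300))
h = dense_local_reparam(x, w_mu, w_logvar, rng)
print(h.shape)  # (32, 300)
```

Sampling the pre-activations rather than the weights keeps the noise per example independent and lowers the variance of the gradient estimator, which is why the paper's Appendix F algorithms are written in this form.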
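The Experiment Setup row quotes the global half-Cauchy scale τ0 = 1e-5 and the use of KL "warm-up". The sketch below only illustrates those two ingredients; it assumes a linear warm-up schedule (the quote does not specify its shape or length), and `WARMUP_EPOCHS` and the helper names are hypothetical.

```python
import numpy as np

TAU_0 = 1e-5        # scale of the global half-Cauchy prior, as quoted above
WARMUP_EPOCHS = 20  # assumed warm-up length; not stated in the quoted setup

def sample_half_cauchy(scale, size, rng):
    """Draw from a half-Cauchy(0, scale): the absolute value of a Cauchy sample."""
    return np.abs(scale * rng.standard_cauchy(size))

def kl_weight(epoch):
    """Assumed linear warm-up: ramp the KL term of the variational objective
    from 0 to 1 over the first WARMUP_EPOCHS epochs, then hold it at 1."""
    return min(1.0, (epoch + 1) / WARMUP_EPOCHS)

rng = np.random.default_rng(0)
print(sample_half_cauchy(TAU_0, size=4, rng=rng))  # tiny global scales: mass near zero
print(kl_weight(0), kl_weight(199))                # 0.05 1.0
```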