An Adaptive Empirical Bayesian Method for Sparse Deep Learning
Authors: Wei Deng, Xiao Zhang, Faming Liang, Guang Lin
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical applications of the proposed method lead to the state-of-the-art performance on MNIST and Fashion MNIST with shallow convolutional neural networks (CNN) and the state-of-the-art compression performance on CIFAR10 with Residual Networks. |
| Researcher Affiliation | Academia | Wei Deng, Department of Mathematics, Purdue University, West Lafayette, IN 47907, deng106@purdue.edu; Xiao Zhang, Department of Computer Science, Purdue University, West Lafayette, IN 47907, zhang923@purdue.edu; Faming Liang, Department of Statistics, Purdue University, West Lafayette, IN 47907, fmliang@purdue.edu; Guang Lin, Departments of Mathematics, Statistics and School of Mechanical Engineering, Purdue University, West Lafayette, IN 47907, guanglin@purdue.edu |
| Pseudocode | Yes | Algorithm 1 SGLD-SA with SSGL priors |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper. |
| Open Datasets | Yes | The four CNN models are tested on MNIST and Fashion MNIST (FMNIST) [Xiao et al., 2017] dataset. Our compression experiments are conducted on the CIFAR-10 dataset [Krizhevsky, 2009] with DA. |
| Dataset Splits | No | The paper mentions training and test sets but does not provide specific details on a validation set split or methodology for data partitioning across all experiments that would allow for reproduction. |
| Hardware Specification | No | The paper mentions "the GPU grant program from NVIDIA" in the acknowledgments but does not specify exact GPU models or any other specific hardware details used for running its experiments. |
| Software Dependencies | No | We implement all the algorithms in Pytorch [Paszke et al., 2017]. PyTorch is named as the framework, but no version number or other dependency versions are specified. |
| Experiment Setup | Yes | We set the training batch size n = 1000, a = b = p and ν = λ = 1000. The hyperparameters for SGHMC-SA are set to v0 = 1, v1 = 0.1 and σ = 1 to regularize the over-fitted space. The learning rate is set to 5e-7, and the step size is ω(k) = 1/(k + 1000)^(3/4). We use a thinning factor of 500 to avoid a cumbersome system. Fixed temperature can also be powerful in escaping "shallow" local traps [Zhang et al., 2017]; our temperatures are set to τ = 1000 for MNIST and τ = 2500 for FMNIST. The sparse training takes 1000 epochs. The mini-batch size is 1000. The learning rate starts from 2e-9 and is divided by 10 at the 700th and 900th epochs. We set the inverse temperature τ to 1000 and multiply τ by 1.005 every epoch. |
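
For orientation, below is a minimal, illustrative sketch of how the reported hyperparameters (learning rate 5e-7, step-size schedule ω(k) = 1/(k + 1000)^(3/4), and an inverse temperature τ that is multiplied by 1.005 each epoch) could be wired into a tempered SGLD-style update in PyTorch. This is not the authors' implementation, which is not publicly released; the function names `sa_step_size` and `sgld_step`, and the noise scaling sqrt(2·lr/τ), follow standard tempered SGLD and are assumptions rather than details taken from the paper.

```python
import math
import torch

def sa_step_size(k, offset=1000.0, power=0.75):
    # Stochastic-approximation step size omega(k) = 1 / (k + 1000)^(3/4),
    # matching the schedule stated in the experiment setup.
    return 1.0 / (k + offset) ** power

def sgld_step(params, loss, lr=5e-7, inv_temp=1000.0):
    # One tempered SGLD update: a gradient step plus Gaussian noise with
    # standard deviation sqrt(2 * lr / inv_temp). Standard SGLD form, used
    # here only to illustrate how the reported lr and temperature interact.
    grads = torch.autograd.grad(loss, params)
    with torch.no_grad():
        for p, g in zip(params, grads):
            noise = torch.randn_like(p) * math.sqrt(2.0 * lr / inv_temp)
            p.add_(-lr * g + noise)
```

In the reported setup, `inv_temp` would start at 1000 (2500 for FMNIST) and be multiplied by 1.005 every epoch, while `sa_step_size(k)` would weight the stochastic-approximation updates of the spike-and-slab prior hyperparameters, which are not shown in this sketch.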