An Adaptive Empirical Bayesian Method for Sparse Deep Learning

Authors: Wei Deng, Xiao Zhang, Faming Liang, Guang Lin

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical applications of the proposed method lead to the state-of-the-art performance on MNIST and Fashion MNIST with shallow convolutional neural networks (CNNs) and the state-of-the-art compression performance on CIFAR-10 with Residual Networks.
Researcher Affiliation | Academia | Wei Deng, Department of Mathematics, Purdue University, West Lafayette, IN 47907, deng106@purdue.edu; Xiao Zhang, Department of Computer Science, Purdue University, West Lafayette, IN 47907, zhang923@purdue.edu; Faming Liang, Department of Statistics, Purdue University, West Lafayette, IN 47907, fmliang@purdue.edu; Guang Lin, Departments of Mathematics, Statistics and School of Mechanical Engineering, Purdue University, West Lafayette, IN 47907, guanglin@purdue.edu
Pseudocode | Yes | Algorithm 1: SGLD-SA with SSGL priors (a hedged sketch of this update appears after the table).
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper.
Open Datasets | Yes | The four CNN models are tested on the MNIST and Fashion MNIST (FMNIST) [Xiao et al., 2017] datasets. Our compression experiments are conducted on the CIFAR-10 dataset [Krizhevsky, 2009] with DA.
Dataset Splits | No | The paper mentions training and test sets but does not give a validation-set split or a data-partitioning methodology for all experiments that would allow reproduction.
Hardware Specification | No | The paper mentions "the GPU grant program from NVIDIA" in the acknowledgments but does not specify exact GPU models or any other hardware details used for running its experiments.
Software Dependencies | No | "We implement all the algorithms in PyTorch [Paszke et al., 2017]." This mentions PyTorch but does not provide a specific version number.
Experiment Setup | Yes | We set the training batch size n = 1000, a = b = p and ν = λ = 1000. The hyperparameters for SGHMC-SA are set to v0 = 1, v1 = 0.1 and σ = 1 to regularize the over-fitted space. The learning rate is set to 5 × 10^-7, and the step size is ω(k) = 1/(k + 1000)^(3/4). We use a thinning factor of 500 to avoid a cumbersome system. Since a fixed temperature can also be powerful in escaping "shallow" local traps [Zhang et al., 2017], our temperatures are set to τ = 1000 for MNIST and τ = 2500 for FMNIST. The sparse training takes 1000 epochs. The mini-batch size is 1000. The learning rate starts from 2e-9 and is divided by 10 at the 700th and 900th epochs. We set the inverse temperature τ to 1000 and multiply τ by 1.005 every epoch.
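
The pseudocode row above refers to Algorithm 1, SGLD-SA with SSGL priors. Since the paper's code is not released, the following is only a minimal sketch of the two ingredients that algorithm combines: a tempered SGLD parameter update and a Robbins-Monro (stochastic approximation) smoothing step for the adaptive prior quantities. The function names (sgld_step, sa_smooth) and argument conventions are hypothetical, and the SSGL-specific conditional expectations are abstracted behind the caller-supplied gradients and estimates.

```python
import torch

def sgld_step(params, grads, lr=5e-7, tau=1000.0):
    """One tempered SGLD update (hedged sketch, not the authors' code).

    `grads` is assumed to hold stochastic gradients of the log posterior
    (mini-batch log likelihood plus the current adaptive SSGL log prior)
    for each tensor in `params`. The injected Gaussian noise has variance
    2*lr/tau, so a larger inverse temperature tau means less exploration.
    """
    with torch.no_grad():
        for p, g in zip(params, grads):
            noise = torch.randn_like(p) * (2.0 * lr / tau) ** 0.5
            p.add_(lr * g + noise)  # drift toward the posterior mode + tempered noise

def sa_smooth(latent, new_estimate, k):
    """Stochastic-approximation (Robbins-Monro) update of an adaptive prior
    quantity, e.g. the SSGL inclusion probabilities, using the decaying
    step size omega(k) = 1 / (k + 1000)^(3/4) quoted in the table above."""
    omega = 1.0 / (k + 1000.0) ** 0.75
    return (1.0 - omega) * latent + omega * new_estimate
```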
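
The compression schedule quoted in the Experiment Setup row (1000 epochs, mini-batch size 1000, learning rate starting at 2e-9 and divided by 10 at the 700th and 900th epochs, inverse temperature 1000 multiplied by 1.005 every epoch) is fully specified by the text. Below is a minimal, self-contained sketch of that schedule; the function name compression_schedule is an assumption of this note, not something from the paper.

```python
def compression_schedule(num_epochs=1000, lr0=2e-9, tau0=1000.0):
    """Yield (epoch, lr, tau) following the quoted CIFAR-10 setup:
    the learning rate is divided by 10 at the 700th and 900th epochs,
    and the inverse temperature tau is multiplied by 1.005 every epoch."""
    lr, tau = lr0, tau0
    for epoch in range(1, num_epochs + 1):
        if epoch in (700, 900):
            lr /= 10.0
        yield epoch, lr, tau
        tau *= 1.005

# Example: inspect a few checkpoints of the schedule.
for epoch, lr, tau in compression_schedule():
    if epoch in (1, 700, 900, 1000):
        print(f"epoch {epoch:4d}: lr = {lr:.1e}, tau = {tau:.1f}")
```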