Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks

Authors: Chunyuan Li, Changyou Chen, David Carlson, Lawrence Carin

AAAI 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We also provide empirical results for Logistic Regression, Feedforward Neural Nets, and Convolutional Neural Nets, demonstrating that our preconditioned SGLD method gives state-of-the-art performance on these models.
Researcher Affiliation | Academia | Chunyuan Li¹, Changyou Chen¹, David Carlson² and Lawrence Carin¹. ¹Department of Electrical and Computer Engineering, Duke University; ²Department of Statistics and Grossman Center, Columbia University.
Pseudocode | Yes | Algorithm 1: Preconditioned SGLD with RMSprop (a minimal code sketch of this update follows the table).
Open Source Code | No | The paper mentions 'Appendix is at https://sites.google.com/site/chunyuan24' but does not explicitly state that source code for the methodology is provided there, nor does it give a direct link to a code repository.
Open Datasets | Yes | We demonstrate pSGLD on BLR. A small Australian dataset (Girolami and Calderhead 2011) is first used... We then test BLR on a large-scale Adult dataset, a9a (Lin, Weng, and Keerthi 2008)... We test on MNIST dataset, consisting of 28×28 images from 10 classes with 60,000 training and 10,000 test samples.
Dataset Splits | No | For the MNIST dataset, the paper states '60,000 training and 10,000 test samples' but does not specify a validation split or a methodology for creating one, and it gives no splitting information for the other datasets beyond their total sizes.
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running the experiments.
Software Dependencies | No | The paper does not list ancillary software dependencies (e.g., library or solver names with version numbers) needed to replicate the experiments.
Experiment Setup | Yes | If not specifically mentioned, the default setting for DNN experiments is shared as follows: σ² = 1, minibatch size is 100, thinning interval is 100, burn-in is 300. A block decay strategy is employed for the stepsize; it decreases by half after every L epochs. For BLR on the Australian dataset: 'minibatch size of 5, σ² = 100'. For BLR on the Adult dataset: 'Minibatch size is set to 50, σ² = 10. The thinning interval is 50, burn-in is 500, and T = 1.5×10⁴. Stepsize ϵ = 5×10⁻²'. For FNN: 'set the optimal stepsize for each algorithm as: for pSGLD and RMSprop as ϵ = 5×10⁻⁴, while for SGLD and SGD as ϵ = 5×10⁻¹'. For CNN: 'Both convolutional layers use 5×5 filter size with 32 and 64 channels, respectively; 2×2 max pooling is used after each convolutional layer. The fully-connected layers have 200-200 hidden nodes with ReLU, 20 epochs are used, and L is set to 10.' (The DNN defaults and the block-decay stepsize schedule are sketched in code after the table.)
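
To make the pseudocode entry above concrete, here is a minimal NumPy sketch of one pSGLD update in the spirit of Algorithm 1 (preconditioned SGLD with RMSprop). The function name psgld_step, the default values for alpha and lam, and the decision to drop the Γ(θ) correction term are illustrative assumptions; this is not the authors' released code.

    import numpy as np

    def psgld_step(theta, g_bar, grad_logpost, V, eps, alpha=0.99, lam=1e-5, rng=np.random):
        """One pSGLD update (illustrative sketch, not the authors' code).

        theta        -- current parameter vector
        g_bar        -- mean minibatch log-likelihood gradient (drives the preconditioner)
        grad_logpost -- stochastic log-posterior gradient: prior grad + (N/n) * minibatch grad
        V            -- running RMSprop second-moment estimate (same shape as theta)
        eps          -- stepsize epsilon_t
        """
        # RMSprop-style moving average of squared minibatch gradients
        V = alpha * V + (1.0 - alpha) * g_bar * g_bar
        # Diagonal preconditioner G(theta) = 1 / (lam + sqrt(V))
        G = 1.0 / (lam + np.sqrt(V))
        # Preconditioned Langevin step: drift plus Gaussian noise with covariance eps * G
        # (the Gamma(theta) correction term from the paper is omitted here for brevity)
        noise = rng.normal(size=theta.shape) * np.sqrt(eps * G)
        theta = theta + 0.5 * eps * G * grad_logpost + noise
        return theta, V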
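
The quoted DNN defaults and the block-decay stepsize strategy can also be captured in a few lines. The names DNN_DEFAULTS and block_decay_stepsize, the grouping into a dict, and the choice of initial stepsize in the example (reusing the FNN value 5×10⁻⁴) are assumptions for illustration only.

    # Default DNN settings quoted from the paper; grouping them in a dict is a convention
    # adopted here, not something specified by the authors.
    DNN_DEFAULTS = {
        "sigma2": 1.0,           # prior variance sigma^2
        "minibatch_size": 100,
        "thinning_interval": 100,
        "burn_in": 300,
    }

    def block_decay_stepsize(eps0, epoch, L):
        # Block-decay schedule: the stepsize is halved after every L epochs.
        return eps0 * 0.5 ** (epoch // L)

    # Example: the CNN experiment runs 20 epochs with L = 10, so the stepsize halves
    # once at epoch 10; the initial value eps0 = 5e-4 is illustrative (FNN setting).
    schedule = [block_decay_stepsize(5e-4, epoch, L=10) for epoch in range(20)]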