Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks
Authors: Chunyuan Li, Changyou Chen, David Carlson, Lawrence Carin
AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also provide empirical results for Logistic Regression, Feedforward Neural Nets, and Convolutional Neural Nets, demonstrating that our preconditioned SGLD method gives state-of-the-art performance on these models. |
| Researcher Affiliation | Academia | Chunyuan Li¹, Changyou Chen¹, David Carlson² and Lawrence Carin¹; ¹Department of Electrical and Computer Engineering, Duke University; ²Department of Statistics and Grossman Center, Columbia University |
| Pseudocode | Yes | Algorithm 1 Preconditioned SGLD with RMSprop (a hedged sketch of the update rule appears after the table) |
| Open Source Code | No | The paper mentions 'Appendix is at https://sites.google.com/site/chunyuan24' but does not explicitly state that the source code for the methodology is provided there. It also does not provide a direct link to a code repository. |
| Open Datasets | Yes | We demonstrate pSGLD on BLR. A small Australian dataset (Girolami and Calderhead 2011) is first used...We then test BLR on a large-scale Adult dataset, a9a (Lin, Weng, and Keerthi 2008)...We test on MNIST dataset, consisting of 28×28 images from 10 classes with 60,000 training and 10,000 test samples. |
| Dataset Splits | No | For the MNIST dataset, the paper states '60,000 training and 10,000 test samples' but does not explicitly specify a validation split or methodology for creating one. It also does not provide general splitting information for other datasets beyond their total size. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | If not specifically mentioned, the default setting for DNN experiments is shared as follows: σ² = 1, minibatch size is 100, thinning interval is 100, burn-in is 300. We employ a block decay strategy for stepsize; it decreases by half after every L epochs. For BLR on the Australian dataset: 'minibatch size of 5, σ² = 100'. For BLR on the Adult dataset: 'Minibatch size is set to 50, σ² = 10. The thinning interval is 50, burn-in is 500, and T = 1.5×10⁴. Stepsize ϵ = 5×10⁻²'. For FNN: 'set the optimal stepsize for each algorithm as: for pSGLD and RMSprop as ϵ = 5×10⁻⁴, while for SGLD and SGD as ϵ = 5×10⁻¹'. For CNN: 'Both convolutional layers use 5×5 filter size with 32 and 64 channels, respectively; 2×2 max pooling is used after each convolutional layer. The fully-connected layers have 200-200 hidden nodes with ReLU, 20 epochs are used, and L is set to 10.' A hedged sketch of this CNN configuration appears after the table. |
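
The paper's Algorithm 1 (pSGLD with RMSprop) is given only as pseudocode and no code release is confirmed, so the following is a minimal NumPy sketch of a single update step as a reading aid. It follows the RMSprop preconditioner V ← αV + (1−α)ḡ⊙ḡ, G = 1/(λ + √V) and the Langevin update θ ← θ + (ε/2)·G·∇̃ + N(0, εG). The function signature, variable names, the α and λ defaults, and the omission of the Γ(θ) correction term (often treated as negligible when the preconditioner changes slowly) are assumptions, not the authors' implementation.

```python
import numpy as np

def psgld_step(theta, grad_log_post, mean_grad, V, eps,
               alpha=0.99, lam=1e-5, rng=None):
    """One pSGLD update with an RMSprop preconditioner (Gamma(theta) term omitted).

    theta         -- current parameters (NumPy array)
    grad_log_post -- stochastic gradient of the log posterior, i.e.
                     grad log prior + (N/n) * sum of minibatch log-likelihood grads
    mean_grad     -- mean minibatch log-likelihood gradient, feeds the preconditioner
    V             -- running average of squared gradients, same shape as theta
    eps           -- stepsize epsilon_t
    """
    if rng is None:
        rng = np.random.default_rng()
    V = alpha * V + (1.0 - alpha) * mean_grad ** 2            # RMSprop statistic
    G = 1.0 / (lam + np.sqrt(V))                              # diagonal preconditioner
    noise = rng.normal(size=theta.shape) * np.sqrt(eps * G)   # N(0, eps * G) injected noise
    theta = theta + 0.5 * eps * G * grad_log_post + noise     # preconditioned Langevin step
    return theta, V
```

Dropping the noise term recovers a plain RMSprop step, which is the comparison the paper draws between the sampler and its optimization counterpart.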
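
For the CNN row, a small illustrative PyTorch module matching the quoted layer sizes (two 5×5 convolutions with 32 and 64 channels, 2×2 max pooling after each, two 200-unit fully connected ReLU layers, and a 10-way output for 28×28 MNIST inputs) is sketched below. The paper predates PyTorch and reports no framework, so the framework choice, the use of no padding, and the resulting 4×4 feature map before the classifier are assumptions.

```python
import torch.nn as nn

class MnistCnn(nn.Module):
    """Layer sizes follow the quoted setup; padding choice (none) is an assumption."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5),   # 28x28 -> 24x24, 32 channels
            nn.ReLU(),
            nn.MaxPool2d(2),                   # 24x24 -> 12x12
            nn.Conv2d(32, 64, kernel_size=5),  # 12x12 -> 8x8, 64 channels
            nn.ReLU(),
            nn.MaxPool2d(2),                   # 8x8 -> 4x4
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 4 * 4, 200), nn.ReLU(),
            nn.Linear(200, 200), nn.ReLU(),
            nn.Linear(200, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```

Under the quoted defaults, such a model would be trained for 20 epochs with minibatches of 100 and the stepsize halved every L = 10 epochs; those training-loop details are deliberately left out of the sketch.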