Cyclical Stochastic Gradient MCMC for Bayesian Deep Learning
Authors: Ruqi Zhang, Chunyuan Li, Jianyi Zhang, Changyou Chen, Andrew Gordon Wilson
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide extensive experimental results to demonstrate the advantages of cSG-MCMC in sampling from multimodal distributions, including Bayesian neural networks and uncertainty estimation on several large and challenging datasets such as ImageNet. |
| Researcher Affiliation | Collaboration | Ruqi Zhang (Cornell University, rz297@cornell.edu); Chunyuan Li (Microsoft Research, Redmond, chunyl@microsoft.com); Jianyi Zhang (Duke University, jz318@duke.edu); Changyou Chen (University at Buffalo, SUNY, changyou@buffalo.edu); Andrew Gordon Wilson (New York University, andrewgw@cims.nyu.edu) |
| Pseudocode | Yes | Algorithm 1 Cyclical SG-MCMC. |
| Open Source Code | Yes | We release code at https://github.com/ruqizhang/csgmcmc. |
| Open Datasets | Yes | We demonstrate the effectiveness of cSG-MCMC on Bayesian neural networks for classification on CIFAR-10 and CIFAR-100. We consider Bayesian logistic regression (BLR) on three real-world datasets from the UCI repository: Australian (15 covariates, 690 data points), German (25 covariates, 1000 data points) and Heart (14 covariates, 270 data points). We further study different learning algorithms on a large-scale dataset, ImageNet. We train a three-layer MLP model on the standard MNIST train dataset until convergence using different algorithms, and estimate the entropy of the predictive distribution on the notMNIST dataset (Bulatov, 2011). |
| Dataset Splits | No | The paper mentions training and testing on datasets but does not specify a validation dataset split or provide exact percentages/counts for such splits. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for running experiments, such as specific CPU or GPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions methods and algorithms (e.g., SGLD, SGHMC, Snapshot) but does not list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, CUDA x.x). |
| Experiment Setup | Yes | We set M = 4 and α0 = 0.5 for cSGLD, cSGHMC and Snapshot. The proportion hyper-parameter β = 0.8 and 0.94 for CIFAR-10 and CIFAR-100, respectively. We collect 3 samples per cycle. We use a ResNet-18 (He et al., 2016) and run all algorithms for 200 epochs. For the traditional SG-MCMC methods, we thus avoid noise injection for the first 150 epochs of training (corresponding to the zero temperature limit of SGLD and SGHMC), and resume SG-MCMC as usual (with noise) for the last 50 epochs. We collect 20 samples for the MCMC methods and average their predictions in testing. For both cSGLD and cSGHMC, M = 100, β = 0.01. For cSGLD, α0N = 1.2, 0.5, 1.5 for Australian, German and Heart respectively. For cSGHMC, α0N = 0.5, 0.3, 1.0 for Australian, German and Heart respectively. For SG-MCMC, the stepsize is a for the first 5000 iterations and then switches to the decay schedule (2) with b = 0, γ = 0.55. aN = 1.2, 0.5, 1.5 for Australian, German and Heart respectively for SGLD, and aN = 0.5, 0.3, 1.0 for Australian, German and Heart respectively for SGHMC. η = 0.5 in cSGHMC and SGHMC. For SG-MCMC, the stepsize decays from 0.1 to 0.001 for the first 150 epochs and then switches to the decay schedule (2) with a = 0.01, b = 0 and γ = 0.5005. η = 0.9 in cSGHMC, Snapshot-SGDM and SGHMC. For both cSG-MCMC and Snapshot, M = 4. β = 0.8 in cSG-MCMC. α0N = 0.01 and 0.008 for cSGLD and cSGHMC respectively. For SG-MCMC, the stepsize is a for the first 50 iterations and then switches to the decay schedule (2) with b = 0, γ = 0.5005. aN = 0.01 for SGLD and aN = 0.008 for SGHMC. η = 0.5 in cSGHMC, Snapshot-SGDM and SGHMC. |
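The experiment setup above is governed by the paper's cyclical cosine stepsize schedule, which restarts the stepsize at α0 at the beginning of each of the M cycles and anneals it toward zero within the cycle. A minimal Python sketch of that schedule, assuming k is the 1-indexed iteration and K the total number of iterations (the function name `cyclical_stepsize` is our own, not from the released code):

```python
import math

def cyclical_stepsize(k, K, M, alpha0):
    """Cyclical cosine stepsize from the cSG-MCMC paper:
    alpha_k = (alpha0 / 2) * [cos(pi * mod(k-1, ceil(K/M)) / ceil(K/M)) + 1].

    k      -- current iteration, 1-indexed
    K      -- total number of iterations
    M      -- number of cycles (e.g. M = 4 in the CIFAR experiments)
    alpha0 -- initial stepsize (e.g. alpha0 = 0.5)
    """
    cycle_len = math.ceil(K / M)
    # Position within the current cycle, in [0, 1); the cosine maps the
    # start of a cycle to alpha0 and the end of a cycle to ~0.
    frac = ((k - 1) % cycle_len) / cycle_len
    return (alpha0 / 2) * (math.cos(math.pi * frac) + 1)

# Example: M = 4 cycles of 50 steps each; the stepsize warm-restarts to
# alpha0 = 0.5 at k = 1, 51, 101, 151 and decays within each cycle.
schedule = [cyclical_stepsize(k, K=200, M=4, alpha0=0.5) for k in range(1, 201)]
```

With the paper's proportion hyper-parameter β, the first β fraction of each cycle would be treated as the exploration stage (plain SGD steps) and the remainder as the sampling stage where samples are collected; that split is omitted here for brevity.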