Stochastic Gradient MCMC with Stale Gradients

Authors: Changyou Chen, Nan Ding, Chunyuan Li, Yizhe Zhang, Lawrence Carin

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on synthetic data and deep neural networks validate our theory, demonstrating the effectiveness and scalability of SG-MCMC with stale gradients.
Researcher Affiliation | Collaboration | Dept. of Electrical and Computer Engineering, Duke University, Durham, NC, USA; Google Inc., Venice, CA, USA; {cc448,cl319,yz196,lcarin}@duke.edu; dingnan@google.com
Pseudocode | Yes | Algorithm 1: State update of SGHMC with the stale stochastic gradient ∇θŨτℓ(θ) (a sketch of this update appears below the table)
Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code for their specific S2G-MCMC implementation or a link to a code repository. It mentions using 'an MPI (message passing interface) extension of the popular Caffe package for deep learning [32]' but no specific code release for their contributions.
Open Datasets | Yes | We use the Adult dataset, a9a, with 32,561 training samples and 16,281 test samples. [...] http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html [...] LeNet for MNIST: We modify the standard LeNet to a Bayesian setting for the MNIST dataset. [...] Cifar10-Quick net for CIFAR10 [...] The CIFAR-10 dataset consists of 60,000 color images of size 32×32 in 10 classes, with 50,000 for training and 10,000 for testing. (see the data-loading sketch below the table)
Dataset Splits | No | The paper provides training and test split sizes for the Adult dataset ('32,561 training samples and 16,281 test samples') and CIFAR-10 ('50,000 for training and 10,000 for testing'), but does not explicitly mention a validation set split or methodology for it.
Hardware Specification | Yes | The algorithm is run on a cluster of five machines. Each machine is equipped with eight 3.60GHz Intel(R) Core(TM) i7-4790 CPU cores.
Software Dependencies | No | The paper mentions 'Caffe' and 'MPICH library' as software used, but does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | In all these models, zero mean and unit variance Gaussian priors are employed for the weights to capture weight uncertainties, an effective way to deal with overfitting [33]. We vary the number of servers S among {1, 3, 5, 7}, and the number of workers for each server from 1 to 9. [...] For simplicity, we use the default parameter setting specified in Caffe, with the additional parameter B in SGHMC (Algorithm 1) set to (1 − m), where m is the momentum variable defined in the SGD algorithm in Caffe. (the B = 1 − m mapping is illustrated below the table)
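
The Algorithm 1 quoted in the Pseudocode row is the standard SGHMC state update with the fresh stochastic gradient replaced by a stale one, ∇θŨτℓ(θ), evaluated on a parameter copy from an earlier iteration. A minimal Python sketch, assuming a common SGHMC discretization with stepsize h and friction B (the paper's exact parameterization may differ):

```python
import numpy as np

def sghmc_stale_step(theta, v, stale_grad, h, B, rng):
    """One SGHMC state update driven by a stale stochastic gradient.

    `stale_grad` plays the role of the paper's stale gradient: it was
    computed on a parameter copy from an earlier iteration, but the
    update itself has the same form as ordinary SGHMC.
    """
    noise = rng.standard_normal(theta.shape) * np.sqrt(2.0 * B * h)
    v = v - h * stale_grad - h * B * v + noise  # momentum update with friction B
    theta = theta + h * v                       # position update
    return theta, v

# Toy usage with U(theta) = 0.5 * ||theta||^2, so grad U(theta) = theta.
rng = np.random.default_rng(0)
theta, v = np.zeros(2), np.zeros(2)
stale_grad = theta.copy()  # stand-in for a gradient delivered late by a worker
for _ in range(1000):
    theta, v = sghmc_stale_step(theta, v, stale_grad, h=0.01, B=0.1, rng=rng)
    stale_grad = theta.copy()  # in the paper, workers refresh this with a delay
```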
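
For the Adult (a9a) experiment, the data is available in LIBSVM format at the URL quoted in the Open Datasets row. A sketch of loading it, assuming the standard a9a/a9a.t file names from that page and scikit-learn's LIBSVM reader:

```python
from sklearn.datasets import load_svmlight_file

# Files assumed downloaded from
# http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html
X_train, y_train = load_svmlight_file("a9a")  # 32,561 training samples
X_test, y_test = load_svmlight_file("a9a.t", n_features=X_train.shape[1])  # 16,281 test samples
```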
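
The Experiment Setup row ties SGHMC's additional parameter B to Caffe's SGD momentum m as B = 1 − m. A trivial illustration (the helper name is hypothetical):

```python
def friction_from_caffe_momentum(m: float) -> float:
    """Map Caffe's SGD momentum m to SGHMC's additional parameter B = 1 - m."""
    assert 0.0 <= m < 1.0, "momentum is expected in [0, 1)"
    return 1.0 - m

B = friction_from_caffe_momentum(0.9)  # Caffe's common default momentum 0.9 gives B = 0.1
```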