Stochastic Gradient MCMC with Stale Gradients
Authors: Changyou Chen, Nan Ding, Chunyuan Li, Yizhe Zhang, Lawrence Carin
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on synthetic data and deep neural networks validate our theory, demonstrating the effectiveness and scalability of SG-MCMC with stale gradients. |
| Researcher Affiliation | Collaboration | Dept. of Electrical and Computer Engineering, Duke University, Durham, NC, USA; Google Inc., Venice, CA, USA. {cc448,cl319,yz196,lcarin}@duke.edu; dingnan@google.com |
| Pseudocode | Yes | Algorithm 1: State update of SGHMC with the stale stochastic gradient ∇θÛτℓ(θ) |
| Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code for their specific S2G-MCMC implementation or a link to a code repository. It mentions using 'an MPI (message passing interface) extension of the popular Caffe package for deep learning [32]' but no specific code release for their contributions. |
| Open Datasets | Yes | We use the Adult dataset, a9a, with 32,561 training samples and 16,281 test samples. [...] http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html. [...] LeNet for MNIST: We modify the standard LeNet to a Bayesian setting for the MNIST dataset. [...] Cifar10-Quick net for CIFAR10 [...] The CIFAR-10 dataset consists of 60,000 color images of size 32×32 in 10 classes, with 50,000 for training and 10,000 for testing. |
| Dataset Splits | No | The paper provides training and test split sizes for the Adult dataset ('32,561 training samples and 16,281 test samples') and CIFAR-10 ('50,000 for training and 10,000 for testing'), but does not explicitly mention a validation set split or methodology for it. |
| Hardware Specification | Yes | The algorithm is run on a cluster of five machines. Each machine is equipped with eight 3.60GHz Intel(R) Core(TM) i7-4790 CPU cores. |
| Software Dependencies | No | The paper mentions 'Caffe' and 'MPICH library' as software used, but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | In all these models, zero mean and unit variance Gaussian priors are employed for the weights to capture weight uncertainties, an effective way to deal with overfitting [33]. We vary the number of servers S among {1, 3, 5, 7}, and the number of workers for each server from 1 to 9. [...] For simplicity, we use the default parameter setting specified in Caffe, with the additional parameter B in SGHMC (Algorithm 1) set to (1 − m), where m is the momentum variable defined in the SGD algorithm in Caffe. |
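The SGHMC update referenced in the table (Algorithm 1, with friction B = 1 − m mapped to Caffe's momentum m) can be sketched as follows. This is a minimal, generic SGHMC step, not the authors' released implementation; the function and variable names (`sghmc_step`, `stale_grad`) are illustrative, and the stale gradient is simply assumed to have been computed by a worker against an older parameter copy.

```python
import numpy as np

def sghmc_step(theta, v, stale_grad, lr, B, rng):
    """One SGHMC state update using a (possibly stale) stochastic gradient.

    theta      : current parameter vector
    v          : momentum (velocity) vector
    stale_grad : stochastic gradient of the potential energy, evaluated by a
                 worker at a parameter copy that may be tau iterations old
    lr         : step size
    B          : friction term; the paper sets B = 1 - m, with m Caffe's
                 SGD momentum (so m = 0.9 gives B = 0.1)
    rng        : numpy Generator for the injected Gaussian noise
    """
    # Friction-damped momentum update with injected noise of variance 2*B*lr,
    # the standard SGHMC discretization (Chen et al., 2014).
    noise = rng.normal(0.0, np.sqrt(2.0 * B * lr), size=np.shape(theta))
    v = (1.0 - B) * v - lr * stale_grad + noise
    theta = theta + v
    return theta, v
```

In a distributed run, each worker would evaluate `stale_grad` on its local minibatch against a stale parameter snapshot and ship it to a server, which applies this update; the staleness bound τ is what the paper's theory controls.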