Communication-Efficient Stochastic Gradient MCMC for Neural Networks

Authors: Chunyuan Li, Changyou Chen, Yunchen Pu, Ricardo Henao, Lawrence Carin (pp. 4173-4180)

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on various neural networks demonstrate that the proposed algorithms can greatly reduce training time while achieving comparable (or better) test accuracy/log-likelihood levels, relative to traditional SG-MCMC.
Researcher Affiliation | Collaboration | 1 Microsoft Research, Redmond; 2 University at Buffalo, SUNY; 3 Facebook; 4 Duke University
Pseudocode | Yes | Algorithm 1: Downpour SGLD. Algorithm 2: Elastic SGHMC. (See the worker-loop sketch after this table.)
Open Source Code | No | The paper does not contain an explicit statement offering open-source code for the described methodology or a link to a code repository.
Open Datasets | Yes | We first study FNN on the standard MNIST dataset... The CNN is tested on SVHN... The training/testing sets contain 26000/3200 characters, and the vocabulary size is 87. We consider a 1 or 2-hidden-layer RNN... Each algorithm runs 5 times on the Cartpole-v1 environment...
Dataset Splits | No | The paper specifies '60000 training and 10000 test samples' for MNIST but does not explicitly mention a validation set split or cross-validation methodology.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for running its experiments.
Software Dependencies | No | The paper describes the algorithms and their implementation but does not specify any software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | We use rectified linear units (ReLUs) ... and a two-layer model, 784-X-X-10, is employed, where X is the number of hidden units for each layer. Sizes (X-X) 400-400, 800-800 and 1200-1200 are considered. We first employ 1, 4 or 10 workers, and vary the communication period as π = {1, 5, 10, 20}. A standard 2-layer CNN is used. We consider a 1 or 2-hidden-layer RNN of dimension 128, with Long Short-Term Memory (LSTM) or Gated Recurrent Unit (GRU) units. For fair comparison, the RMSprop optimizer is considered as a competitor, entropy regularisation is off, and P = 5 workers are used for both methods. Each algorithm runs 5 times on the Cartpole-v1 environment. (See the FNN sketch after this table.)
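
For the Pseudocode row: Algorithm 1 (Downpour SGLD) has each worker run local SGLD steps and synchronize with a parameter server only every π steps, which is where the communication savings come from. Below is a minimal single-process sketch of that pattern; the toy Bayesian linear-regression model, the in-memory dict standing in for the parameter server, and all hyperparameter values are illustrative assumptions, not the authors' released code.

```python
# A minimal sketch of a Downpour-SGLD-style worker loop (in the spirit of the
# paper's Algorithm 1). The toy model, the in-process "parameter server", and
# all hyperparameter values are illustrative assumptions, not the authors' code.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: Bayesian linear regression with a standard-normal prior on w.
N, D, noise_std = 1000, 5, 0.5
X = rng.normal(size=(N, D))
true_w = rng.normal(size=D)
y = X @ true_w + noise_std * rng.normal(size=N)

def stoch_grad_log_post(w, xb, yb):
    """Minibatch estimate of the gradient of log p(w) + log p(y | X, w),
    with the likelihood term rescaled by N / batch_size."""
    scale = N / xb.shape[0]
    grad_prior = -w                                    # from log N(w; 0, I)
    grad_lik = scale * xb.T @ (yb - xb @ w) / noise_std**2
    return grad_prior + grad_lik

# Stand-in for the parameter server shared by P workers; in a real Downpour
# setup the push/pull calls below would be asynchronous remote calls.
server = {"w": np.zeros(D)}

def worker_sgld(pi=10, steps=2000, eps=1e-4, batch=32):
    w = server["w"].copy()         # pull initial parameters from the server
    delta = np.zeros(D)            # local update accumulated since last sync
    samples = []
    for t in range(steps):
        idx = rng.integers(0, N, size=batch)
        g = stoch_grad_log_post(w, X[idx], y[idx])
        step = 0.5 * eps * g + np.sqrt(eps) * rng.normal(size=D)  # SGLD step
        w += step
        delta += step
        if (t + 1) % pi == 0:      # communicate only every pi local steps
            server["w"] += delta   # push the accumulated local update
            w = server["w"].copy() # pull the latest server parameters
            delta[:] = 0.0
        samples.append(w.copy())
    return np.array(samples)

draws = worker_sgld()
print("posterior-mean estimate:", draws[len(draws) // 2:].mean(axis=0))
print("true weights:           ", true_w)
```

Increasing the communication period pi reduces how often the worker talks to the server at the cost of working with staler parameters; the paper's sweep over π = {1, 5, 10, 20} probes exactly this trade-off.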
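
For the Experiment Setup row: the FNN is a 784-X-X-10 fully connected network with ReLU units and X ∈ {400, 800, 1200}. A minimal sketch of that architecture, assuming PyTorch (the paper does not name a framework):

```python
# Minimal sketch of the 784-X-X-10 fully connected network from the Experiment
# Setup row, instantiated for X = 400. PyTorch is an assumption here: the paper
# does not state which framework was used.
import torch.nn as nn

def make_fnn(hidden=400):
    """Two hidden layers of `hidden` ReLU units: 784-X-X-10 for MNIST."""
    return nn.Sequential(
        nn.Flatten(),                 # 28x28 MNIST images -> 784-dim vectors
        nn.Linear(784, hidden),
        nn.ReLU(),
        nn.Linear(hidden, hidden),
        nn.ReLU(),
        nn.Linear(hidden, 10),        # 10 output classes (digits 0-9)
    )

# The paper sweeps X over {400, 800, 1200}; the other sizes follow the same pattern.
models = {x: make_fnn(x) for x in (400, 800, 1200)}
```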