Communication-Efficient Stochastic Gradient MCMC for Neural Networks

Authors: Chunyuan Li, Changyou Chen, Yunchen Pu, Ricardo Henao, Lawrence Carin (pp. 4173-4180)

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on various neural networks demonstrate that the proposed algorithms can greatly reduce training time while achieving comparable (or better) test accuracy/log-likelihood levels, relative to traditional SG-MCMC.
Researcher Affiliation | Collaboration | 1 Microsoft Research, Redmond; 2 University at Buffalo, SUNY; 3 Facebook; 4 Duke University
Pseudocode | Yes | Algorithm 1: Downpour SGLD. Algorithm 2: Elastic SGHMC. (See the worker-loop sketch after this table.)
Open Source Code | No | The paper does not contain an explicit statement offering open-source code for the described methodology or a link to a code repository.
Open Datasets | Yes | We first study FNN on the standard MNIST dataset... The CNN is tested on SVHN... The training/testing sets contain 26000/3200 characters, and the vocabulary size is 87. We consider a 1 or 2-hidden-layer RNN... Each algorithm runs 5 times on the Cartpole-v1 environment...
Dataset Splits | No | The paper specifies '60000 training and 10000 test samples' for MNIST but does not explicitly mention a validation set split or cross-validation methodology.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for running its experiments.
Software Dependencies | No | The paper describes the algorithms and their implementation but does not specify any software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | We use rectified linear units (ReLUs) ... and a two-layer model, 784-X-X-10, is employed, where X is the number of hidden units for each layer. Sizes (X-X) 400-400, 800-800 and 1200-1200 are considered. We first employ 1, 4 or 10 workers, and vary the communication period as π = {1, 5, 10, 20}. A standard 2-layer CNN is used. We consider a 1 or 2-hidden-layer RNN of dimension 128, with Long Short-Term Memory (LSTM) or Gated Recurrent Unit (GRU) units. For fair comparison, the RMSprop optimizer is considered as a competitor, entropy regularisation is off, and P = 5 workers are used for both methods. Each algorithm runs 5 times on the Cartpole-v1 environment. (See the FNN sketch after this table.)
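
For the Pseudocode row: Algorithm 1 (Downpour SGLD) has each worker run local SGLD steps and synchronize with a parameter server only every π steps, which is where the communication savings come from. Below is a minimal single-process sketch of that pattern; the toy Bayesian linear-regression model, the in-memory dict standing in for the parameter server, and all hyperparameter values are illustrative assumptions, not the authors' released code.

```python
# A minimal sketch of a Downpour-SGLD-style worker loop (in the spirit of the
# paper's Algorithm 1). The toy model, the in-process "parameter server", and
# all hyperparameter values are illustrative assumptions, not the authors' code.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: Bayesian linear regression with a standard-normal prior on w.
N, D, noise_std = 1000, 5, 0.5
X = rng.normal(size=(N, D))
true_w = rng.normal(size=D)
y = X @ true_w + noise_std * rng.normal(size=N)

def stoch_grad_log_post(w, xb, yb):
    """Minibatch estimate of the gradient of log p(w) + log p(y | X, w),
    with the likelihood term rescaled by N / batch_size."""
    scale = N / xb.shape[0]
    grad_prior = -w                                    # from log N(w; 0, I)
    grad_lik = scale * xb.T @ (yb - xb @ w) / noise_std**2
    return grad_prior + grad_lik

# Stand-in for the parameter server shared by P workers; in a real Downpour
# setup the push/pull calls below would be asynchronous remote calls.
server = {"w": np.zeros(D)}

def worker_sgld(pi=10, steps=2000, eps=1e-4, batch=32):
    w = server["w"].copy()         # pull initial parameters from the server
    delta = np.zeros(D)            # local update accumulated since last sync
    samples = []
    for t in range(steps):
        idx = rng.integers(0, N, size=batch)
        g = stoch_grad_log_post(w, X[idx], y[idx])
        step = 0.5 * eps * g + np.sqrt(eps) * rng.normal(size=D)  # SGLD step
        w += step
        delta += step
        if (t + 1) % pi == 0:      # communicate only every pi local steps
            server["w"] += delta   # push the accumulated local update
            w = server["w"].copy() # pull the latest server parameters
            delta[:] = 0.0
        samples.append(w.copy())
    return np.array(samples)

draws = worker_sgld()
print("posterior-mean estimate:", draws[len(draws) // 2:].mean(axis=0))
print("true weights:           ", true_w)
```

Increasing the communication period pi reduces how often the worker talks to the server at the cost of working with staler parameters; the paper's sweep over π = {1, 5, 10, 20} probes exactly this trade-off.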
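
For the Experiment Setup row: the FNN is a 784-X-X-10 fully connected network with ReLU units and X ∈ {400, 800, 1200}. A minimal sketch of that architecture, assuming PyTorch (the paper does not name a framework):

```python
# Minimal sketch of the 784-X-X-10 fully connected network from the Experiment
# Setup row, instantiated for X = 400. PyTorch is an assumption here: the paper
# does not state which framework was used.
import torch.nn as nn

def make_fnn(hidden=400):
    """Two hidden layers of `hidden` ReLU units: 784-X-X-10 for MNIST."""
    return nn.Sequential(
        nn.Flatten(),                 # 28x28 MNIST images -> 784-dim vectors
        nn.Linear(784, hidden),
        nn.ReLU(),
        nn.Linear(hidden, hidden),
        nn.ReLU(),
        nn.Linear(hidden, 10),        # 10 output classes (digits 0-9)
    )

# The paper sweeps X over {400, 800, 1200}; the other sizes follow the same pattern.
models = {x: make_fnn(x) for x in (400, 800, 1200)}
```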