Communication-Efficient Stochastic Gradient MCMC for Neural Networks
Authors: Chunyuan Li, Changyou Chen, Yunchen Pu, Ricardo Henao, Lawrence Carin
AAAI 2019, pp. 4173-4180
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on various neural networks demonstrate that the proposed algorithms can greatly reduce training time while achieving comparable (or better) test accuracy/log-likelihood levels, relative to traditional SG-MCMC. |
| Researcher Affiliation | Collaboration | (1) Microsoft Research, Redmond; (2) University at Buffalo, SUNY; (3) Facebook; (4) Duke University |
| Pseudocode | Yes | Algorithm 1: Downpour SGLD; Algorithm 2: Elastic SGHMC (illustrative sketches of both update patterns follow the table). |
| Open Source Code | No | The paper does not contain an explicit statement offering open-source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | We first study FNN on the standard MNIST dataset... The CNN is tested on SVHN... The training/testing sets contain 26000/3200 characters, and the vocabulary size is 87. We consider a 1- or 2-hidden-layer RNN... Each algorithm runs 5 times on the Cartpole-v1 environment... |
| Dataset Splits | No | The paper specifies '60000 training and 10000 test samples' for MNIST but does not explicitly mention a validation set split or cross-validation methodology. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for running its experiments. |
| Software Dependencies | No | The paper describes the algorithms and their implementation but does not specify any software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | We use rectified linear units (ReLUs) ... and a two-layer model, 784-X-X-10, is employed, where X is the number of hidden units for each layer. Sizes (X-X) 400-400, 800-800 and 1200-1200 are considered. We first employ 1, 4 or 10 workers, and vary the communication period as π = {1, 5, 10, 20}. A standard 2-layer CNN is used. We consider a 1- or 2-hidden-layer RNN of dimension 128, with Long Short-Term Memory (LSTM) or Gated Recurrent Unit (GRU). For fair comparison, the RMSprop optimizer is considered as a competitor, entropy regularisation is off, and P = 5 workers are used for both methods. Each algorithm runs 5 times on the Cartpole-v1 environment. (The role of the communication period π is illustrated in the sketches below.) |
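
The paper's Algorithm 1 (Downpour SGLD) pairs a Downpour-style parameter-server scheme with SGLD's noisy gradient update. As a rough illustration of that pattern, here is a minimal single-process sketch: `grad_log_post`, `P`, `pi`, and `eps` are assumed names for the stochastic gradient estimate, worker count, communication period, and step size, and the server's asynchronous push/pull is approximated by a synchronous average every `pi` steps. This is a sketch of the general idea, not the paper's implementation.

```python
import numpy as np

def sgld_step(theta, grad_log_post, eps, rng):
    """One SGLD step: half-step along a (stochastic) gradient of the
    log-posterior, plus injected Gaussian noise with variance eps."""
    noise = rng.normal(0.0, np.sqrt(eps), size=theta.shape)
    return theta + 0.5 * eps * grad_log_post(theta) + noise

def downpour_sgld(theta0, grad_log_post, P=4, pi=5, T=1000, eps=1e-2, seed=0):
    """Downpour-style SGLD sketch: P workers take local SGLD steps and
    synchronize with a central copy every `pi` steps. The paper's server
    is asynchronous; synchronous averaging here is a simplification."""
    rng = np.random.default_rng(seed)
    server = theta0.copy()
    workers = [theta0.copy() for _ in range(P)]
    samples = []
    for t in range(T):
        for p in range(P):
            workers[p] = sgld_step(workers[p], grad_log_post, eps, rng)
        if (t + 1) % pi == 0:
            server = np.mean(workers, axis=0)            # push: aggregate workers
            workers = [server.copy() for _ in range(P)]  # pull: re-sync workers
        samples.append(server.copy())
    return samples
```

As a quick sanity check with a toy target, `downpour_sgld(np.zeros(1), lambda th: -(th - 2.0))` (a unit-variance Gaussian posterior centered at 2.0) yields a chain whose samples settle around 2.0; larger `pi` trades communication for slower mixing of the server copy.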
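Algorithm 2 (Elastic SGHMC) analogously combines SGHMC's momentum-based update with an elastic-averaging pull toward a shared center variable. The sketch below shows that combination under assumed hyperparameter names (`alpha` for the friction term, `rho` for the elastic penalty); the paper's exact update rules and schedules may differ.

```python
import numpy as np

def elastic_sghmc(theta0, grad_log_post, P=5, pi=10, T=1000,
                  eps=1e-2, alpha=0.1, rho=0.01, seed=0):
    """Elastic SGHMC sketch: each worker runs SGHMC (momentum, friction,
    injected noise) and, every `pi` steps, worker and center variables
    are pulled toward each other by an elastic force."""
    rng = np.random.default_rng(seed)
    center = theta0.copy()
    thetas = [theta0.copy() for _ in range(P)]
    vels = [np.zeros_like(theta0) for _ in range(P)]
    samples = []
    for t in range(T):
        for p in range(P):
            # SGHMC: friction (1 - alpha), stochastic gradient, Gaussian noise
            noise = rng.normal(0.0, np.sqrt(2 * alpha * eps), size=theta0.shape)
            vels[p] = (1 - alpha) * vels[p] + eps * grad_log_post(thetas[p]) + noise
            thetas[p] = thetas[p] + vels[p]
        if (t + 1) % pi == 0:
            # Elastic averaging: symmetric pull between each worker and the center
            for p in range(P):
                diff = thetas[p] - center
                thetas[p] = thetas[p] - rho * diff
                center = center + rho * diff
        samples.append(center.copy())
    return samples
```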