Non-convex Learning via Replica Exchange Stochastic Gradient MCMC
Authors: Wei Deng, Qi Feng, Liyao Gao, Faming Liang, Guang Lin
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we test the algorithm through extensive experiments on various setups and obtain the state-of-the-art results on CIFAR10, CIFAR100, and SVHN in both supervised learning and semi-supervised learning tasks. |
| Researcher Affiliation | Academia | 1Purdue University, West Lafayette, IN, USA. 2University of Southern California, Los Angeles, CA, USA. |
| Pseudocode | Yes | Algorithm 1 Adaptive Replica Exchange Stochastic Gradient Langevin Dynamics Algorithm. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating the release of open-source code for the described methodology. |
| Open Datasets | Yes | Empirically, we test the algorithm through extensive experiments on various setups and obtain the state-of-the-art results on CIFAR10, CIFAR100, and SVHN in both supervised learning and semi-supervised learning tasks. CIFAR10 and CIFAR100, which consist of 50,000 32×32 RGB images for training and 10,000 images for testing. SVHN consists of 73,257 10-class images for training and 26,032 images for testing. |
| Dataset Splits | Yes | CIFAR10 and CIFAR100, which consist of 50,000 32×32 RGB images for training and 10,000 images for testing. SVHN consists of 73,257 10-class images for training and 26,032 images for testing. |
| Hardware Specification | No | The paper mentions a 'GPU grant program from NVIDIA' in the acknowledgements, but it does not specify any particular GPU models, CPU types, or detailed hardware specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'stochastic gradient Hamiltonian Monte Carlo (SGHMC) as the baseline sampling algorithm' and 'momentum stochastic gradient descent algorithm as M-SGD', but it does not specify software names with version numbers for implementation details (e.g., specific Python libraries or frameworks with their versions). |
| Experiment Setup | Yes | We choose batch-size 256 and run the experiments within 500 epochs. We first tune the optimal hyperparameters for M-SGD, SGHMC and the low-temperature chain of reSGHMC: we set the learning rate η_k^(1) to 2e-6 in the first 200 epochs and decay it afterward by a factor of 0.984 every epoch; the low temperature follows an annealing schedule τ_1 = 0.01/1.02^k to accelerate the optimization; the weight decay is set to 25. Then, as to the high-temperature chain of reSGHMC, we use a larger learning rate η_k^(2) = 1.5 η_k^(1) and a higher temperature τ_2 = 5 τ_1. Following the dynamic temperatures, we fix F_k = F_0 α^(N_k) 1.02^k, where N_k denotes the number of swaps in the first k epochs and α = 0.8. The variance estimator is updated each epoch based on the variance of 10 samples of the stochastic energies and the smoothing factor is set to γ = 0.3 in (14). (A hedged code sketch of this setup follows the table.) |
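
As supplementary context for the "Pseudocode" and "Experiment Setup" rows, here is a minimal Python sketch of an adaptive replica-exchange SGLD loop (the algorithm family of the paper's Algorithm 1; the reported experiments use the SGHMC variant) wired to the quoted hyperparameter schedules. Since no open-source code accompanies the paper, everything below is a hedged reconstruction: the toy energy, per-epoch iteration count, `F0`, and the exact constants in the bias-corrected swap test are assumptions, not the authors' implementation.

```python
import numpy as np

# Hedged sketch of adaptive replica-exchange SGLD with the schedules quoted in
# the "Experiment Setup" row. Toy energy, iteration counts, F0, and the exact
# constants of the corrected swap test are illustrative assumptions.

rng = np.random.default_rng(0)

def energy(theta):
    """Toy non-convex energy with two wells (stand-in for the training loss)."""
    return 0.1 * theta ** 4 - theta ** 2 + 0.5 * theta

def grad_energy(theta):
    return 0.4 * theta ** 3 - 2.0 * theta + 0.5

def noisy_energy(theta, noise_std=0.5):
    """Stochastic (mini-batch-style) energy estimate: true energy plus Gaussian noise."""
    return energy(theta) + noise_std * rng.normal()

def sgld_step(theta, lr, tau):
    """One SGLD update: gradient step plus sqrt(2 * lr * tau) Gaussian noise."""
    return theta - lr * grad_energy(theta) + np.sqrt(2.0 * lr * tau) * rng.normal()

def schedules(epoch, n_swaps, lr0, tau0=0.01, F0=1.0, alpha=0.8):
    """Schedules from the setup row: lr decays by 0.984 per epoch after epoch 200,
    tau_1 = 0.01 / 1.02^k, lr_2 = 1.5 * lr_1, tau_2 = 5 * tau_1, and
    F_k = F0 * alpha^{N_k} * 1.02^k (a reconstruction of the quoted formula)."""
    lr1 = lr0 * 0.984 ** max(0, epoch - 200)
    tau1 = tau0 / 1.02 ** epoch
    return lr1, 1.5 * lr1, tau1, 5.0 * tau1, F0 * alpha ** n_swaps * 1.02 ** epoch

theta = np.array([2.0, -2.0])   # chain 0: low temperature, chain 1: high temperature
n_swaps, sigma2_hat, gamma = 0, 0.25, 0.3

for epoch in range(500):
    # The paper uses lr0 = 2e-6 for CIFAR ResNets; a larger step is used for this toy.
    lr1, lr2, tau1, tau2, F = schedules(epoch, n_swaps, lr0=1e-2)
    for _ in range(50):  # iterations per epoch (assumption)
        theta[0] = sgld_step(theta[0], lr1, tau1)
        theta[1] = sgld_step(theta[1], lr2, tau2)

    # Variance estimator: variance of 10 stochastic energies, smoothed with gamma = 0.3
    # (the exact smoothing form is an assumption).
    sigma2_hat = (1 - gamma) * sigma2_hat + gamma * np.var([noisy_energy(theta[0]) for _ in range(10)])

    # Corrected swap test on stochastic energies: the variance-based bias correction is
    # scaled by the adaptive factor F, in the spirit of the paper's Eq. (14); the exact
    # constants here are an assumption.
    dtau = 1.0 / tau1 - 1.0 / tau2
    log_s = dtau * (noisy_energy(theta[0]) - noisy_energy(theta[1]) - dtau * sigma2_hat / F)
    if np.log(rng.uniform()) < min(0.0, log_s):
        theta[0], theta[1] = theta[1], theta[0]   # swap the two chains
        n_swaps += 1
```

The design point the sketch tries to reflect, as described in the paper, is that the swap test on stochastic energies subtracts a variance-based correction whose magnitude is modulated by the adaptive factor F_k, trading a controlled bias for a usable swap rate between the exploitation (low-temperature) and exploration (high-temperature) chains.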