Non-convex Learning via Replica Exchange Stochastic Gradient MCMC

Authors: Wei Deng, Qi Feng, Liyao Gao, Faming Liang, Guang Lin

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we test the algorithm through extensive experiments on various setups and obtain the state-of-the-art results on CIFAR10, CIFAR100, and SVHN in both supervised learning and semi-supervised learning tasks.
Researcher Affiliation | Academia | Purdue University, West Lafayette, IN, USA; University of Southern California, Los Angeles, CA, USA.
Pseudocode | Yes | Algorithm 1: Adaptive Replica Exchange Stochastic Gradient Langevin Dynamics Algorithm (a minimal sketch of the swap step appears after this table).
Open Source Code | No | The paper does not contain any explicit statements or links indicating the release of open-source code for the described methodology.
Open Datasets | Yes | Empirically, we test the algorithm through extensive experiments on various setups and obtain the state-of-the-art results on CIFAR10, CIFAR100, and SVHN in both supervised learning and semi-supervised learning tasks. CIFAR10 and CIFAR100 consist of 50,000 32×32 RGB images for training and 10,000 images for testing; SVHN consists of 73,257 10-class images for training and 26,032 images for testing.
Dataset Splits | Yes | CIFAR10 and CIFAR100 consist of 50,000 32×32 RGB images for training and 10,000 images for testing; SVHN consists of 73,257 10-class images for training and 26,032 images for testing.
Hardware Specification | No | The paper mentions a 'GPU grant program from NVIDIA' in the acknowledgements, but it does not specify any particular GPU models, CPU types, or other hardware details used for running the experiments.
Software Dependencies | No | The paper mentions using 'stochastic gradient Hamiltonian Monte Carlo (SGHMC)' as the baseline sampling algorithm and 'momentum stochastic gradient descent (M-SGD)', but it does not name software dependencies with version numbers (e.g., specific Python libraries or frameworks and their versions).
Experiment Setup | Yes | We choose batch size 256 and run the experiments within 500 epochs. We first tune the optimal hyperparameters for M-SGD, SGHMC and the low-temperature chain of reSGHMC: we set the learning rate η_k^(1) to 2e-6 in the first 200 epochs and decay it afterward by a factor of 0.984 every epoch; the low temperature follows an annealing schedule τ_1 = 0.01 / 1.02^k to accelerate the optimization; the weight decay is set to 25. Then, for the high-temperature chain of reSGHMC, we use a larger learning rate η_k^(2) = 1.5 η_k^(1) and a higher temperature τ_2 = 5 τ_1. Following the dynamic temperatures, we fix F_k = F_0 · α^(N_k) · 1.02^k, where N_k denotes the number of swaps in the first k epochs and α = 0.8. The variance estimator is updated each epoch based on the variance of 10 samples of the stochastic energies, and the smoothing factor is set to γ = 0.3 in (14). (These schedules are restated in code after this table.)
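
To make the Pseudocode row concrete, here is a minimal sketch of the replica exchange idea behind Algorithm 1: two SGLD chains run at a low and a high temperature and occasionally swap their parameters, with the swap probability corrected by a variance estimate of the stochastic energy. This is not the authors' code (none is released); the toy double-well objective, the constants (eta1, tau1, F) and the exact form of the correction term are illustrative assumptions based on the quoted setup.

import numpy as np

rng = np.random.default_rng(0)

def loss(theta):
    # Toy non-convex (double-well) energy with minima at theta = -1 and theta = +1.
    return (theta ** 2 - 1.0) ** 2

def grad(theta):
    return 4.0 * theta * (theta ** 2 - 1.0)

def noisy_loss(theta, sigma=0.5):
    # Stand-in for the mini-batch (stochastic) energy estimate.
    return loss(theta) + sigma * rng.normal()

def noisy_grad(theta, sigma=0.5):
    # Stand-in for the mini-batch (stochastic) gradient.
    return grad(theta) + sigma * rng.normal()

eta1, eta2 = 1e-3, 1.5e-3   # step sizes; the high-temperature chain uses a larger one
tau1, tau2 = 0.05, 0.25     # low / high temperatures (tau2 = 5 * tau1, as in the quoted setup)
F = 10.0                    # correction factor (assumed value)
gamma_smooth = 0.3          # smoothing factor for the variance estimate
var_hat = 1.0               # running variance estimate of the stochastic energy

theta1, theta2 = 2.0, -2.0  # low- and high-temperature replicas
for k in range(5000):
    # SGLD update for each replica: gradient step plus temperature-scaled Gaussian noise.
    theta1 += -eta1 * noisy_grad(theta1) + np.sqrt(2.0 * eta1 * tau1) * rng.normal()
    theta2 += -eta2 * noisy_grad(theta2) + np.sqrt(2.0 * eta2 * tau2) * rng.normal()

    # Exponentially smoothed variance estimate from a handful of stochastic energies.
    samples = np.array([noisy_loss(theta1) for _ in range(10)])
    var_hat = (1.0 - gamma_smooth) * var_hat + gamma_smooth * samples.var()

    # Corrected swap: the bias introduced by the noisy energies is compensated by
    # subtracting a term proportional to the variance estimate (scaled by 1/F).
    dtau = 1.0 / tau1 - 1.0 / tau2
    log_s = dtau * (noisy_loss(theta1) - noisy_loss(theta2) - dtau * var_hat / F)
    if rng.random() < np.exp(min(0.0, log_s)):
        theta1, theta2 = theta2, theta1

print("low-temperature replica settled near", theta1)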
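
The Experiment Setup row quotes several per-epoch schedules; the small helper below just restates them in code. The readings of the two garbled formulas (the annealed temperature as 0.01 / 1.02**k and F_k = F_0 * alpha**N_k * 1.02**k) and the value F0 = 1.0 are assumptions; the remaining constants come directly from the quote.

def resghmc_schedules(k, num_swaps, F0=1.0, alpha=0.8):
    """Per-epoch hyperparameters from the Experiment Setup row (k is 0-indexed).

    Returns (eta1, eta2, tau1, tau2, F); F0 and the formula readings are assumptions.
    """
    eta1 = 2e-6 if k < 200 else 2e-6 * 0.984 ** (k - 200)  # flat for 200 epochs, then decayed
    eta2 = 1.5 * eta1                                       # high-temperature chain
    tau1 = 0.01 / 1.02 ** k                                 # annealed low temperature
    tau2 = 5.0 * tau1
    F = F0 * alpha ** num_swaps * 1.02 ** k                 # dynamic correction factor
    return eta1, eta2, tau1, tau2, F

# Example: the schedule at epoch 300 if 12 swaps have occurred so far.
print(resghmc_schedules(300, 12))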