Non-convex Learning via Replica Exchange Stochastic Gradient MCMC

Authors: Wei Deng, Qi Feng, Liyao Gao, Faming Liang, Guang Lin

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we test the algorithm through extensive experiments on various setups and obtain the state-of-the-art results on CIFAR10, CIFAR100, and SVHN in both supervised learning and semi-supervised learning tasks.
Researcher Affiliation | Academia | Purdue University, West Lafayette, IN, USA; University of Southern California, Los Angeles, CA, USA.
Pseudocode | Yes | Algorithm 1: Adaptive Replica Exchange Stochastic Gradient Langevin Dynamics Algorithm (a minimal sketch of the swap step appears after this table).
Open Source Code | No | The paper does not contain any explicit statements or links indicating the release of open-source code for the described methodology.
Open Datasets | Yes | Empirically, we test the algorithm through extensive experiments on various setups and obtain the state-of-the-art results on CIFAR10, CIFAR100, and SVHN in both supervised learning and semi-supervised learning tasks. CIFAR10 and CIFAR100 consist of 50,000 32×32 RGB images for training and 10,000 images for testing; SVHN consists of 73,257 10-class images for training and 26,032 images for testing.
Dataset Splits | Yes | CIFAR10 and CIFAR100 consist of 50,000 32×32 RGB images for training and 10,000 images for testing; SVHN consists of 73,257 10-class images for training and 26,032 images for testing.
Hardware Specification | No | The paper mentions a 'GPU grant program from NVIDIA' in the acknowledgements, but it does not specify any particular GPU models, CPU types, or other hardware details used for running the experiments.
Software Dependencies | No | The paper mentions using 'stochastic gradient Hamiltonian Monte Carlo (SGHMC)' as the baseline sampling algorithm and 'momentum stochastic gradient descent (M-SGD)', but it does not name software dependencies with version numbers (e.g., specific Python libraries or frameworks and their versions).
Experiment Setup | Yes | We choose batch size 256 and run the experiments within 500 epochs. We first tune the optimal hyperparameters for M-SGD, SGHMC and the low-temperature chain of reSGHMC: we set the learning rate η_k^(1) to 2e-6 in the first 200 epochs and decay it afterward by a factor of 0.984 every epoch; the low temperature follows an annealing schedule τ_1 = 0.01 / 1.02^k to accelerate the optimization; the weight decay is set to 25. Then, for the high-temperature chain of reSGHMC, we use a larger learning rate η_k^(2) = 1.5 η_k^(1) and a higher temperature τ_2 = 5 τ_1. Following the dynamic temperatures, we fix F_k = F_0 · α^(N_k) · 1.02^k, where N_k denotes the number of swaps in the first k epochs and α = 0.8. The variance estimator is updated each epoch based on the variance of 10 samples of the stochastic energies, and the smoothing factor is set to γ = 0.3 in (14). (These schedules are restated in code after this table.)
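
To make the Pseudocode row concrete, here is a minimal sketch of the replica exchange idea behind Algorithm 1: two SGLD chains run at a low and a high temperature and occasionally swap their parameters, with the swap probability corrected by a variance estimate of the stochastic energy. This is not the authors' code (none is released); the toy double-well objective, the constants (eta1, tau1, F) and the exact form of the correction term are illustrative assumptions based on the quoted setup.

import numpy as np

rng = np.random.default_rng(0)

def loss(theta):
    # Toy non-convex (double-well) energy with minima at theta = -1 and theta = +1.
    return (theta ** 2 - 1.0) ** 2

def grad(theta):
    return 4.0 * theta * (theta ** 2 - 1.0)

def noisy_loss(theta, sigma=0.5):
    # Stand-in for the mini-batch (stochastic) energy estimate.
    return loss(theta) + sigma * rng.normal()

def noisy_grad(theta, sigma=0.5):
    # Stand-in for the mini-batch (stochastic) gradient.
    return grad(theta) + sigma * rng.normal()

eta1, eta2 = 1e-3, 1.5e-3   # step sizes; the high-temperature chain uses a larger one
tau1, tau2 = 0.05, 0.25     # low / high temperatures (tau2 = 5 * tau1, as in the quoted setup)
F = 10.0                    # correction factor (assumed value)
gamma_smooth = 0.3          # smoothing factor for the variance estimate
var_hat = 1.0               # running variance estimate of the stochastic energy

theta1, theta2 = 2.0, -2.0  # low- and high-temperature replicas
for k in range(5000):
    # SGLD update for each replica: gradient step plus temperature-scaled Gaussian noise.
    theta1 += -eta1 * noisy_grad(theta1) + np.sqrt(2.0 * eta1 * tau1) * rng.normal()
    theta2 += -eta2 * noisy_grad(theta2) + np.sqrt(2.0 * eta2 * tau2) * rng.normal()

    # Exponentially smoothed variance estimate from a handful of stochastic energies.
    samples = np.array([noisy_loss(theta1) for _ in range(10)])
    var_hat = (1.0 - gamma_smooth) * var_hat + gamma_smooth * samples.var()

    # Corrected swap: the bias introduced by the noisy energies is compensated by
    # subtracting a term proportional to the variance estimate (scaled by 1/F).
    dtau = 1.0 / tau1 - 1.0 / tau2
    log_s = dtau * (noisy_loss(theta1) - noisy_loss(theta2) - dtau * var_hat / F)
    if rng.random() < np.exp(min(0.0, log_s)):
        theta1, theta2 = theta2, theta1

print("low-temperature replica settled near", theta1)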
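
The Experiment Setup row quotes several per-epoch schedules; the small helper below just restates them in code. The readings of the two garbled formulas (the annealed temperature as 0.01 / 1.02**k and F_k = F_0 * alpha**N_k * 1.02**k) and the value F0 = 1.0 are assumptions; the remaining constants come directly from the quote.

def resghmc_schedules(k, num_swaps, F0=1.0, alpha=0.8):
    """Per-epoch hyperparameters from the Experiment Setup row (k is 0-indexed).

    Returns (eta1, eta2, tau1, tau2, F); F0 and the formula readings are assumptions.
    """
    eta1 = 2e-6 if k < 200 else 2e-6 * 0.984 ** (k - 200)  # flat for 200 epochs, then decayed
    eta2 = 1.5 * eta1                                       # high-temperature chain
    tau1 = 0.01 / 1.02 ** k                                 # annealed low temperature
    tau2 = 5.0 * tau1
    F = F0 * alpha ** num_swaps * 1.02 ** k                 # dynamic correction factor
    return eta1, eta2, tau1, tau2, F

# Example: the schedule at epoch 300 if 12 swaps have occurred so far.
print(resghmc_schedules(300, 12))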