Accelerating Convergence of Replica Exchange Stochastic Gradient MCMC via Variance Reduction

Authors: Wei Deng, Qi Feng, Georgios P. Karagiannis, Guang Lin, Faming Liang

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Numerically, we conduct extensive experiments and obtain state-of-the-art results in optimization and uncertainty estimates for synthetic experiments and image data."
Researcher Affiliation | Academia | Wei Deng, Department of Mathematics, Purdue University, West Lafayette, IN, USA (weideng056@gmail.com); Qi Feng, Department of Mathematics, University of Southern California, Los Angeles, CA, USA (qif@usc.edu); Georgios Karagiannis, Department of Mathematical Sciences, Durham University, Durham, UK (georgios.karagiannis@durham.ac.uk); Guang Lin, Departments of Mathematics & School of Mechanical Engineering, Purdue University, West Lafayette, IN, USA (guanglin@purdue.edu); Faming Liang, Department of Statistics, Purdue University, West Lafayette, IN, USA (fmliang@purdue.edu)
Pseudocode | Yes | "Algorithm 1: Variance-reduced replica exchange stochastic gradient Langevin dynamics (VR-reSGLD)." (A hedged Python sketch of the algorithm follows the table.)
Open Source Code | Yes | "For the detailed implementations, we release the code at https://github.com/WayneDW/Variance_Reduced_Replica_Exchange_Stochastic_Gradient_MCMC."
Open Datasets | Yes | "We further test the proposed algorithm on CIFAR10 and CIFAR100. We collect the ResNet20 models trained on CIFAR10 and quantify the entropy on the Street View House Numbers (SVHN) dataset."
Dataset Splits | No | The paper mentions using 'a training dataset of size N = 10^5' for the Gaussian mixture distributions and testing on CIFAR10 and CIFAR100, but does not provide specific train/validation/test split percentages or sample counts for any dataset used in the experiments.
Hardware Specification | No | The paper does not specify any particular hardware components such as GPU or CPU models, memory, or specific cloud computing instances used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) required for replication.
Experiment Setup | Yes | "In Figs 2(a) and 2(b), we present trace plots and kernel density estimates (KDE) of samples generated from VR-reSGLD with m = 40, τ^(1) = 10, τ^(2) = 1000, η = 1e-7, and F = 1; reSGLD adopts the same hyper-parameters except for F = 100; SGLD uses η = 1e-7 and τ = 10. We run M-SGD, SGHMC and (VR-)reSGHMC for 500 epochs. For these algorithms, we follow a setup from Deng et al. (2020). We fix the learning rate η_k^(1) = 2e-6 in the first 200 epochs and decay it by 0.984 afterwards. For SGHMC and the low-temperature processes of (VR-)reSGHMC, we anneal the temperature following τ_k^(1) = 0.01/1.02^k in the beginning and keep it fixed after the burn-in steps; regarding the high-temperature process, we set η_k^(2) = 1.5η_k^(1) and τ_k^(2) = 5τ_k^(1). The initial correction factor F_0 is fixed at 1.5e5. The thinning factor T is set to 256." (The quoted schedules are written out in the second sketch below.)
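
To make the Pseudocode row concrete, below is a minimal NumPy sketch of the two-chain dynamics behind Algorithm 1 (VR-reSGLD): both chains take SGLD steps, and control variates around periodically refreshed snapshots reduce the variance of the noisy energy estimates used in the swap test. The names `grad_U` and `U_hat` and the constant `correction` argument are hypothetical stand-ins (in the paper the swap correction is adaptive and involves the correction factor F); this is a sketch under those assumptions, not the authors' released implementation.

```python
import numpy as np

def vr_resgld(grad_U, U_hat, theta_lo, theta_hi, data, eta, tau_lo, tau_hi,
              correction, n_iter=1000, batch_size=32, m=40, seed=0):
    """Hedged sketch of VR-reSGLD (Algorithm 1).

    grad_U(theta, batch) -- stochastic gradient of the energy U (hypothetical).
    U_hat(theta, batch)  -- mini-batch energy estimate rescaled to the full
                            dataset size (hypothetical).
    correction           -- swap-bias correction; adaptive and tied to the
                            correction factor F in the paper, a constant here.
    """
    rng = np.random.default_rng(seed)
    N = len(data)
    samples = []
    for k in range(n_iter):
        if k % m == 0:  # refresh control-variate snapshots every m iterations
            snap_lo, snap_hi = theta_lo.copy(), theta_hi.copy()
            U_full_lo, U_full_hi = U_hat(snap_lo, data), U_hat(snap_hi, data)
        batch = data[rng.choice(N, size=batch_size, replace=False)]
        # plain SGLD steps for the low- and high-temperature chains
        theta_lo = theta_lo - eta * grad_U(theta_lo, batch) \
            + np.sqrt(2 * eta * tau_lo) * rng.standard_normal(theta_lo.shape)
        theta_hi = theta_hi - eta * grad_U(theta_hi, batch) \
            + np.sqrt(2 * eta * tau_hi) * rng.standard_normal(theta_hi.shape)
        # variance-reduced energy estimates (control variates around the
        # snapshots): the paper's device for taming noise in the swap test
        u_lo = U_hat(theta_lo, batch) - U_hat(snap_lo, batch) + U_full_lo
        u_hi = U_hat(theta_hi, batch) - U_hat(snap_hi, batch) + U_full_hi
        # Metropolis-style swap with the corrected noisy energy difference
        log_s = (1.0 / tau_lo - 1.0 / tau_hi) * (u_lo - u_hi - correction)
        if rng.random() < np.exp(min(0.0, log_s)):
            theta_lo, theta_hi = theta_hi, theta_lo
        samples.append(theta_lo.copy())  # keep the low-temperature chain
    return samples
```

The swap exponent uses (1/τ^(1) - 1/τ^(2)) > 0, so the low-temperature chain tends to capture the lower-energy state; smaller variance in u_lo - u_hi permits a smaller correction and hence more frequent swaps, which is the acceleration the paper claims.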
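The CIFAR hyper-parameter schedules quoted in the Experiment Setup row can likewise be written out. A small sketch, assuming a `burn_in_steps` cutoff and a decay indexing that the excerpt does not pin down:

```python
def cifar_schedules(epoch, step, burn_in_steps):
    """Hedged sketch of the quoted (VR-)reSGHMC schedules.

    Returns (eta_lo, eta_hi, tau_lo, tau_hi) for the low- and
    high-temperature chains. `burn_in_steps` is an assumption; the paper
    only says the temperature is fixed "after the burn-in steps".
    """
    # learning rate eta_k^(1): fixed at 2e-6 for the first 200 epochs,
    # then decayed by 0.984 per epoch (exact indexing is an assumption)
    eta_lo = 2e-6 if epoch < 200 else 2e-6 * 0.984 ** (epoch - 199)
    eta_hi = 1.5 * eta_lo            # eta_k^(2) = 1.5 * eta_k^(1)
    # temperature tau_k^(1): annealed as 0.01 / 1.02^k, frozen after burn-in
    k = min(step, burn_in_steps)
    tau_lo = 0.01 / 1.02 ** k
    tau_hi = 5.0 * tau_lo            # tau_k^(2) = 5 * tau_k^(1)
    return eta_lo, eta_hi, tau_lo, tau_hi
```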