Accelerating Convergence of Replica Exchange Stochastic Gradient MCMC via Variance Reduction

Authors: Wei Deng, Qi Feng, Georgios P. Karagiannis, Guang Lin, Faming Liang

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Numerically, we conduct extensive experiments and obtain state-of-the-art results in optimization and uncertainty estimates for synthetic experiments and image data."
Researcher Affiliation | Academia | Wei Deng, Department of Mathematics, Purdue University, West Lafayette, IN, USA (weideng056@gmail.com); Qi Feng, Department of Mathematics, University of Southern California, Los Angeles, CA, USA (qif@usc.edu); Georgios Karagiannis, Department of Mathematical Sciences, Durham University, Durham, UK (georgios.karagiannis@durham.ac.uk); Guang Lin, Departments of Mathematics & School of Mechanical Engineering, Purdue University, West Lafayette, IN, USA (guanglin@purdue.edu); Faming Liang, Department of Statistics, Purdue University, West Lafayette, IN, USA (fmliang@purdue.edu)
Pseudocode | Yes | "Algorithm 1: Variance-reduced replica exchange stochastic gradient Langevin dynamics (VR-reSGLD)." (A hedged Python sketch of the algorithm follows the table.)
Open Source Code | Yes | "For the detailed implementations, we release the code at https://github.com/WayneDW/Variance_Reduced_Replica_Exchange_Stochastic_Gradient_MCMC."
Open Datasets | Yes | "We further test the proposed algorithm on CIFAR10 and CIFAR100. We collect the ResNet20 models trained on CIFAR10 and quantify the entropy on the Street View House Numbers (SVHN) dataset."
Dataset Splits | No | The paper mentions using 'a training dataset of size N = 10^5' for the Gaussian mixture distributions and testing on CIFAR10 and CIFAR100, but does not provide specific train/validation/test split percentages or sample counts for any dataset used in the experiments.
Hardware Specification | No | The paper does not specify any particular hardware components such as GPU or CPU models, memory, or specific cloud computing instances used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) required for replication.
Experiment Setup | Yes | "In Figs 2(a) and 2(b), we present trace plots and kernel density estimates (KDE) of samples generated from VR-reSGLD with m = 40, τ^(1) = 10, τ^(2) = 1000, η = 1e-7, and F = 1; reSGLD adopts the same hyper-parameters except for F = 100; SGLD uses η = 1e-7 and τ = 10. We run M-SGD, SGHMC and (VR-)reSGHMC for 500 epochs. For these algorithms, we follow a setup from Deng et al. (2020). We fix the learning rate η_k^(1) = 2e-6 in the first 200 epochs and decay it by 0.984 afterwards. For SGHMC and the low-temperature processes of (VR-)reSGHMC, we anneal the temperature following τ_k^(1) = 0.01/1.02^k in the beginning and keep it fixed after the burn-in steps; regarding the high-temperature process, we set η_k^(2) = 1.5η_k^(1) and τ_k^(2) = 5τ_k^(1). The initial correction factor F_0 is fixed at 1.5e5. The thinning factor T is set to 256." (The quoted schedules are written out in the second sketch below.)
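
To make the Pseudocode row concrete, below is a minimal NumPy sketch of the two-chain dynamics behind Algorithm 1 (VR-reSGLD): both chains take SGLD steps, and control variates around periodically refreshed snapshots reduce the variance of the noisy energy estimates used in the swap test. The names `grad_U` and `U_hat` and the constant `correction` argument are hypothetical stand-ins (in the paper the swap correction is adaptive and involves the correction factor F); this is a sketch under those assumptions, not the authors' released implementation.

```python
import numpy as np

def vr_resgld(grad_U, U_hat, theta_lo, theta_hi, data, eta, tau_lo, tau_hi,
              correction, n_iter=1000, batch_size=32, m=40, seed=0):
    """Hedged sketch of VR-reSGLD (Algorithm 1).

    grad_U(theta, batch) -- stochastic gradient of the energy U (hypothetical).
    U_hat(theta, batch)  -- mini-batch energy estimate rescaled to the full
                            dataset size (hypothetical).
    correction           -- swap-bias correction; adaptive and tied to the
                            correction factor F in the paper, a constant here.
    """
    rng = np.random.default_rng(seed)
    N = len(data)
    samples = []
    for k in range(n_iter):
        if k % m == 0:  # refresh control-variate snapshots every m iterations
            snap_lo, snap_hi = theta_lo.copy(), theta_hi.copy()
            U_full_lo, U_full_hi = U_hat(snap_lo, data), U_hat(snap_hi, data)
        batch = data[rng.choice(N, size=batch_size, replace=False)]
        # plain SGLD steps for the low- and high-temperature chains
        theta_lo = theta_lo - eta * grad_U(theta_lo, batch) \
            + np.sqrt(2 * eta * tau_lo) * rng.standard_normal(theta_lo.shape)
        theta_hi = theta_hi - eta * grad_U(theta_hi, batch) \
            + np.sqrt(2 * eta * tau_hi) * rng.standard_normal(theta_hi.shape)
        # variance-reduced energy estimates (control variates around the
        # snapshots): the paper's device for taming noise in the swap test
        u_lo = U_hat(theta_lo, batch) - U_hat(snap_lo, batch) + U_full_lo
        u_hi = U_hat(theta_hi, batch) - U_hat(snap_hi, batch) + U_full_hi
        # Metropolis-style swap with the corrected noisy energy difference
        log_s = (1.0 / tau_lo - 1.0 / tau_hi) * (u_lo - u_hi - correction)
        if rng.random() < np.exp(min(0.0, log_s)):
            theta_lo, theta_hi = theta_hi, theta_lo
        samples.append(theta_lo.copy())  # keep the low-temperature chain
    return samples
```

The swap exponent uses (1/τ^(1) - 1/τ^(2)) > 0, so the low-temperature chain tends to capture the lower-energy state; smaller variance in u_lo - u_hi permits a smaller correction and hence more frequent swaps, which is the acceleration the paper claims.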
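The CIFAR hyper-parameter schedules quoted in the Experiment Setup row can likewise be written out. A small sketch, assuming a `burn_in_steps` cutoff and a decay indexing that the excerpt does not pin down:

```python
def cifar_schedules(epoch, step, burn_in_steps):
    """Hedged sketch of the quoted (VR-)reSGHMC schedules.

    Returns (eta_lo, eta_hi, tau_lo, tau_hi) for the low- and
    high-temperature chains. `burn_in_steps` is an assumption; the paper
    only says the temperature is fixed "after the burn-in steps".
    """
    # learning rate eta_k^(1): fixed at 2e-6 for the first 200 epochs,
    # then decayed by 0.984 per epoch (exact indexing is an assumption)
    eta_lo = 2e-6 if epoch < 200 else 2e-6 * 0.984 ** (epoch - 199)
    eta_hi = 1.5 * eta_lo            # eta_k^(2) = 1.5 * eta_k^(1)
    # temperature tau_k^(1): annealed as 0.01 / 1.02^k, frozen after burn-in
    k = min(step, burn_in_steps)
    tau_lo = 0.01 / 1.02 ** k
    tau_hi = 5.0 * tau_lo            # tau_k^(2) = 5 * tau_k^(1)
    return eta_lo, eta_hi, tau_lo, tau_hi
```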