Accelerating Convergence of Replica Exchange Stochastic Gradient MCMC via Variance Reduction
Authors: Wei Deng, Qi Feng, Georgios P. Karagiannis, Guang Lin, Faming Liang
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerically, we conduct extensive experiments and obtain state-of-the-art results in optimization and uncertainty estimates for synthetic experiments and image data. |
| Researcher Affiliation | Academia | Wei Deng, Department of Mathematics, Purdue University, West Lafayette, IN, USA (weideng056@gmail.com); Qi Feng, Department of Mathematics, University of Southern California, Los Angeles, CA, USA (qif@usc.edu); Georgios Karagiannis, Department of Mathematical Sciences, Durham University, Durham, UK (georgios.karagiannis@durham.ac.uk); Guang Lin, Department of Mathematics & School of Mechanical Engineering, Purdue University, West Lafayette, IN, USA (guanglin@purdue.edu); Faming Liang, Department of Statistics, Purdue University, West Lafayette, IN, USA (fmliang@purdue.edu) |
| Pseudocode | Yes | Algorithm 1: Variance-reduced replica exchange stochastic gradient Langevin dynamics (VR-reSGLD). (A minimal sketch of the update and swap step follows the table.) |
| Open Source Code | Yes | For the detailed implementations, we release the code at https://github.com/WayneDW/Variance_Reduced_Replica_Exchange_Stochastic_Gradient_MCMC. |
| Open Datasets | Yes | We further test the proposed algorithm on CIFAR10 and CIFAR100. We collect the ResNet20 models trained on CIFAR10 and quantify the entropy on the Street View House Numbers (SVHN) dataset. |
| Dataset Splits | No | The paper mentions using 'a training dataset of size N = 10^5' for Gaussian mixture distributions and testing on CIFAR10 and CIFAR100, but does not provide specific train/validation/test split percentages or sample counts for any dataset used in the experiments. |
| Hardware Specification | No | The paper does not specify any particular hardware components such as GPU or CPU models, memory, or specific cloud computing instances used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) required for replication. |
| Experiment Setup | Yes | In Figs. 2(a) and 2(b), we present trace plots and kernel density estimates (KDE) of samples generated from VR-reSGLD with m = 40, τ^(1) = 10, τ^(2) = 1000, η = 1e-7, and F = 1; reSGLD adopts the same hyper-parameters except for F = 100; SGLD uses η = 1e-7 and τ = 10. We run M-SGD, SGHMC, and (VR-)reSGHMC for 500 epochs. For these algorithms, we follow the setup from Deng et al. (2020). We fix the learning rate η_k^(1) = 2e-6 in the first 200 epochs and decay it by a factor of 0.984 afterwards. For SGHMC and the low-temperature processes of (VR-)reSGHMC, we anneal the temperature following τ_k^(1) = 0.01/1.02^k in the beginning and keep it fixed after the burn-in steps; regarding the high-temperature process, we set η_k^(2) = 1.5 η_k^(1) and τ_k^(2) = 5 τ_k^(1). The initial correction factor F_0 is fixed at 1.5e5. The thinning factor T is set to 256. (A sketch of these schedules follows the table.) |
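
For the Pseudocode row above, here is a minimal NumPy sketch of the VR-reSGLD update-and-swap loop on a toy Gaussian target. This is not the authors' released implementation: the target, the helper names (`u_batch`, `grad_u_batch`), and all hyper-parameter values are illustrative assumptions, and the paper's adaptive correction factor F (estimated from the energy variance) is simplified to a fixed constant.

```python
# Hedged sketch of VR-reSGLD (Algorithm 1 at a high level): two SGLD replicas
# at different temperatures, control-variate (variance-reduced) energy
# estimates for the swap test, and anchors refreshed every m iterations.
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (an assumption): energy U(theta) = sum_i 0.5 * (x_i - theta)^2.
data = rng.normal(loc=1.0, scale=1.0, size=10_000)

def grad_u_batch(theta, batch):
    # Unbiased minibatch estimate of the full-data gradient of U.
    return len(data) * np.mean(theta - batch)

def u_batch(theta, batch):
    # Unbiased minibatch estimate of the full-data energy U(theta).
    return len(data) * np.mean(0.5 * (batch - theta) ** 2)

def vr_resgld(eta=1e-6, tau1=0.01, tau2=1.0, F=1.0, m=40,
              batch_size=100, n_iters=2_000):
    theta = np.array([0.0, 0.0])   # [low-temperature, high-temperature] replicas
    taus = np.array([tau1, tau2])
    # Control-variate anchors: parameter snapshots and their full-data energies.
    anchors = theta.copy()
    anchor_energy = np.array([u_batch(t, data) for t in anchors])
    samples = []
    for k in range(n_iters):
        batch = rng.choice(data, size=batch_size, replace=False)
        # SGLD step for each replica at its own temperature.
        for i in range(2):
            g = grad_u_batch(theta[i], batch)
            theta[i] += -eta * g + np.sqrt(2 * eta * taus[i]) * rng.normal()
        # Variance-reduced energy estimate via control variates:
        # U_vr(theta) = U_batch(theta) - U_batch(anchor) + U_full(anchor),
        # using the same minibatch for the parameter and its anchor.
        u_vr = np.array([u_batch(theta[i], batch)
                         - u_batch(anchors[i], batch)
                         + anchor_energy[i] for i in range(2)])
        # Swap test; F acts as a fixed bias correction in this sketch
        # (the paper adapts it from an estimated energy variance).
        d_tau = 1.0 / tau1 - 1.0 / tau2
        log_s = d_tau * (u_vr[0] - u_vr[1] - d_tau * F)
        if np.log(rng.uniform()) < min(0.0, log_s):
            theta = theta[::-1].copy()
        # Refresh anchors with full-data energies every m iterations.
        if (k + 1) % m == 0:
            anchors = theta.copy()
            anchor_energy = np.array([u_batch(t, data) for t in anchors])
        samples.append(theta[0])
    return np.array(samples)

if __name__ == "__main__":
    chain = vr_resgld()
    print("cold-chain posterior-mode estimate:", chain[-100:].mean())
```

Reusing the same minibatch for both the parameter and its anchor is what makes the control variate effective: the two minibatch energies are strongly correlated, so their difference has much lower variance than either term alone, which in turn raises the swap acceptance rate.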
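For the Experiment Setup row, the following short sketch encodes the quoted schedules. The 500-epoch budget, initial rate 2e-6 with a 0.984 per-epoch decay after epoch 200, and the annealing τ_k^(1) = 0.01/1.02^k come directly from the quote; the burn-in length is an assumed placeholder.

```python
# Hedged sketch of the learning-rate decay and temperature annealing quoted
# in the Experiment Setup row. burn_in=150 is an illustrative assumption;
# the paper only says the temperature is fixed after the burn-in steps.
def lr_schedule(epoch, lr0=2e-6, decay=0.984, flat_epochs=200):
    # Constant rate for the first 200 epochs, geometric decay afterwards.
    return lr0 if epoch < flat_epochs else lr0 * decay ** (epoch - flat_epochs)

def temperature_schedule(epoch, tau0=0.01, anneal=1.02, burn_in=150):
    # Anneal tau_k = tau0 / anneal^k, then freeze after burn-in.
    k = min(epoch, burn_in)
    return tau0 / anneal ** k

for epoch in (0, 199, 200, 350, 499):
    print(epoch, lr_schedule(epoch), temperature_schedule(epoch))
```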