Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Accelerating Convergence of Replica Exchange Stochastic Gradient MCMC via Variance Reduction
Authors: Wei Deng, Qi Feng, Georgios P. Karagiannis, Guang Lin, Faming Liang
ICLR 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerically, we conduct extensive experiments and obtain state-of-the-art results in optimization and uncertainty estimates for synthetic experiments and image data. |
| Researcher Affiliation | Academia | Wei Deng, Department of Mathematics, Purdue University, West Lafayette, IN, USA; Qi Feng, Department of Mathematics, University of Southern California, Los Angeles, CA, USA; Georgios Karagiannis, Department of Mathematical Sciences, Durham University, Durham, UK; Guang Lin, Department of Mathematics & School of Mechanical Engineering, Purdue University, West Lafayette, IN, USA; Faming Liang, Department of Statistics, Purdue University, West Lafayette, IN, USA |
| Pseudocode | Yes | Algorithm 1: Variance-reduced replica exchange stochastic gradient Langevin dynamics (VR-reSGLD). (A minimal sketch of the two-chain scheme follows the table.) |
| Open Source Code | Yes | For the detailed implementations, we release the code at https://github.com/WayneDW/Variance_Reduced_Replica_Exchange_Stochastic_Gradient_MCMC. |
| Open Datasets | Yes | We further test the proposed algorithm on CIFAR10 and CIFAR100. We collect the ResNet20 models trained on CIFAR10 and quantify the entropy on the Street View House Numbers (SVHN) dataset. |
| Dataset Splits | No | The paper mentions using 'a training dataset of size N = 10^5' for Gaussian mixture distributions and testing on CIFAR10 and CIFAR100, but does not provide specific train/validation/test split percentages or sample counts for any dataset used in the experiments. |
| Hardware Specification | No | The paper does not specify any particular hardware components such as GPU or CPU models, memory, or specific cloud computing instances used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) required for replication. |
| Experiment Setup | Yes | In Figs 2(a) and 2(b), we present trace plots and kernel density estimates (KDE) of samples generated from VR-reSGLD with m = 40, τ^(1) = 10, τ^(2) = 1000, η = 1e-7, and F = 1; reSGLD adopts the same hyper-parameters except for F = 100; SGLD uses η = 1e-7 and τ = 10. We run M-SGD, SGHMC and (VR-)reSGHMC for 500 epochs. For these algorithms, we follow a setup from Deng et al. (2020). We fix the learning rate η_k^(1) = 2e-6 in the first 200 epochs and decay it by 0.984 afterwards. For SGHMC and the low-temperature processes of (VR-)reSGHMC, we anneal the temperature following τ_k^(1) = 0.01/1.02^k in the beginning and keep it fixed after the burn-in steps; regarding the high-temperature process, we set η_k^(2) = 1.5 η_k^(1) and τ_k^(2) = 5 τ_k^(1). The initial correction factor F_0 is fixed at 1.5e5. The thinning factor T is set to 256. (These schedules are sketched below the table.) |
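For readers who want the gist of Algorithm 1 without opening the paper, here is a minimal NumPy sketch of the two-chain scheme: SGLD updates at a low and a high temperature, a control-variate (variance-reduced) energy estimator whose snapshot is refreshed every m steps, and a stochastic swap test. Everything below (the toy quadratic energy, the constants, the fixed correction F inside the swap test) is an illustrative assumption; in particular, the paper's swap test corrects for the variance of the stochastic energy estimator, which this sketch collapses into the fixed F.

```python
import numpy as np

# Toy setup: per-datum energy U_i(theta) = 0.5 * (x_i - theta)^2, so the full
# energy U(theta) = sum_i U_i(theta). Stands in for a real loss over a dataset.
rng = np.random.default_rng(0)
N, n = 1000, 100                      # dataset size and minibatch size (illustrative)
data = rng.normal(0.0, 1.0, size=N)

def U_i(theta, x):
    return 0.5 * (x - theta) ** 2

def grad_U_i(theta, x):
    return theta - x

def vr_energy(theta, theta_hat, U_hat, batch):
    """Control-variate energy estimator: unbiased for U(theta), and low-variance
    when theta is close to the snapshot theta_hat."""
    return (N / len(batch)) * np.sum(U_i(theta, batch) - U_i(theta_hat, batch)) + U_hat

# Illustrative hyper-parameters (not the paper's tuned values).
eta, tau = 1e-4, np.array([1.0, 10.0])   # step size; low / high temperatures
F, m, steps = 1.0, 40, 2000              # swap correction, snapshot period, iterations

theta = np.array([-2.0, 2.0])            # theta[0]: low-temp chain, theta[1]: high-temp chain
theta_hat = theta.copy()                 # control-variate snapshots
U_hat = np.array([np.sum(U_i(t, data)) for t in theta_hat])  # full-batch energies at snapshots

for k in range(steps):
    if k % m == 0:                        # refresh snapshots every m steps
        theta_hat = theta.copy()
        U_hat = np.array([np.sum(U_i(t, data)) for t in theta_hat])

    for i in range(2):                    # SGLD update for each replica at its temperature
        batch = rng.choice(data, size=n, replace=False)
        g = (N / n) * np.sum(grad_U_i(theta[i], batch))
        theta[i] += -eta * g + np.sqrt(2.0 * eta * tau[i]) * rng.normal()

    # Stochastic swap test on variance-reduced energies; the fixed F here is a
    # simplified stand-in for the paper's variance-based bias correction.
    batch = rng.choice(data, size=n, replace=False)
    dU = (vr_energy(theta[0], theta_hat[0], U_hat[0], batch)
          - vr_energy(theta[1], theta_hat[1], U_hat[1], batch))
    log_s = (1.0 / tau[0] - 1.0 / tau[1]) * (dU - F)
    if np.log(rng.random()) < min(0.0, log_s):
        theta, theta_hat, U_hat = theta[::-1].copy(), theta_hat[::-1].copy(), U_hat[::-1].copy()
```

Swapping the snapshots along with the parameters keeps each control variate near its chain, which is what makes a small correction F workable; that design choice is ours for this sketch, not a claim about the released code.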
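The learning-rate and temperature schedules quoted in the Experiment Setup row admit a compact reading. The sketch below is one plausible interpretation; `burn_in` and the epoch/step indexing are assumptions the excerpt does not pin down.

```python
def lr_schedule(epoch, eta0=2e-6, decay=0.984, warm=200):
    """eta_k^(1): fixed for the first 200 epochs, then decayed by 0.984 per epoch."""
    return eta0 if epoch < warm else eta0 * decay ** (epoch - warm)

def temp_schedule(k, burn_in, tau0=0.01, rate=1.02):
    """tau_k^(1) = 0.01 / 1.02^k while annealing; frozen after the burn-in step."""
    return tau0 / rate ** min(k, burn_in)

# High-temperature process, per the quoted setup:
#   eta_k^(2) = 1.5 * lr_schedule(k)  and  tau_k^(2) = 5 * temp_schedule(k, burn_in)
```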