Interacting Contour Stochastic Gradient Langevin Dynamics

Authors: Wei Deng, Siqi Liang, Botao Hao, Guang Lin, Faming Liang

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically, we compare the proposed algorithm with popular benchmark methods for posterior sampling. The numerical results show a great potential of ICSGLD for large-scale uncertainty estimation tasks."
Researcher Affiliation | Collaboration | Wei Deng (1,2), Siqi Liang (1), Botao Hao (3), Guang Lin (1), Faming Liang (1); 1 Purdue University, 2 Morgan Stanley, 3 DeepMind
Pseudocode | Yes | Algorithm 1: Interacting contour stochastic gradient Langevin dynamics algorithm (ICSGLD).
Open Source Code | Yes | Code is available at github.com/WayneDW/Interacting-Contour-Stochastic-Gradient-Langevin-Dynamics.
Open Datasets | Yes | "Our proposed algorithm achieves appealing mode explorations using a fixed learning rate on the MNIST dataset... based on the UCI Mushroom data set... on CIFAR100, and report the test accuracy (ACC) and test negative log-likelihood (NLL) based on 5 trials with standard error. For the out-of-distribution prediction performance, we test the well-trained models in Brier scores (Brier) on the Street View House Numbers dataset (SVHN)."
Dataset Splits | No | The paper mentions training and test data but does not explicitly specify training/validation/test splits (e.g., percentages or exact counts for a validation set) in the main text or the supplementary material provided.
Hardware Specification | No | The paper mentions distributed computing but does not give specific hardware details such as GPU models, CPU models, or cloud instance types used for the experiments.
Software Dependencies | No | The paper does not report version numbers for the software dependencies or libraries used in the experiments (e.g., 'Python 3.8' or 'PyTorch 1.9').
Experiment Setup | Yes | The learning rate is fixed to 1e-6 and the temperature is set to 0.1. ...batch size of 2500... fix ζ = 3e4 and weight decay 25. ...choose 100,000 partitions and u = 10. The step size follows ω_k = min{0.01, 1/(k^0.6 + 100)}. ...initial learning rate is 2e-6... choose m = 200 and u = 200 for ResNet20, 32, and 56, and u = 60 for WRN-16-8.
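For reference, the reported step-size schedule can be sketched as a small Python helper. This is a minimal illustration, assuming the extraction-garbled formula reads ω_k = min{0.01, 1/(k^0.6 + 100)}; the function name `step_size` is hypothetical and not from the paper.

```python
def step_size(k):
    """Hypothetical helper for the paper's reported schedule:
    omega_k = min(0.01, 1 / (k**0.6 + 100)).
    The schedule is capped at 0.01 and decays polynomially in k."""
    return min(0.01, 1.0 / (k ** 0.6 + 100.0))
```

Because 1/(k^0.6 + 100) is at most 0.01 for k ≥ 0, the cap binds only at k = 0 and the schedule decays monotonically afterwards.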