Structured Stochastic Gradient MCMC

Authors: Antonios Alexos, Alex J Boyd, Stephan Mandt

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test our scheme for ResNet-20 on CIFAR-10, SVHN, and FMNIST. In all cases, we find improvements in convergence speed and/or final accuracy compared to SGMCMC and parametric VI. We show in both small and large scale experiments that our method well approximates posterior marginals and gives improved results over SGMCMC and parametric VI on ResNet-20 architectures on CIFAR-10, Fashion MNIST, and SVHN in terms of runtime and/or final accuracy.
Researcher Affiliation | Academia | (1) Department of Computer Science, University of California, Irvine, USA; (2) Department of Statistics, University of California, Irvine, USA.
Pseudocode | Yes | Algorithm 1: S-SGMCMC; Algorithm 2: Sd-SGMCMC
Open Source Code | Yes | Additionally, all code and implementations have been made publicly available: https://github.com/ajboyd2/pytorch_lvi
Open Datasets | Yes | We test our scheme for ResNet-20 on CIFAR-10, SVHN, and FMNIST. We compare pSGLD, S-pSGLD, and Sd-pSGLD with a Bernoulli(ρ) masking distribution with dropout rates ρ ∈ {0.1, 0.3, 0.5} on a fully-connected neural network with 2 hidden layers, with 50 hidden units each, trained and evaluated with MNIST using the standard train and test split. We used 7 different datasets: the wine quality dataset (Cortez et al., 2009), the Boston housing dataset (Harrison Jr & Rubinfeld, 1978), the obesity levels dataset (Palechor & de la Hoz Manotas, 2019), the Seoul bike-sharing dataset (E et al., 2020; E & Cho, 2020), the concrete compressive strength dataset (Yeh, 1998), and the airfoil self-noise dataset (Brooks et al., 1989).
Dataset Splits | Yes | Every dataset was split into 75% training data, 10% validation data, and 15% test data.
Hardware Specification | No | The paper does not explicitly describe the hardware used to run the experiments, nor does it provide specific models of GPUs, CPUs, or other computing resources.
Software Dependencies | No | The paper mentions software such as PyTorch, the TensorFlow Probability library, and Python, but does not specify exact version numbers, which are required for reproducibility.
Experiment Setup | Yes | For all the experiments, we used a seed of 2. For MNIST, we ran all the experiments for 500 epochs with a batch size of 500 and a learning rate of 1e-2. For Sd-pSGLD, K is set to 300, which is the number of forward passes the model does within one epoch. For Sd-pSGLD, pSGLD, Sd-SGHMC, and SGHMC we tested their performances with learning rates of 1e-2, 1e-3, 1e-4, and 1e-5. In Figure 5, Sd-pSGLD has ρ = 0.5 and a learning rate of 1e-3, pSGLD has a learning rate of 1e-4, Sd-SGHMC has ρ = 0.5 and a learning rate of 1e-2, and SGHMC has a learning rate of 1e-2.
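
The Pseudocode row points to Algorithm 1 (S-SGMCMC) and Algorithm 2 (Sd-SGMCMC), which are not reproduced here. For orientation only, below is a minimal sketch of the plain SGLD transition that SGMCMC samplers of this kind build on; the function name, the gradient callback, and the step size are illustrative assumptions, not the paper's implementation.

    import torch

    def sgld_step(params, grad_log_post_fn, step_size=1e-2):
        # One stochastic gradient Langevin dynamics (SGLD) transition:
        #   theta <- theta + (eps / 2) * grad log p(theta | minibatch) + N(0, eps * I)
        # grad_log_post_fn is assumed to return one stochastic gradient of the
        # log posterior per parameter tensor in `params`.
        with torch.no_grad():
            grads = grad_log_post_fn(params)
            for p, g in zip(params, grads):
                noise = torch.randn_like(p) * step_size ** 0.5
                p.add_(0.5 * step_size * g + noise)
        return params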
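
The Open Datasets row names standard public benchmarks. A minimal sketch, assuming torchvision's bundled copies and the default train/test splits, of how the image datasets could be fetched; the root directory and transform are placeholders.

    from torchvision import datasets, transforms

    to_tensor = transforms.ToTensor()

    # Standard train/test splits as distributed by torchvision.
    cifar10_train = datasets.CIFAR10("./data", train=True, download=True, transform=to_tensor)
    cifar10_test = datasets.CIFAR10("./data", train=False, download=True, transform=to_tensor)
    svhn_train = datasets.SVHN("./data", split="train", download=True, transform=to_tensor)
    svhn_test = datasets.SVHN("./data", split="test", download=True, transform=to_tensor)
    fmnist_train = datasets.FashionMNIST("./data", train=True, download=True, transform=to_tensor)
    fmnist_test = datasets.FashionMNIST("./data", train=False, download=True, transform=to_tensor)
    mnist_train = datasets.MNIST("./data", train=True, download=True, transform=to_tensor)
    mnist_test = datasets.MNIST("./data", train=False, download=True, transform=to_tensor)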
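
The Dataset Splits row reports a 75% / 10% / 15% train/validation/test split for the tabular datasets. A minimal sketch of such a split using torch.utils.data.random_split; the helper name and the generator seed are assumptions, since the paper does not tie the split to a specific routine.

    import torch
    from torch.utils.data import random_split

    def split_dataset(dataset, seed=0):
        # 75% train, 10% validation, remaining ~15% test.
        n = len(dataset)
        n_train = int(0.75 * n)
        n_val = int(0.10 * n)
        n_test = n - n_train - n_val
        return random_split(dataset, [n_train, n_val, n_test],
                            generator=torch.Generator().manual_seed(seed))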
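
The Experiment Setup row fixes the seed (2), the number of epochs (500), the batch size (500), and the base learning rate (1e-2) for the MNIST runs. A minimal sketch wiring those values together, reusing mnist_train from the loading sketch above; since PyTorch ships no pSGLD sampler, RMSprop appears only as a clearly labeled stand-in stepper so the loop runs, not as the paper's method.

    import torch
    from torch import nn
    from torch.utils.data import DataLoader

    torch.manual_seed(2)                      # "For all the experiments, we used a seed of 2."

    EPOCHS, BATCH_SIZE, LR = 500, 500, 1e-2   # values quoted in the Experiment Setup row
    train_loader = DataLoader(mnist_train, batch_size=BATCH_SIZE, shuffle=True)

    # Two hidden layers with 50 units each, as described in the Open Datasets row.
    model = nn.Sequential(nn.Flatten(), nn.Linear(784, 50), nn.ReLU(),
                          nn.Linear(50, 50), nn.ReLU(), nn.Linear(50, 10))

    # Placeholder stepper: RMSprop stands in for the pSGLD sampler, which PyTorch
    # does not provide out of the box.
    optimizer = torch.optim.RMSprop(model.parameters(), lr=LR)

    for epoch in range(EPOCHS):
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = nn.functional.cross_entropy(model(x), y)
            loss.backward()
            optimizer.step()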