Structured Stochastic Gradient MCMC
Authors: Antonios Alexos, Alex J Boyd, Stephan Mandt
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test our scheme for ResNet-20 on CIFAR-10, SVHN, and FMNIST. In all cases, we find improvements in convergence speed and/or final accuracy compared to SGMCMC and parametric VI. We show in both small and large scale experiments that our method well approximates posterior marginals and gives improved results over SGMCMC and parametric VI on ResNet-20 architectures on CIFAR-10, Fashion MNIST, and SVHN in terms of runtime and/or final accuracy. |
| Researcher Affiliation | Academia | 1Department of Computer Science, University of California, Irvine, USA 2Department of Statistics, University of California, Irvine, USA. |
| Pseudocode | Yes | Algorithm 1 S-SGMCMC; Algorithm 2 Sd-SGMCMC (a hedged per-group update sketch appears after the table) |
| Open Source Code | Yes | Additionally, all code and implementations have been made publicly available: https://github.com/ajboyd2/pytorch_lvi |
| Open Datasets | Yes | We test our scheme for ResNet-20 on CIFAR-10, SVHN, and FMNIST. We compare pSGLD, S-pSGLD, and Sd-pSGLD with a Bernoulli(ρ) masking distribution with dropout rates ρ ∈ {0.1, 0.3, 0.5} on a fully-connected neural network with 2 hidden layers of 50 hidden units each, trained and evaluated on MNIST using the standard train and test split. We used 7 different datasets: the wine quality dataset (Cortez et al., 2009), the Boston housing dataset (Harrison Jr & Rubinfeld, 1978), the obesity levels dataset (Palechor & de la Hoz Manotas, 2019), the Seoul bike-sharing dataset (E et al., 2020; E & Cho, 2020), the concrete compressive strength dataset (Yeh, 1998), and the airfoil self-noise dataset (Brooks et al., 1989). (See the network and mask sketch after the table.) |
| Dataset Splits | Yes | Every dataset was split into 75% training data, 10% validation data, and 15% test data. (A minimal split sketch follows the table.) |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run the experiments, nor does it provide specific models of GPUs, CPUs, or other computing resources. |
| Software Dependencies | No | The paper mentions software such as PyTorch, the TensorFlow Probability library, and Python, but does not specify exact version numbers, which are required for reproducibility. |
| Experiment Setup | Yes | For all the experiments, we used a seed of 2. For MNIST, we ran all experiments for 500 epochs with a batch size of 500 and a learning rate of 1e-2. For Sd-pSGLD, K is set to 300, i.e., the number of forward passes the model performs within one epoch. For Sd-pSGLD, pSGLD, Sd-SGHMC, and SGHMC we tested performance with learning rates of 1e-2, 1e-3, 1e-4, and 1e-5. In Figure 5, Sd-pSGLD has ρ = 0.5 and learning rate 1e-3, pSGLD has learning rate 1e-4, Sd-SGHMC has ρ = 0.5 and learning rate 1e-2, and SGHMC has learning rate 1e-2. (These settings are collected in the config sketch after the table.) |
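
The pseudocode row names Algorithm 1 (S-SGMCMC) and Algorithm 2 (Sd-SGMCMC) without reproducing them. Below is a minimal sketch of what a per-group SGLD-style sweep could look like, assuming the parameters are partitioned into groups and each group is updated with the others held fixed; the function name, grouping, and step-size handling are illustrative assumptions, not the paper's algorithm.

```python
import torch

def sgld_step_per_group(groups, log_joint, step_size=1e-2):
    """One illustrative SGLD-style sweep over a list of parameter groups.

    groups:    list of leaf tensors with requires_grad=True, one per group (assumed grouping).
    log_joint: callable taking the list of groups and returning a scalar
               estimate of log p(minibatch | theta) + log p(theta).
    """
    for m, theta_m in enumerate(groups):
        # Hold every other group fixed (detached) so only group m receives a
        # gradient here, mimicking a factorized, per-group update.
        frozen = [g if i == m else g.detach() for i, g in enumerate(groups)]
        grad_m, = torch.autograd.grad(-log_joint(frozen), theta_m)
        with torch.no_grad():
            theta_m.add_(-0.5 * step_size * grad_m)                     # drift toward high posterior density
            theta_m.add_(torch.randn_like(theta_m) * step_size ** 0.5)  # Langevin noise, variance = step_size
```

A sweep like this costs roughly one gradient evaluation per group per step, which is the kind of overhead a dropout-masked variant such as Sd-SGMCMC would be expected to reduce.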
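For the MNIST comparison quoted in the open-datasets row, a sketch of the described two-hidden-layer, 50-unit network and a single Bernoulli(ρ) mask draw is shown below; the activation choice, the number of masked groups, and what the mask is applied to are assumptions.

```python
import torch
import torch.nn as nn

# 784 -> 50 -> 50 -> 10 classifier matching "2 hidden layers of 50 hidden
# units each"; the ReLU activations are an assumption.
mlp = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 50),
    nn.ReLU(),
    nn.Linear(50, 50),
    nn.ReLU(),
    nn.Linear(50, 10),
)

# One draw from a Bernoulli(rho) masking distribution over a hypothetical set
# of parameter groups; the number of groups and what the mask selects are
# illustrative assumptions.
rho = 0.3
num_groups = 4
mask = torch.bernoulli(torch.full((num_groups,), rho))
```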
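The 75%/10%/15% split from the dataset-splits row could be reproduced along these lines; the use of scikit-learn's train_test_split and the seed handling are assumptions, since the paper's own splitting code is not quoted.

```python
from sklearn.model_selection import train_test_split

def split_75_10_15(X, y, seed=0):
    """Carve a dataset into 75% train, 10% validation, 15% test."""
    # First hold out 25% of the data, then divide that held-out part so that
    # 0.6 * 25% = 15% becomes the test set and the remaining 10% validation.
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, test_size=0.25, random_state=seed)
    X_val, X_test, y_val, y_test = train_test_split(
        X_rest, y_rest, test_size=0.6, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)
```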
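Finally, the hyperparameters quoted in the experiment-setup row are collected into a single illustrative configuration below; the dictionary layout and field names are not taken from the released repository.

```python
import torch

# Reported MNIST settings gathered into one place (values from the row above).
MNIST_CONFIG = {
    "seed": 2,         # "we used a seed of 2"
    "epochs": 500,
    "batch_size": 500,
    "lr": 1e-2,        # base learning rate; 1e-3, 1e-4, 1e-5 were also swept
    "K": 300,          # forward passes per epoch for Sd-pSGLD
    "rho": 0.5,        # masking rate used for Sd-pSGLD / Sd-SGHMC in Figure 5
}

torch.manual_seed(MNIST_CONFIG["seed"])  # fix the seed before building data loaders and models
```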