A Simple Baseline for Bayesian Uncertainty in Deep Learning

Authors: Wesley J. Maddox, Pavel Izmailov, Timur Garipov, Dmitry P. Vetrov, Andrew Gordon Wilson

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically find that SWAG approximates the shape of the true posterior, in accordance with results describing the stationary distribution of SGD iterates. Moreover, we demonstrate that SWAG performs well on a wide variety of tasks, including out of sample detection, calibration, and transfer learning, in comparison to many popular alternatives including MC dropout, KFAC Laplace, SGLD, and temperature scaling.
Researcher Affiliation | Collaboration | 1. New York University; 2. Samsung AI Center Moscow; 3. Samsung-HSE Laboratory, National Research University Higher School of Economics
Pseudocode | Yes | Algorithm 1: Bayesian Model Averaging with SWAG (a code sketch of this procedure follows the table)
Open Source Code | Yes | We release PyTorch code at https://github.com/wjmaddox/swa_gaussian.
Open Datasets | Yes | We conduct a thorough empirical evaluation of SWAG... on CIFAR-10, CIFAR-100 and ImageNet ILSVRC-2012 [45]. We next apply SWAG to an LSTM network on language modeling tasks on Penn Treebank and WikiText-2 datasets.
Dataset Splits | Yes | We report test and validation perplexities for different methods and datasets in Table 1.
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, or memory specifications) used for running its experiments.
Software Dependencies | No | The paper states 'For all the methods we use our implementations in PyTorch (see Appendix 8),' but it does not specify any version numbers for PyTorch or other software dependencies.
Experiment Setup | Yes | We train all networks for 300 epochs, starting to collect models for SWA and SWAG approximations once per epoch after epoch 160. For SWAG, K-FAC Laplace, and Dropout we use 30 samples at test time. Appendix 8.1 states: 'For all methods, we train models using SGD with momentum for 300 epochs, batch size of 128 and weight decay of 5e-4. We use a learning rate of 0.01 for the last 140 epochs of training.' (A training-schedule sketch follows the table.)
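
The Pseudocode row refers to the paper's Algorithm 1, Bayesian Model Averaging with SWAG. Below is a minimal PyTorch-style sketch of that procedure, not the authors' released implementation: it keeps a running mean and second moment of the weights plus a low-rank deviation matrix, and samples weights as theta_SWA + (1/sqrt(2)) * Sigma_diag^(1/2) * z1 + (1/sqrt(2(K-1))) * D * z2. The class name SWAG, the default max_rank, and the variance clamp are illustrative choices made here, not values quoted in the table.

```python
import torch


class SWAG:
    """Minimal SWAG sketch: running first/second weight moments plus low-rank deviations."""

    def __init__(self, num_params, max_rank=20):
        self.n = 0                               # number of snapshots collected so far
        self.max_rank = max_rank                 # K: deviation columns kept for the low-rank term
        self.mean = torch.zeros(num_params)      # running mean of the weights (theta_SWA)
        self.sq_mean = torch.zeros(num_params)   # running mean of the squared weights
        self.deviations = []                     # columns of D: (theta_i - running mean)

    def collect(self, theta):
        """Update the moments with one SGD iterate (e.g. once per epoch)."""
        self.mean = (self.n * self.mean + theta) / (self.n + 1)
        self.sq_mean = (self.n * self.sq_mean + theta ** 2) / (self.n + 1)
        self.n += 1
        self.deviations.append(theta - self.mean)
        if len(self.deviations) > self.max_rank:  # keep only the most recent K deviations
            self.deviations.pop(0)

    def sample(self):
        """Draw one weight vector from the SWAG Gaussian approximation."""
        diag_var = torch.clamp(self.sq_mean - self.mean ** 2, min=1e-30)
        z1 = torch.randn_like(self.mean)
        theta = self.mean + diag_var.sqrt() * z1 / (2.0 ** 0.5)
        D = torch.stack(self.deviations, dim=1)   # shape: (num_params, K)
        k = D.shape[1]
        z2 = torch.randn(k)
        theta = theta + (D @ z2) / ((2.0 * max(k - 1, 1)) ** 0.5)
        return theta


def bma_predict(model, swag, inputs, num_samples=30):
    """Bayesian model averaging: average softmax outputs over SWAG weight samples."""
    probs = 0.0
    with torch.no_grad():
        for _ in range(num_samples):
            torch.nn.utils.vector_to_parameters(swag.sample(), model.parameters())
            # In practice, batch-norm statistics are recomputed for each sampled model
            # (as in SWA); that step is omitted in this sketch.
            probs = probs + torch.softmax(model(inputs), dim=-1)
    return probs / num_samples
```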
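
The Experiment Setup row quotes the schedule for the image-classification experiments: SGD with momentum, 300 epochs, batch size 128, weight decay 5e-4, a learning rate of 0.01 over the last 140 epochs, snapshot collection once per epoch after epoch 160, and 30 samples at test time. The rough sketch below shows how those numbers fit together, reusing the SWAG helper from the previous sketch. `model`, `train_loader`, `initial_lr`, and `test_inputs` are placeholders, the momentum value of 0.9 is an assumption not given in the quote, and the paper's gradual decay from the initial rate down to 0.01 is simplified here to a single step.

```python
import torch

# Hyperparameters taken from the quoted setup; momentum value assumed.
optimizer = torch.optim.SGD(model.parameters(), lr=initial_lr,
                            momentum=0.9, weight_decay=5e-4)
swag = SWAG(num_params=sum(p.numel() for p in model.parameters()))

for epoch in range(300):                          # 300 epochs total
    lr = initial_lr if epoch < 160 else 0.01      # 0.01 for the last 140 epochs
    for group in optimizer.param_groups:
        group["lr"] = lr
    for x, y in train_loader:                     # batch size 128
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
    if epoch >= 160:                              # collect once per epoch after epoch 160
        theta = torch.nn.utils.parameters_to_vector(model.parameters()).detach()
        swag.collect(theta)

# Test time: average predictions over 30 SWAG samples, as reported for SWAG in the paper.
# probs = bma_predict(model, swag, test_inputs, num_samples=30)
```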