Practical Deep Learning with Bayesian Principles

Authors: Kazuki Osawa, Siddharth Swaroop, Mohammad Emtiyaz Khan, Anirudh Jain, Runa Eschenhagen, Richard E. Turner, Rio Yokota

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate practical training of deep networks with natural-gradient variational inference. By applying techniques such as batch normalisation, data augmentation, and distributed training, we achieve similar performance in about the same number of epochs as the Adam optimiser, even on large datasets such as ImageNet. Importantly, the benefits of Bayesian principles are preserved: predictive probabilities are well-calibrated, uncertainties on out-of-distribution data are improved, and continual-learning performance is boosted." and "4 Experiments: In this section, we present experiments on fitting several deep networks on CIFAR-10 and ImageNet."
Researcher Affiliation | Academia | 1 Tokyo Institute of Technology, Tokyo, Japan; 2 University of Cambridge, Cambridge, UK; 3 Indian Institute of Technology (ISM), Dhanbad, India; 4 University of Osnabrück, Osnabrück, Germany; 5 RIKEN Center for AI Project, Tokyo, Japan
Pseudocode | Yes | "Algorithm 1: Variational Online Gauss-Newton (VOGN)" and "Figure 2: A pseudo-code for our distributed VOGN algorithm is shown in Algorithm 1" (a hedged sketch of a VOGN-style update follows this table)
Open Source Code | Yes | "A PyTorch implementation is available as a plug-and-play optimiser." and "The code is available at https://github.com/team-approx-bayes/dl-with-bayes." (a usage sketch follows this table)
Open Datasets | Yes | "CIFAR-10 [28] contains 10 classes with 50,000 images for training and 10,000 images for validation. For ImageNet, we train with 1.28 million training examples and validate on 50,000 examples, classifying between 1,000 classes."
Dataset Splits | Yes | "CIFAR-10 [28] contains 10 classes with 50,000 images for training and 10,000 images for validation. For ImageNet, we train with 1.28 million training examples and validate on 50,000 examples, classifying between 1,000 classes."
Hardware Specification | Yes | "We used a large minibatch size M = 4,096 and parallelise them across 128 GPUs (NVIDIA Tesla P100)." (a distributed data-loading sketch follows this table)
Software Dependencies | No | The paper mentions a PyTorch implementation but does not provide specific version numbers for PyTorch or other software dependencies.
Experiment Setup | Yes | "Batch normalisation: BatchNorm layers are inserted between neural network layers. They help stabilise each layer's input distribution by normalising the running average of the inputs' mean and variance. In our VOGN implementation, we simply use the existing implementation with default hyperparameter settings." and "We set the effective dataset-size factor by considering the specific DA techniques used. When training on CIFAR-10, the random cropping DA step involves first padding the 32x32 images to become of size 40x40, and then taking randomly selected 28x28 cropped images. We consider this as effectively increasing the dataset size by a factor of 5 (4 images for each corner, and one central image). The horizontal flipping DA step doubles the dataset size (one dataset of unflipped images, one for flipped images). Combined, this gives a factor of 10." and "The full set of hyperparameters is in Appendix D." (an augmentation sketch follows this table)
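
The VOGN update referenced in Algorithm 1 maintains a mean and a diagonal curvature estimate and preconditions a sampled-weight gradient, much like Adam but with per-example squared gradients and a prior term. The NumPy sketch below is a simplified paraphrase of that structure, not a copy of Algorithm 1: the learning-rate and moving-average defaults, the scaling of the prior precision, and the effective dataset size are assumptions, and `per_example_grad_fn` is a hypothetical user-supplied function.

```python
# Minimal NumPy sketch of one diagonal VOGN-style update.
# Paraphrased structure only: exact scalings of the prior term, the moving
# averages, and n_data are assumptions, not a faithful copy of Algorithm 1.
import numpy as np

def vogn_step(mu, m, s, per_example_grad_fn, minibatch,
              lr=1e-3, beta1=0.9, beta2=0.999, prior_prec=1.0, n_data=50_000):
    delta = prior_prec / n_data                      # scaled prior precision
    sigma = 1.0 / np.sqrt(n_data * (s + delta))      # posterior std (diagonal)
    w = mu + sigma * np.random.randn(*mu.shape)      # sample weights from q
    grads = per_example_grad_fn(w, minibatch)        # shape: (M, num_params)
    g_hat = grads.mean(axis=0)                       # minibatch gradient
    h_hat = (grads ** 2).mean(axis=0)                # per-example squared grads
    m = beta1 * m + (1 - beta1) * (g_hat + delta * mu)   # momentum with prior
    s = (1 - beta2) * s + beta2 * h_hat                  # curvature estimate
    mu = mu - lr * m / (s + delta)                       # preconditioned step
    return mu, m, s
```

The learned mean and diagonal variance define the Gaussian posterior from which weights are sampled at prediction time to obtain the calibrated predictive probabilities the paper reports.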
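
Because the released implementation is described as a plug-and-play optimiser, swapping it into an existing PyTorch training loop should look roughly like the sketch below. The closure-based `optimizer.step(closure)` call is an assumption about how such an optimiser re-evaluates the loss on sampled weights, and any optimiser class name or constructor arguments are placeholders; consult https://github.com/team-approx-bayes/dl-with-bayes for the real interface.

```python
# Hedged sketch: using a VOGN-style optimiser in place of Adam in a standard
# PyTorch loop. The optimiser object is assumed to come from the released
# code; only the loop structure is standard PyTorch.
import torch.nn.functional as F

def train_one_epoch(model, loader, optimizer, device="cuda"):
    model.train()
    for images, targets in loader:
        images, targets = images.to(device), targets.to(device)

        def closure():
            # Variational optimisers typically re-evaluate the loss on weights
            # sampled from the current posterior, hence the closure.
            optimizer.zero_grad()
            loss = F.cross_entropy(model(images), targets)
            loss.backward()
            return loss

        optimizer.step(closure)
```

At evaluation time, the well-calibrated predictions described in the abstract come from averaging softmax outputs over several Monte Carlo samples of the weights rather than from a single deterministic forward pass.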
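
The hardware row implies a data-parallel layout in which the global minibatch of M = 4,096 is split evenly over 128 workers, i.e. 32 examples per GPU. The sketch below shows that arithmetic in a standard torch.distributed setup; the NCCL backend, sampler, and loader options are assumptions, `train_dataset` is a placeholder, and the paper's distributed VOGN additionally synchronises its gradient statistics across workers, which is not shown here.

```python
# Hedged sketch of the data-parallel split implied by "M = 4,096 across 128
# GPUs": each worker sees global_batch / world_size = 32 examples per step.
# Backend, sampler, and loader options are assumptions; `train_dataset` is a
# placeholder for a dataset built as in the augmentation sketch below.
import torch.distributed as dist
from torch.utils.data import DataLoader, DistributedSampler

dist.init_process_group(backend="nccl")       # one process per GPU
world_size = dist.get_world_size()            # 128 in the ImageNet runs
global_batch = 4096
local_batch = global_batch // world_size      # 4096 // 128 = 32 per GPU

sampler = DistributedSampler(train_dataset)   # shards the data across workers
loader = DataLoader(train_dataset, batch_size=local_batch,
                    sampler=sampler, num_workers=4, pin_memory=True)
```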
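
The dataset and experiment-setup rows together pin down the CIFAR-10 splits (50,000 train / 10,000 validation) and the augmentation pipeline whose effective dataset-size factor is 10. The torchvision sketch below reproduces that pipeline and the factor arithmetic; the exact transform calls and the absence of normalisation are assumptions, and Appendix D of the paper holds the full hyperparameters.

```python
# Hedged sketch of the CIFAR-10 augmentation described above and the
# resulting effective dataset-size factor. Transform details are assumptions.
import torchvision
import torchvision.transforms as T

train_tf = T.Compose([
    T.Pad(4),                      # 32x32 -> 40x40
    T.RandomCrop(28),              # randomly selected 28x28 crop
    T.RandomHorizontalFlip(),      # doubles the effective dataset
    T.ToTensor(),
])
val_tf = T.Compose([T.ToTensor()])

train_set = torchvision.datasets.CIFAR10("data", train=True, download=True, transform=train_tf)
val_set = torchvision.datasets.CIFAR10("data", train=False, download=True, transform=val_tf)

crop_factor, flip_factor = 5, 2           # 4 corners + centre; flipped/unflipped
da_factor = crop_factor * flip_factor     # = 10
effective_n = da_factor * len(train_set)  # 10 * 50,000 = 500,000 examples
```

The resulting effective dataset size of 500,000 is what the quote means by "effectively increasing the dataset size" through data augmentation.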