Practical Deep Learning with Bayesian Principles
Authors: Kazuki Osawa, Siddharth Swaroop, Mohammad Emtiyaz Khan, Anirudh Jain, Runa Eschenhagen, Richard E. Turner, Rio Yokota
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate practical training of deep networks with natural-gradient variational inference. By applying techniques such as batch normalisation, data augmentation, and distributed training, we achieve similar performance in about the same number of epochs as the Adam optimiser, even on large datasets such as ImageNet. Importantly, the benefits of Bayesian principles are preserved: predictive probabilities are well-calibrated, uncertainties on out-of-distribution data are improved, and continual-learning performance is boosted. and 4 Experiments: In this section, we present experiments on fitting several deep networks on CIFAR-10 and ImageNet. |
| Researcher Affiliation | Academia | 1 Tokyo Institute of Technology, Tokyo, Japan 2 University of Cambridge, Cambridge, UK 3 Indian Institute of Technology (ISM), Dhanbad, India 4 University of Osnabrück, Osnabrück, Germany 5 RIKEN Center for AI Project, Tokyo, Japan |
| Pseudocode | Yes | Algorithm 1: Variational Online Gauss-Newton (VOGN) and Figure 2: A pseudo-code for our distributed VOGN algorithm is shown in Algorithm 1 (a minimal VOGN-style update is sketched below the table). |
| Open Source Code | Yes | A PyTorch implementation is available as a plug-and-play optimiser. and (footnote 1) The code is available at https://github.com/team-approx-bayes/dl-with-bayes. |
| Open Datasets | Yes | CIFAR-10 [28] contains 10 classes with 50,000 images for training and 10,000 images for validation. For ImageNet, we train with 1.28 million training examples and validate on 50,000 examples, classifying between 1,000 classes. |
| Dataset Splits | Yes | CIFAR-10 [28] contains 10 classes with 50,000 images for training and 10,000 images for validation. For ImageNet, we train with 1.28 million training examples and validate on 50,000 examples, classifying between 1,000 classes. |
| Hardware Specification | Yes | We used a large minibatch size M = 4,096 and parallelise them across 128 GPUs (NVIDIA Tesla P100). A distributed data-parallel sketch is given below the table. |
| Software Dependencies | No | The paper mentions a 'PyTorch implementation' but does not provide specific version numbers for PyTorch or other software dependencies. |
| Experiment Setup | Yes | Batch normalisation: BatchNorm layers are inserted between neural network layers. They help stabilise each layer's input distribution by normalising the running average of the inputs' mean and variance. In our VOGN implementation, we simply use the existing implementation with default hyperparameter settings. and We set the data-augmentation factor by considering the specific DA techniques used. When training on CIFAR-10, the random cropping DA step involves first padding the 32x32 images to become of size 40x40, and then taking randomly selected 28x28 cropped images. We consider this as effectively increasing the dataset size by a factor of 5 (4 images for each corner, and one central image). The horizontal flipping DA step doubles the dataset size (one dataset of unflipped images, one for flipped images). Combined, this gives a factor of 10. and The full set of hyperparameters is in Appendix D. (The data-augmentation pipeline and factor-of-10 bookkeeping are sketched below the table.) |
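
To make the VOGN pseudocode row concrete, here is a minimal sketch of a VOGN-style natural-gradient variational-inference update on a toy logistic-regression problem. It is not the authors' distributed implementation: the momentum and tempering terms of Algorithm 1 are omitted, and the hyperparameter names (`lr`, `beta2`, `prior_prec`) are illustrative choices, not values from the paper.

```python
# Hedged sketch: VOGN-style mean-field update on toy binary logistic regression.
# Simplified relative to Algorithm 1 in the paper (no momentum, no distributed averaging).
import numpy as np

rng = np.random.default_rng(0)
N, D = 1000, 5                          # dataset size, feature dimension
X = rng.normal(size=(N, D))
w_true = rng.normal(size=D)
y = (1 / (1 + np.exp(-X @ w_true)) > rng.uniform(size=N)).astype(float)

mu = np.zeros(D)                        # variational mean
s = np.ones(D)                          # diagonal Gauss-Newton / squared-gradient estimate
prior_prec = 1.0                        # prior precision (assumed value)
lr, beta2, M = 0.1, 0.1, 128            # learning rate, scale EMA, minibatch size

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

for step in range(500):
    idx = rng.choice(N, size=M, replace=False)
    Xb, yb = X[idx], y[idx]

    # Sample weights from the current Gaussian posterior q(w) = N(mu, sigma^2),
    # with sigma^2 = 1 / (N * (s + prior_prec / N)).
    sigma2 = 1.0 / (N * (s + prior_prec / N))
    w = mu + np.sqrt(sigma2) * rng.normal(size=D)

    # Per-example gradients of the negative log-likelihood (closed form here).
    resid = sigmoid(Xb @ w) - yb                # shape (M,)
    per_ex_grad = resid[:, None] * Xb           # shape (M, D)

    g_hat = per_ex_grad.mean(axis=0)            # minibatch gradient
    h_hat = (per_ex_grad ** 2).mean(axis=0)     # Gauss-Newton-style curvature estimate

    # Natural-gradient updates for the variational parameters.
    s = (1 - beta2) * s + beta2 * h_hat
    mu = mu - lr * (g_hat + (prior_prec / N) * mu) / (s + prior_prec / N)

print("posterior mean:", mu.round(2))
print("posterior std :", np.sqrt(1.0 / (N * (s + prior_prec / N))).round(3))
```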
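
The hardware row quotes a global minibatch of 4,096 split across 128 GPUs (32 examples per GPU). The sketch below shows a generic PyTorch `DistributedDataParallel` setup with that per-worker arithmetic; the model and optimiser are stand-ins, not the authors' distributed VOGN.

```python
# Hedged sketch: data-parallel training with a 4,096 global minibatch split across workers.
# Launch with one process per GPU (e.g. via torchrun); the linear model is a placeholder.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")              # one process per GPU
    world_size = dist.get_world_size()           # 128 in the quoted setup
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)

    local_batch = 4096 // world_size             # 4,096 global -> 32 per GPU at 128 workers

    model = torch.nn.Linear(3 * 32 * 32, 10).cuda()
    model = DDP(model, device_ids=[local_rank])  # averages gradients across workers
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    x = torch.randn(local_batch, 3 * 32 * 32, device="cuda")
    y = torch.randint(0, 10, (local_batch,), device="cuda")
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    opt.step()

if __name__ == "__main__":
    main()
```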
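
Finally, the experiment-setup row describes the CIFAR-10 data augmentation and the effective dataset-size factor of 10. Below is a sketch of that pipeline with torchvision; the pad and crop sizes and the factor arithmetic follow the quoted text, while everything else (root path, lack of normalisation) is an assumption for illustration.

```python
# Hedged sketch: CIFAR-10 augmentation as described in the quoted setup
# (pad 32x32 -> 40x40, random 28x28 crop, horizontal flip) plus the split sizes.
import torchvision
import torchvision.transforms as T

train_tf = T.Compose([
    T.Pad(4),                     # 32x32 -> 40x40
    T.RandomCrop(28),             # random 28x28 crop, as quoted
    T.RandomHorizontalFlip(),     # doubles the effective dataset
    T.ToTensor(),
])

train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                          download=True, transform=train_tf)
val_set = torchvision.datasets.CIFAR10(root="./data", train=False,
                                        download=True, transform=T.ToTensor())

print(len(train_set), len(val_set))   # 50000 10000, the split quoted above

# Effective dataset-size factor used to scale N in the variational objective:
# 5 crop positions (as counted in the paper) * 2 flip states = 10.
crop_factor, flip_factor = 5, 2
print("DA factor:", crop_factor * flip_factor)
```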