Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam
Authors: Mohammad Emtiyaz Khan, Didrik Nielsen, Voot Tangkaratt, Wu Lin, Yarin Gal, Akash Srivastava
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results confirm this and further suggest that the weight-perturbation in our algorithm could be useful for exploration in reinforcement learning and stochastic optimization. |
| Researcher Affiliation | Academia | RIKEN Center for Advanced Intelligence Project, Tokyo, Japan; University of British Columbia, Vancouver, Canada; University of Oxford, Oxford, UK; University of Edinburgh, Edinburgh, UK. |
| Pseudocode | Yes (see the first sketch after this table) | Figure 1. Comparison of Adam (left) and one of our proposed methods, Vadam (right). Adam performs maximum-likelihood estimation while Vadam performs variational inference, yet the two pseudocodes differ only slightly (differences highlighted in red). |
| Open Source Code | Yes | The code to reproduce our results is available at https://github.com/emtiyaz/vadam. |
| Open Datasets | Yes | We use three datasets: a toy dataset (N = 60, D = 2), USPS-3vs5 (N = 1781, D = 256) and Breast-Cancer (N = 683, D = 10). Details are in Appendix I. We show results on the standard UCI benchmark. We repeat the experimental setup used in Gal & Ghahramani (2016). |
| Dataset Splits | No | We use the 20 splits of the data provided by Gal & Ghahramani (2016) for training and testing. The paper mentions training and testing splits but does not explicitly detail a validation split or its methodology. |
| Hardware Specification | No | Finally, we are thankful for the RAIDEN computing system at the RIKEN Center for AI Project, which we extensively used for our experiments. While a computing system is mentioned, no specific hardware components such as GPU/CPU models or memory details are provided. |
| Software Dependencies | No | The paper mentions various methods and tools like the Adam optimizer, RMSprop, AdaGrad, and OpenAI Gym, but it does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes (the UCI network is sketched after this table) | Following their work, we use a neural network with one hidden layer, 50 hidden units, and ReLU activation functions. We use the 20 splits of the data provided by Gal & Ghahramani (2016) for training and testing. We use Bayesian optimization to select the prior precision λ and noise precision of the Gaussian likelihood. We consider the deep deterministic policy gradient (DDPG) method for the Half-Cheetah task using a two-layer neural network with 400 and 300 ReLU hidden units (Lillicrap et al., 2015). |
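To make the pseudocode row concrete, below is a minimal NumPy sketch contrasting one Adam step with one Vadam-style weight-perturbation step, in the spirit of the Figure 1 comparison. This is an illustrative sketch only: the hyperparameter names (`alpha`, `g1`, `g2`, `lam`), the `grad_fn` interface, and the exact placement of the prior term are assumptions for this example, not the authors' implementation; the released code at https://github.com/emtiyaz/vadam is the authoritative reference.

```python
# Illustrative sketch only: contrasts an Adam step with a Vadam-style step.
# Names (alpha, g1, g2, lam, grad_fn) are assumptions for this example; consult
# https://github.com/emtiyaz/vadam for the authors' actual implementation.
import numpy as np

def adam_step(theta, m, s, grad_fn, t, alpha=1e-3, g1=0.9, g2=0.999, delta=1e-8):
    """One Adam update (maximum-likelihood estimation of the weights)."""
    g = grad_fn(theta)                                # minibatch gradient
    m = g1 * m + (1 - g1) * g                         # first-moment estimate
    s = g2 * s + (1 - g2) * g**2                      # second-moment estimate
    m_hat, s_hat = m / (1 - g1**t), s / (1 - g2**t)   # bias correction
    theta = theta - alpha * m_hat / (np.sqrt(s_hat) + delta)
    return theta, m, s

def vadam_step(mu, m, s, grad_fn, t, N, lam=1.0, alpha=1e-3, g1=0.9, g2=0.999):
    """One Vadam-style update (variational inference via weight perturbation).
    Lines that differ from adam_step are marked with '# <--'."""
    sigma = 1.0 / np.sqrt(N * s + lam)                # <-- per-weight std. dev.
    theta = mu + sigma * np.random.randn(*mu.shape)   # <-- perturb the weights
    g = grad_fn(theta) + (lam / N) * mu               # <-- add a prior term
    m = g1 * m + (1 - g1) * g
    s = g2 * s + (1 - g2) * g**2
    m_hat, s_hat = m / (1 - g1**t), s / (1 - g2**t)
    mu = mu - alpha * m_hat / (np.sqrt(s_hat) + lam / N)  # <-- lam/N replaces delta
    return mu, m, s
```

The marked lines correspond, roughly, to the small differences the Figure 1 caption refers to: sampling perturbed weights from the current Gaussian posterior, adding a prior (weight-decay) term to the gradient, and replacing Adam's δ with a λ/N damping term in the update of the mean.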
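For the experiment-setup row, the following is a minimal PyTorch sketch of the UCI-benchmark architecture described there (one hidden layer with 50 ReLU units). The input dimension, the scalar output, and the use of `torch.nn.Sequential` are illustrative assumptions; the per-dataset sizes and the Bayesian-optimization tuning of the prior and noise precisions are not reproduced here.

```python
# Illustrative sketch of the UCI-benchmark network described above:
# a single hidden layer of 50 ReLU units. Input/output sizes are assumptions.
import torch.nn as nn

def make_uci_net(input_dim: int, hidden_units: int = 50) -> nn.Module:
    """Single-hidden-layer MLP with ReLU activations, matching the setup of
    Gal & Ghahramani (2016) as described in the experiment-setup row."""
    return nn.Sequential(
        nn.Linear(input_dim, hidden_units),
        nn.ReLU(),
        nn.Linear(hidden_units, 1),  # scalar regression output (assumption)
    )

# Example: a network for a UCI regression dataset with, say, 13 input features.
net = make_uci_net(input_dim=13)
print(net)
```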