Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam

Authors: Mohammad Emtiyaz Khan, Didrik Nielsen, Voot Tangkaratt, Wu Lin, Yarin Gal, Akash Srivastava

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical results confirm this and further suggest that the weight-perturbation in our algorithm could be useful for exploration in reinforcement learning and stochastic optimization.
Researcher Affiliation | Academia | (1) RIKEN Center for Advanced Intelligence Project, Tokyo, Japan; (2) University of British Columbia, Vancouver, Canada; (3) University of Oxford, Oxford, UK; (4) University of Edinburgh, Edinburgh, UK.
Pseudocode | Yes | Figure 1. Comparison of Adam (left) and one of our proposed method Vadam (right). Adam performs maximum-likelihood estimation while Vadam performs variational inference, yet the two pseudocodes differ only slightly (differences highlighted in red).
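
The Figure 1 comparison quoted above is the core of the method: Vadam keeps Adam's moment estimates but evaluates gradients at Gaussian-perturbed weights and folds a prior-precision term into the update. Below is a minimal NumPy sketch of that idea as we read the caption; the exact handling of the prior term, bias correction, and minibatch scaling are assumptions and should be checked against Figure 1 of the paper and the released code at https://github.com/emtiyaz/vadam.

```python
# Minimal NumPy sketch of a Vadam-style step (weight perturbation + Adam-like
# update), contrasted with Adam only through the perturbation and prior terms.
# Hyperparameter placement (lam, N) is an assumption, not a verified transcription.
import numpy as np

def vadam_step(mu, m, s, t, grad_fn, alpha=1e-3, beta1=0.9, beta2=0.999,
               lam=1.0, N=1000, rng=np.random.default_rng(0)):
    """One Vadam-style iteration on the variational mean mu."""
    sigma = 1.0 / np.sqrt(N * (s + lam))                 # per-weight posterior std
    theta = mu + sigma * rng.standard_normal(mu.shape)   # weight perturbation
    g = grad_fn(theta)                                    # stochastic gradient at perturbed weights
    m = beta1 * m + (1 - beta1) * (g + lam * mu / N)      # first moment, with prior term
    s = beta2 * s + (1 - beta2) * g**2                    # second moment, as in Adam
    m_hat = m / (1 - beta1**t)                            # bias correction
    s_hat = s / (1 - beta2**t)
    mu = mu - alpha * m_hat / (np.sqrt(s_hat) + lam / N)  # Adam-like mean update
    return mu, m, s

# Toy usage on a quadratic objective, just to check the update runs.
mu, m, s = np.zeros(2), np.zeros(2), np.zeros(2)
for t in range(1, 101):
    mu, m, s = vadam_step(mu, m, s, t, grad_fn=lambda th: 2 * (th - 1.0))
```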
Open Source Code | Yes | The code to reproduce our results is available at https://github.com/emtiyaz/vadam.
Open Datasets | Yes | We use three datasets: a toy dataset (N = 60, D = 2), USPS-3vs5 (N = 1781, D = 256) and Breast-Cancer (N = 683, D = 10). Details are in Appendix I. We show results on the standard UCI benchmark. We repeat the experimental setup used in Gal & Ghahramani (2016).
Dataset Splits | No | We use the 20 splits of the data provided by Gal & Ghahramani (2016) for training and testing. The paper mentions training and testing splits but does not explicitly detail a validation split or its methodology.
Hardware Specification | No | Finally, we are thankful for the RAIDEN computing system at the RIKEN Center for AI Project, which we extensively used for our experiments. While a computing system is mentioned, no specific hardware components such as GPU/CPU models or memory details are provided.
Software Dependencies | No | The paper mentions various methods and tools like the Adam optimizer, RMSprop, AdaGrad, and OpenAI Gym, but it does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | Following their work, we use a neural network with one hidden layer, 50 hidden units, and ReLU activation functions. We use the 20 splits of the data provided by Gal & Ghahramani (2016) for training and testing. We use Bayesian optimization to select the prior precision λ and noise precision of the Gaussian likelihood. We consider the deep deterministic policy gradient (DDPG) method for the Half-Cheetah task using a two-layer neural network with 400 and 300 ReLU hidden units (Lillicrap et al., 2015).
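
To make the quoted architectures concrete, here is a minimal PyTorch sketch of the two networks mentioned: the single-hidden-layer, 50-unit ReLU regressor used in the UCI setup, and a 400/300-unit ReLU network of the kind used for DDPG on Half-Cheetah. The input and output dimensions, batch size, and any training details are placeholders, not values taken from the paper.

```python
# Sketch of the network shapes described in the Experiment Setup row.
# Dimensions below are illustrative placeholders.
import torch
import torch.nn as nn

def make_uci_net(input_dim: int) -> nn.Module:
    # One hidden layer with 50 ReLU units, scalar regression output.
    return nn.Sequential(
        nn.Linear(input_dim, 50),
        nn.ReLU(),
        nn.Linear(50, 1),
    )

def make_ddpg_net(input_dim: int, output_dim: int) -> nn.Module:
    # Two hidden layers with 400 and 300 ReLU units, as in the quoted DDPG setup.
    return nn.Sequential(
        nn.Linear(input_dim, 400),
        nn.ReLU(),
        nn.Linear(400, 300),
        nn.ReLU(),
        nn.Linear(300, output_dim),
    )

net = make_uci_net(input_dim=13)     # e.g., 13 features for the Boston Housing split
x = torch.randn(32, 13)              # placeholder batch
print(net(x).shape)                  # torch.Size([32, 1])
```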