Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning
Authors: Yarin Gal, Zoubin Ghahramani
ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We next perform an extensive assessment of the properties of the uncertainty estimates obtained from dropout NNs and convnets on the tasks of regression and classification. We compare the uncertainty obtained from different model architectures and non-linearities, both on tasks of extrapolation, and show that model uncertainty is important for classification tasks using MNIST (LeCun & Cortes, 1998) as an example. We then show that using dropout's uncertainty we can obtain a considerable improvement in predictive log-likelihood and RMSE compared to existing state-of-the-art methods. |
| Researcher Affiliation | Academia | Yarin Gal YG279@CAM.AC.UK Zoubin Ghahramani ZG201@CAM.AC.UK University of Cambridge |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and demos are available at http://yarin.co. |
| Open Datasets | Yes | We use a subset of the atmospheric CO2 concentrations dataset derived from in situ air samples collected at Mauna Loa Observatory, Hawaii (Keeling et al., 2004)... |
| Dataset Splits | Yes | We use Bayesian optimisation (BO, (Snoek et al., 2012; Snoek & authors, 2015)) over validation log-likelihood to find optimal τ... All experiments were averaged on 20 random splits of the data (apart from Protein for which only 5 splits were used and Year for which one split was used). |
| Hardware Specification | No | The paper does not provide specific details on the hardware used for running experiments, such as GPU or CPU models or memory specifications. It only implicitly points to CPU and GPU computation through its mentions of software such as Theano and Caffe. |
| Software Dependencies | No | The paper mentions software like Keras (Chollet, 2015), Theano (Bergstra et al., 2010), Caffe (Jia et al., 2014), and the Adam optimiser (Kingma & Ba, 2014), but it does not specify concrete version numbers for these software dependencies, which are necessary for full reproducibility. |
| Experiment Setup | Yes | We use NNs with either 4 or 5 hidden layers and 1024 hidden units. We use either ReLU non-linearities or TanH non-linearities in each network, and use dropout probabilities of either 0.1 or 0.2. Exact experiment set-up is given in section E.1 in the appendix. (Section 5.1). We used dropout probability of 0.5. We trained the model for 10^6 iterations with the same learning rate policy as before with γ = 0.0001 and p = 0.75. (Section 5.2) (A hedged code sketch of this configuration and of the Monte Carlo dropout predictions follows the table.) |
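
For orientation, the following is a minimal sketch, not the authors' released code (that is linked from http://yarin.co), of the dropout network configuration quoted in the Experiment Setup row: 4 or 5 hidden layers of 1024 units, ReLU or TanH non-linearities, and dropout probability 0.1 or 0.2. Keras is assumed because the paper mentions it; the optimiser choice here (Adam, also mentioned in the paper), the loss, and the input/output dimensions are illustrative assumptions.

```python
# Hedged sketch of the quoted fully connected dropout network.
from tensorflow import keras
from tensorflow.keras import layers


def build_dropout_net(n_hidden=4, n_units=1024, activation="relu",
                      dropout_p=0.1, input_dim=1, output_dim=1):
    """Fully connected net with a Dropout layer after every hidden layer."""
    inputs = keras.Input(shape=(input_dim,))
    x = inputs
    for _ in range(n_hidden):                 # 4 or 5 hidden layers per the paper
        x = layers.Dense(n_units, activation=activation)(x)
        x = layers.Dropout(dropout_p)(x)      # dropout probability 0.1 or 0.2
    outputs = layers.Dense(output_dim)(x)     # linear output for regression
    model = keras.Model(inputs, outputs)
    # Adam is mentioned in the paper; mean squared error is an assumption.
    model.compile(optimizer="adam", loss="mse")
    return model
```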
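
The uncertainty estimates referred to in the Research Type and Dataset Splits rows come from Monte Carlo dropout: dropout is kept active at test time, T stochastic forward passes give the predictive mean, and the sample variance plus the inverse model precision τ⁻¹ (with τ tuned by Bayesian optimisation over validation log-likelihood, per the quoted text) gives the predictive variance. A hedged sketch, assuming the model above and scalar outputs:

```python
import numpy as np


def mc_dropout_predict(model, x, T=100, tau=1.0):
    """Monte Carlo dropout: T stochastic forward passes with dropout kept on.

    Returns the predictive mean and variance; tau is the model precision
    (the paper selects it on validation log-likelihood).
    """
    # training=True keeps the Dropout layers active at prediction time.
    samples = np.stack([model(x, training=True).numpy() for _ in range(T)])
    pred_mean = samples.mean(axis=0)
    pred_var = samples.var(axis=0) + 1.0 / tau   # sample variance + tau^-1
    return pred_mean, pred_var
```

For example, `mc_dropout_predict(build_dropout_net(), X_test, T=100, tau=tau_star)`, with `tau_star` found by the validation search quoted in the Dataset Splits row, would give per-point predictive means and variances of the kind evaluated in the paper's regression experiments.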