Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning
Authors: Yarin Gal, Zoubin Ghahramani
ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We next perform an extensive assessment of the properties of the uncertainty estimates obtained from dropout NNs and convnets on the tasks of regression and classification. We compare the uncertainty obtained from different model architectures and non-linearities, both on tasks of extrapolation, and show that model uncertainty is important for classification tasks using MNIST (LeCun & Cortes, 1998) as an example. We then show that using dropout's uncertainty we can obtain a considerable improvement in predictive log-likelihood and RMSE compared to existing state-of-the-art methods. |
| Researcher Affiliation | Academia | Yarin Gal YG279@CAM.AC.UK Zoubin Ghahramani ZG201@CAM.AC.UK University of Cambridge |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and demos are available at http://yarin.co. |
| Open Datasets | Yes | We use a subset of the atmospheric CO2 concentrations dataset derived from in situ air samples collected at Mauna Loa Observatory, Hawaii (Keeling et al., 2004)... |
| Dataset Splits | Yes | We use Bayesian optimisation (BO, (Snoek et al., 2012; Snoek & authors, 2015)) over validation log-likelihood to find optimal τ... All experiments were averaged on 20 random splits of the data (apart from Protein for which only 5 splits were used and Year for which one split was used). |
| Hardware Specification | No | The paper does not provide specific details on the hardware used for running experiments, such as GPU or CPU models or memory specifications. It only implicitly points to CPU and GPU computation through its mentions of software such as Theano and Caffe. |
| Software Dependencies | No | The paper mentions software like Keras (Chollet, 2015), Theano (Bergstra et al., 2010), Caffe (Jia et al., 2014), and the Adam optimiser (Kingma & Ba, 2014), but it does not specify concrete version numbers for these software dependencies, which are necessary for full reproducibility. |
| Experiment Setup | Yes | We use NNs with either 4 or 5 hidden layers and 1024 hidden units. We use either ReLU non-linearities or TanH non-linearities in each network, and use dropout probabilities of either 0.1 or 0.2. Exact experiment set-up is given in section E.1 in the appendix. (Section 5.1). We used dropout probability of 0.5. We trained the model for 10^6 iterations with the same learning rate policy as before with γ = 0.0001 and p = 0.75. (Section 5.2) (A hedged code sketch of this configuration and of the Monte Carlo dropout predictions follows the table.) |
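
For orientation, the following is a minimal sketch, not the authors' released code (that is linked from http://yarin.co), of the dropout network configuration quoted in the Experiment Setup row: 4 or 5 hidden layers of 1024 units, ReLU or TanH non-linearities, and dropout probability 0.1 or 0.2. Keras is assumed because the paper mentions it; the optimiser choice here (Adam, also mentioned in the paper), the loss, and the input/output dimensions are illustrative assumptions.

```python
# Hedged sketch of the quoted fully connected dropout network.
from tensorflow import keras
from tensorflow.keras import layers


def build_dropout_net(n_hidden=4, n_units=1024, activation="relu",
                      dropout_p=0.1, input_dim=1, output_dim=1):
    """Fully connected net with a Dropout layer after every hidden layer."""
    inputs = keras.Input(shape=(input_dim,))
    x = inputs
    for _ in range(n_hidden):                 # 4 or 5 hidden layers per the paper
        x = layers.Dense(n_units, activation=activation)(x)
        x = layers.Dropout(dropout_p)(x)      # dropout probability 0.1 or 0.2
    outputs = layers.Dense(output_dim)(x)     # linear output for regression
    model = keras.Model(inputs, outputs)
    # Adam is mentioned in the paper; mean squared error is an assumption.
    model.compile(optimizer="adam", loss="mse")
    return model
```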
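
The uncertainty estimates referred to in the Research Type and Dataset Splits rows come from Monte Carlo dropout: dropout is kept active at test time, T stochastic forward passes give the predictive mean, and the sample variance plus the inverse model precision τ⁻¹ (with τ tuned by Bayesian optimisation over validation log-likelihood, per the quoted text) gives the predictive variance. A hedged sketch, assuming the model above and scalar outputs:

```python
import numpy as np


def mc_dropout_predict(model, x, T=100, tau=1.0):
    """Monte Carlo dropout: T stochastic forward passes with dropout kept on.

    Returns the predictive mean and variance; tau is the model precision
    (the paper selects it on validation log-likelihood).
    """
    # training=True keeps the Dropout layers active at prediction time.
    samples = np.stack([model(x, training=True).numpy() for _ in range(T)])
    pred_mean = samples.mean(axis=0)
    pred_var = samples.var(axis=0) + 1.0 / tau   # sample variance + tau^-1
    return pred_mean, pred_var
```

For example, `mc_dropout_predict(build_dropout_net(), X_test, T=100, tau=tau_star)`, with `tau_star` found by the validation search quoted in the Dataset Splits row, would give per-point predictive means and variances of the kind evaluated in the paper's regression experiments.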