Concrete Dropout
Authors: Yarin Gal, Jiri Hron, Alex Kendall
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We next analyse the behaviour of our proposed dropout variant on a wide variety of tasks. We study how our dropout variant captures different types of uncertainty on a simple synthetic dataset with known ground truth uncertainty, and show how its behaviour changes with increasing amounts of data versus model size (§4.1). We show that Concrete dropout matches the performance of hand-tuned dropout on the UCI datasets (§4.2) and MNIST (§4.3), and further demonstrate our variant on large models used in the Computer Vision community (§4.4). We show a significant reduction in experiment time as well as improved model performance and uncertainty calibration. Lastly, we demonstrate our dropout variant in a model-based RL task extending on [10], showing that the agent correctly reduces its uncertainty dynamically as the amount of data increases (§4.5). |
| Researcher Affiliation | Academia | Yarin Gal yarin.gal@eng.cam.ac.uk University of Cambridge and Alan Turing Institute, London Jiri Hron jh2084@cam.ac.uk University of Cambridge Alex Kendall agk34@cam.ac.uk University of Cambridge |
| Pseudocode | Yes | A Python code snippet for Concrete dropout in Keras [5] is given in appendix C |
| Open Source Code | Yes | A Python code snippet for Concrete dropout in Keras [5] is given in appendix C, spanning about 20 lines of code, and experiment code is given online2. 2https://github.com/yaringal/ConcreteDropout |
| Open Datasets | Yes | We next assess the performance of our technique in a regression setting using the popular UCI benchmark [26]. We further experimented with the standard classification benchmark MNIST [24]. |
| Dataset Splits | No | The paper mentions using a 'validation set split from the total data' in Section 4.5 and refers to 'maximise validation log-likelihood' in Section 2. It also notes 'Full results as well as experiment setup are given in the appendix D' for UCI datasets, implying details are deferred. However, specific percentages or sample counts for training/validation/test splits are not provided in the main text. |
| Hardware Specification | No | The paper mentions the use of 'multiple GPUs' in the context of large models, but it does not specify any particular GPU models, CPU types, or other hardware specifications used for running its experiments. |
| Software Dependencies | No | The paper mentions 'Keras [5]' and a 'Python code snippet' but does not provide specific version numbers for Keras, Python, or any other software dependencies needed to replicate the experiment. |
| Experiment Setup | Yes | We used models with three hidden layers of size 1024 and ReLU non-linearities (Section 4.1); All experiments were performed using a fully connected neural network (NN) with 2 hidden layers, 50 units each (Section 4.2); All models were trained for 500 epochs (≈ 2×10^5 iterations) (Section 4.3); and We use Concrete dropout weight regulariser 10^-8 (derived from the prior length-scale) and dropout regulariser 0.01 × N × H × W (Section 4.4). |
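The pseudocode row above notes that the paper's appendix C gives a ~20-line Keras snippet for Concrete dropout. For readers without that appendix to hand, the forward pass of the core idea — a Concrete (continuous) relaxation of the Bernoulli dropout mask, which makes the mask differentiable with respect to the dropout probability `p` — can be sketched in plain NumPy. This is my own illustrative sketch, not the authors' Keras implementation; the function name, signature, and default temperature are assumptions, and it shows only the relaxed mask, not the learnable-`p` training loop or the weight/dropout regularisers the paper derives.

```python
import numpy as np

def concrete_dropout_mask(x, p, temperature=0.1, rng=None):
    """Apply a Concrete relaxation of a dropout mask to x (forward pass only).

    p is the dropout probability (learnable in the paper; fixed here).
    The sigmoid of a noisy logit replaces the hard Bernoulli draw, so the
    mask is a smooth function of p and gradients can flow through it.
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = 1e-7
    u = rng.uniform(size=np.shape(x))  # uniform noise, one draw per unit
    # Relaxed per-unit drop probability: sigmoid of the noisy logit of p,
    # sharpened by the temperature (lower temperature -> more binary mask)
    logit = (np.log(p + eps) - np.log(1.0 - p + eps)
             + np.log(u + eps) - np.log(1.0 - u + eps))
    drop_prob = 1.0 / (1.0 + np.exp(-logit / temperature))
    mask = 1.0 - drop_prob           # ~1 keeps the unit, ~0 drops it
    return x * mask / (1.0 - p)      # rescale as in standard dropout
```

With `p = 0.5` the noisy logit is symmetric around zero, so the relaxed mask keeps each unit with expected value 0.5 and the rescaled output preserves the input in expectation, matching standard dropout behaviour.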