Concrete Dropout

Authors: Yarin Gal, Jiri Hron, Alex Kendall

NeurIPS 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We next analyse the behaviour of our proposed dropout variant on a wide variety of tasks. We study how our dropout variant captures different types of uncertainty on a simple synthetic dataset with known ground truth uncertainty, and show how its behaviour changes with increasing amounts of data versus model size (§4.1). We show that Concrete dropout matches the performance of hand-tuned dropout on the UCI datasets (§4.2) and MNIST (§4.3), and further demonstrate our variant on large models used in the Computer Vision community (§4.4). We show a significant reduction in experiment time as well as improved model performance and uncertainty calibration. Lastly, we demonstrate our dropout variant in a model-based RL task extending on [10], showing that the agent correctly reduces its uncertainty dynamically as the amount of data increases (§4.5).
Researcher Affiliation | Academia | Yarin Gal (yarin.gal@eng.cam.ac.uk), University of Cambridge and Alan Turing Institute, London; Jiri Hron (jh2084@cam.ac.uk), University of Cambridge; Alex Kendall (agk34@cam.ac.uk), University of Cambridge
Pseudocode | Yes | A Python code snippet for Concrete dropout in Keras [5] is given in appendix C.
Open Source Code | Yes | A Python code snippet for Concrete dropout in Keras [5] is given in appendix C, spanning about 20 lines of code, and experiment code is given online at https://github.com/yaringal/ConcreteDropout. An illustrative re-implementation sketch is given after the table below.
Open Datasets | Yes | We next assess the performance of our technique in a regression setting using the popular UCI benchmark [26]. We further experimented with the standard classification benchmark MNIST [24].
Dataset Splits | No | The paper mentions using a 'validation set split from the total data' in Section 4.5 and refers to 'maximise validation log-likelihood' in Section 2. It also notes 'Full results as well as experiment setup are given in the appendix D' for the UCI datasets, implying details are deferred. However, specific percentages or sample counts for training/validation/test splits are not provided in the main text.
Hardware Specification | No | The paper mentions the use of 'multiple GPUs' in the context of large models, but it does not specify any particular GPU models, CPU types, or other hardware specifications used for running its experiments.
Software Dependencies | No | The paper mentions 'Keras [5]' and a 'Python code snippet' but does not provide specific version numbers for Keras, Python, or any other software dependencies needed to replicate the experiment.
Experiment Setup | Yes | We used models with three hidden layers of size 1024 and ReLU non-linearities (Section 4.1); All experiments were performed using a fully connected neural network (NN) with 2 hidden layers, 50 units each (Section 4.2); All models were trained for 500 epochs (≈ 2 × 10^5 iterations) (Section 4.3); We use Concrete dropout weight regulariser 10^-8 (derived from the prior length-scale) and dropout regulariser 0.01 × N × H × W (Section 4.4). An illustrative sketch of such a setup is given below the table.
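
The appendix C snippet itself is not reproduced in this report. As a rough guide to what those roughly 20 lines implement, below is a minimal sketch of a Concrete dropout wrapper in tf.keras. It is not the authors' code: the class and parameter names (weight_regularizer, dropout_regularizer, init_p, temperature), the default values, and the assumptions of TensorFlow 2.x and a wrapped layer exposing a kernel attribute (e.g. Dense) are illustrative choices made for this report.

    import numpy as np
    import tensorflow as tf


    class ConcreteDropout(tf.keras.layers.Wrapper):
        """Learns the dropout probability of the wrapped layer via the Concrete relaxation."""

        def __init__(self, layer, weight_regularizer=1e-6, dropout_regularizer=1e-5,
                     init_p=0.1, temperature=0.1, **kwargs):
            super().__init__(layer, **kwargs)
            self.weight_regularizer = weight_regularizer
            self.dropout_regularizer = dropout_regularizer
            self.temperature = temperature
            self.init_logit = float(np.log(init_p) - np.log(1.0 - init_p))

        def build(self, input_shape):
            input_shape = tf.TensorShape(input_shape)
            self.input_dim = float(input_shape[-1])
            if not self.layer.built:
                self.layer.build(input_shape)
            # Parameterise p through a logit so it stays in (0, 1) during optimisation.
            self.p_logit = self.add_weight(
                name="p_logit", shape=(),
                initializer=tf.keras.initializers.Constant(self.init_logit),
                trainable=True)
            super().build(input_shape)

        def call(self, inputs):
            eps = 1e-7
            p = tf.sigmoid(self.p_logit)

            # KL-derived regulariser: the wrapped layer's weight norm scaled by 1/(1 - p),
            # plus the dropout entropy term scaled by the input dimensionality.
            # Assumes the wrapped layer exposes a `kernel` attribute (Dense/Conv layers do).
            weight = self.layer.kernel
            kernel_reg = self.weight_regularizer * tf.reduce_sum(tf.square(weight)) / (1.0 - p)
            dropout_reg = p * tf.math.log(p + eps) + (1.0 - p) * tf.math.log(1.0 - p + eps)
            dropout_reg *= self.dropout_regularizer * self.input_dim
            self.add_loss(kernel_reg + dropout_reg)

            # Concrete (relaxed Bernoulli) dropout mask; kept on at test time for MC sampling.
            u = tf.random.uniform(tf.shape(inputs))
            drop_logit = (tf.math.log(p + eps) - tf.math.log(1.0 - p + eps)
                          + tf.math.log(u + eps) - tf.math.log(1.0 - u + eps))
            drop_prob = tf.sigmoid(drop_logit / self.temperature)
            x = inputs * (1.0 - drop_prob) / (1.0 - p)
            return self.layer(x)

Because the regulariser is added inside call, it is recomputed on every forward pass, so the learned p enters both the weight-decay scaling and the dropout-entropy penalty and can be optimised jointly with the network weights.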
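Continuing the sketch above (same imports and ConcreteDropout class), here is a hypothetical usage example in the spirit of the quoted fully connected setup with two hidden layers of 50 units. The feature dimension, dataset size N, prior length-scale l, optimiser, and loss are placeholders, and the l^2/N and 2/N regulariser scales are an assumed pattern rather than values quoted from the paper.

    # Placeholder hyper-parameters -- not values taken from the paper.
    N = 10_000                  # assumed number of training points
    l = 1e-4                    # assumed prior length-scale
    wd = l ** 2 / N             # weight regulariser scale
    dd = 2.0 / N                # dropout regulariser scale

    inputs = tf.keras.Input(shape=(8,))   # assumed 8-dimensional UCI-style features
    x = ConcreteDropout(tf.keras.layers.Dense(50, activation="relu"),
                        weight_regularizer=wd, dropout_regularizer=dd)(inputs)
    x = ConcreteDropout(tf.keras.layers.Dense(50, activation="relu"),
                        weight_regularizer=wd, dropout_regularizer=dd)(x)
    outputs = ConcreteDropout(tf.keras.layers.Dense(1),
                              weight_regularizer=wd, dropout_regularizer=dd)(x)
    model = tf.keras.Model(inputs, outputs)
    # The paper uses a heteroscedastic likelihood; plain MSE is kept here for brevity.
    model.compile(optimizer="adam", loss="mse")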