Uncertainty Estimation Using a Single Deep Deterministic Neural Network
Authors: Joost Van Amersfoort, Lewis Smith, Yee Whye Teh, Yarin Gal
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our model against the current best approach for estimating uncertainty in Deep Learning, Deep Ensembles, and show that DUQ compares favourably on a number of evaluations, such as out of distribution (OoD) detection of FashionMNIST vs MNIST, and CIFAR vs. SVHN. We visualise how DUQ performs on the two moons dataset in Figure 1. |
| Researcher Affiliation | Academia | OATML, Department of Computer Science, University of Oxford; Department of Statistics, University of Oxford. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We make our code publicly available at https://github.com/y0ast/deterministic-uncertainty-quantification. |
| Open Datasets | Yes | In this experiment, we assess the quality of our uncertainty estimation by looking at how well we can separate the test set of FashionMNIST (Xiao et al., 2017) from the test set of MNIST (LeCun et al., 1998b) by looking only at the uncertainty predicted by the model. We train our model on FashionMNIST and we expect it to assign low uncertainty to the FashionMNIST test set, but high uncertainty to MNIST, since the model has never seen that dataset before and it is very different from FashionMNIST. In this section we look at the CIFAR-10 dataset (Krizhevsky et al., 2014), with SVHN (Netzer et al., 2019) as OoD set. (A hedged sketch of this uncertainty-based OoD evaluation appears below the table.) |
| Dataset Splits | Yes | Most hyperparameters, such as the learning rate or weight decay parameter, can be set using the standard train/validation split. However, there are two hyperparameters that are particularly important: the length scale σ and the gradient penalty weight λ. We set the length scale by doing a grid search over the interval (0, 1] while keeping λ = 0. We pick the value that leads to the highest validation accuracy. |
| Hardware Specification | Yes | Training for one epoch on a modern 1080 Ti GPU takes 21 seconds for a softmax/cross entropy model, which leads to 105 seconds for a Deep Ensemble with 5 components. |
| Software Dependencies | No | The paper mentions "scikit-learn (Pedregosa et al., 2011)" and "PyTorch (Paszke et al., 2017)" but does not provide specific version numbers for these software packages or libraries. |
| Experiment Setup | Yes | Most hyperparameters, such as the learning rate or weight decay parameter, can be set using the standard train/validation split. However, there are two hyperparameters that are particularly important: the length scale σ and the gradient penalty weight λ. We set the length scale by doing a grid search over the interval (0, 1] while keeping λ = 0. We pick the value that leads to the highest validation accuracy. We train for a fixed 75 epochs and reduce the learning rate by a factor of 0.2 at 25 and 50 epochs. We use random horizontal flips and random crops as data augmentation. (A hedged PyTorch sketch of this schedule and grid search appears below the table.) |
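
The experiment-setup and dataset-split rows describe a concrete training recipe: grid-search the length scale σ over (0, 1] with λ = 0, keep the σ with the highest validation accuracy, and train for a fixed 75 epochs with the learning rate reduced by a factor of 0.2 at epochs 25 and 50, under random-crop and horizontal-flip augmentation. The sketch below is a minimal PyTorch rendering of that recipe, not the authors' released code: `build_duq_model`, `model.loss`, `evaluate`, the SGD settings, and the 0.1-spaced σ grid are illustrative assumptions.

```python
# Minimal PyTorch sketch of the quoted training schedule and length-scale grid search.
# Assumptions (not taken from the paper's code): `build_duq_model`, `model.loss`, and
# `evaluate` are hypothetical helpers; optimizer settings and grid spacing are illustrative.
import torch
from torch.optim.lr_scheduler import MultiStepLR
from torchvision import transforms

# Data augmentation described in the paper: random crops and random horizontal flips.
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

def train_one_model(sigma, lambda_gp, train_loader, val_loader, epochs=75):
    """Train for a fixed 75 epochs, decaying the LR by a factor of 0.2 at epochs 25 and 50."""
    model = build_duq_model(length_scale=sigma)                    # hypothetical constructor
    optimizer = torch.optim.SGD(model.parameters(), lr=0.05,
                                momentum=0.9, weight_decay=5e-4)   # illustrative values
    scheduler = MultiStepLR(optimizer, milestones=[25, 50], gamma=0.2)
    for _ in range(epochs):
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = model.loss(x, y, gradient_penalty_weight=lambda_gp)  # hypothetical API
            loss.backward()
            optimizer.step()
        scheduler.step()
    return model, evaluate(model, val_loader)                      # hypothetical accuracy helper

def grid_search_length_scale(train_loader, val_loader):
    """Grid-search sigma over (0, 1] with lambda = 0; keep the best validation accuracy."""
    best_sigma, best_acc = None, float("-inf")
    for sigma in [round(0.1 * i, 1) for i in range(1, 11)]:        # illustrative 0.1 spacing
        _, acc = train_one_model(sigma, lambda_gp=0.0,
                                 train_loader=train_loader, val_loader=val_loader)
        if acc > best_acc:
            best_sigma, best_acc = sigma, acc
    return best_sigma
```

The `MultiStepLR` schedule with `gamma=0.2` at milestones `[25, 50]` matches the stated schedule exactly; the optimizer choice and grid spacing would need to be confirmed against the released repository.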
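
The OoD rows describe separating the FashionMNIST test set from MNIST (and CIFAR-10 from SVHN) using only the model's per-example uncertainty. A standard way to quantify that separation is the AUROC over uncertainty scores; the sketch below assumes a hypothetical `uncertainty_fn` standing in for DUQ's kernel-distance-based score and uses scikit-learn's `roc_auc_score`, which the paper already cites.

```python
# Sketch: score OoD detection (e.g. FashionMNIST test set vs MNIST test set) by the AUROC
# of per-example uncertainties. `uncertainty_fn` is a hypothetical stand-in for DUQ's
# uncertainty (the kernel distance to the closest class centroid).
import numpy as np
from sklearn.metrics import roc_auc_score

def ood_auroc(uncertainty_fn, in_dist_inputs, ood_inputs):
    """AUROC for detecting OoD inputs from uncertainty scores (higher score = more uncertain)."""
    u_in = np.asarray([uncertainty_fn(x) for x in in_dist_inputs])
    u_out = np.asarray([uncertainty_fn(x) for x in ood_inputs])
    scores = np.concatenate([u_in, u_out])
    labels = np.concatenate([np.zeros_like(u_in), np.ones_like(u_out)])  # 1 = OoD
    return roc_auc_score(labels, scores)
```

An AUROC near 1.0 means the model's uncertainty cleanly separates the OoD test set from the in-distribution test set; a value near 0.5 means the uncertainty carries no OoD signal.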