Uncertainty Estimation Using a Single Deep Deterministic Neural Network
Authors: Joost Van Amersfoort, Lewis Smith, Yee Whye Teh, Yarin Gal
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our model against the current best approach for estimating uncertainty in Deep Learning, Deep Ensembles, and show that DUQ compares favourably on a number of evaluations, such as out of distribution (OoD) detection of FashionMNIST vs MNIST, and CIFAR vs. SVHN. We visualise how DUQ performs on the two moons dataset in Figure 1. |
| Researcher Affiliation | Academia | OATML, Department of Computer Science, University of Oxford; Department of Statistics, University of Oxford. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We make our code publicly available at https://github.com/y0ast/deterministic-uncertainty-quantification. |
| Open Datasets | Yes | In this experiment, we assess the quality of our uncertainty estimation by looking at how well we can separate the test set of FashionMNIST (Xiao et al., 2017) from the test set of MNIST (LeCun et al., 1998b) by looking only at the uncertainty predicted by the model. We train our model on FashionMNIST and we expect it to assign low uncertainty to the FashionMNIST test set, but high uncertainty to MNIST, since the model has never seen that dataset before and it is very different from FashionMNIST. In this section we look at the CIFAR-10 dataset (Krizhevsky et al., 2014), with SVHN (Netzer et al., 2019) as OoD set. (A hedged sketch of this uncertainty-based OoD evaluation appears below the table.) |
| Dataset Splits | Yes | Most hyperparameters, such as the learning rate or weight decay parameter, can be set using the standard train/validation split. However, there are two hyperparameters that are particularly important: the length scale σ and the gradient penalty weight λ. We set the length scale by doing a grid search over the interval (0, 1] while keeping λ = 0. We pick the value that leads to the highest validation accuracy. |
| Hardware Specification | Yes | Training for one epoch on a modern 1080 Ti GPU takes 21 seconds for a softmax/cross entropy model, which leads to 105 seconds for a Deep Ensemble with 5 components. |
| Software Dependencies | No | The paper mentions "scikit-learn (Pedregosa et al., 2011)" and "PyTorch (Paszke et al., 2017)" but does not provide specific version numbers for these software packages or libraries. |
| Experiment Setup | Yes | Most hyperparameters, such as the learning rate or weight decay parameter, can be set using the standard train/validation split. However, there are two hyperparameters that are particularly important: the length scale σ and the gradient penalty weight λ. We set the length scale by doing a grid search over the interval (0, 1] while keeping λ = 0. We pick the value that leads to the highest validation accuracy. We train for a fixed 75 epochs and reduce the learning rate by a factor of 0.2 at 25 and 50 epochs. We use random horizontal flips and random crops as data augmentation. (A hedged PyTorch sketch of this schedule and grid search appears below the table.) |
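
The experiment-setup and dataset-split rows describe a concrete training recipe: grid-search the length scale σ over (0, 1] with λ = 0, keep the σ with the highest validation accuracy, and train for a fixed 75 epochs with the learning rate reduced by a factor of 0.2 at epochs 25 and 50, under random-crop and horizontal-flip augmentation. The sketch below is a minimal PyTorch rendering of that recipe, not the authors' released code: `build_duq_model`, `model.loss`, `evaluate`, the SGD settings, and the 0.1-spaced σ grid are illustrative assumptions.

```python
# Minimal PyTorch sketch of the quoted training schedule and length-scale grid search.
# Assumptions (not taken from the paper's code): `build_duq_model`, `model.loss`, and
# `evaluate` are hypothetical helpers; optimizer settings and grid spacing are illustrative.
import torch
from torch.optim.lr_scheduler import MultiStepLR
from torchvision import transforms

# Data augmentation described in the paper: random crops and random horizontal flips.
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

def train_one_model(sigma, lambda_gp, train_loader, val_loader, epochs=75):
    """Train for a fixed 75 epochs, decaying the LR by a factor of 0.2 at epochs 25 and 50."""
    model = build_duq_model(length_scale=sigma)                    # hypothetical constructor
    optimizer = torch.optim.SGD(model.parameters(), lr=0.05,
                                momentum=0.9, weight_decay=5e-4)   # illustrative values
    scheduler = MultiStepLR(optimizer, milestones=[25, 50], gamma=0.2)
    for _ in range(epochs):
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = model.loss(x, y, gradient_penalty_weight=lambda_gp)  # hypothetical API
            loss.backward()
            optimizer.step()
        scheduler.step()
    return model, evaluate(model, val_loader)                      # hypothetical accuracy helper

def grid_search_length_scale(train_loader, val_loader):
    """Grid-search sigma over (0, 1] with lambda = 0; keep the best validation accuracy."""
    best_sigma, best_acc = None, float("-inf")
    for sigma in [round(0.1 * i, 1) for i in range(1, 11)]:        # illustrative 0.1 spacing
        _, acc = train_one_model(sigma, lambda_gp=0.0,
                                 train_loader=train_loader, val_loader=val_loader)
        if acc > best_acc:
            best_sigma, best_acc = sigma, acc
    return best_sigma
```

The `MultiStepLR` schedule with `gamma=0.2` at milestones `[25, 50]` matches the stated schedule exactly; the optimizer choice and grid spacing would need to be confirmed against the released repository.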
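
The OoD rows describe separating the FashionMNIST test set from MNIST (and CIFAR-10 from SVHN) using only the model's per-example uncertainty. A standard way to quantify that separation is the AUROC over uncertainty scores; the sketch below assumes a hypothetical `uncertainty_fn` standing in for DUQ's kernel-distance-based score and uses scikit-learn's `roc_auc_score`, which the paper already cites.

```python
# Sketch: score OoD detection (e.g. FashionMNIST test set vs MNIST test set) by the AUROC
# of per-example uncertainties. `uncertainty_fn` is a hypothetical stand-in for DUQ's
# uncertainty (the kernel distance to the closest class centroid).
import numpy as np
from sklearn.metrics import roc_auc_score

def ood_auroc(uncertainty_fn, in_dist_inputs, ood_inputs):
    """AUROC for detecting OoD inputs from uncertainty scores (higher score = more uncertain)."""
    u_in = np.asarray([uncertainty_fn(x) for x in in_dist_inputs])
    u_out = np.asarray([uncertainty_fn(x) for x in ood_inputs])
    scores = np.concatenate([u_in, u_out])
    labels = np.concatenate([np.zeros_like(u_in), np.ones_like(u_out)])  # 1 = OoD
    return roc_auc_score(labels, scores)
```

An AUROC near 1.0 means the model's uncertainty cleanly separates the OoD test set from the in-distribution test set; a value near 0.5 means the uncertainty carries no OoD signal.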