On the Practicality of Deterministic Epistemic Uncertainty
Authors: Janis Postels, Mattia Segù, Tao Sun, Luca Daniel Sieber, Luc Van Gool, Fisher Yu, Federico Tombari
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To this end, we first provide a taxonomy of DUMs, and evaluate their calibration under continuous distributional shifts. Then, we extend them to semantic segmentation. We find that, while DUMs scale to realistic vision tasks and perform well on OOD detection, the practicality of current methods is undermined by poor calibration under distributional shifts. |
| Researcher Affiliation | Collaboration | ¹ ETH Zurich, ² Technical University of Munich, ³ Google. |
| Pseudocode | No | The paper describes methods and processes in narrative text and refers to existing implementations, but it does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states: 'When an implementation was publicly available, we relied on it. This is the case for DUQ (https://github.com/y0ast/deterministic-uncertainty-quantification), SNGP (https://github.com/google/uncertainty-baselines/blob/master/baselines/imagenet/sngp.py) and DUE (https://github.com/y0ast/DUE).' This refers to third-party code they used, not their own. There is no explicit statement or link to their own source code for the evaluation framework described in the paper. |
| Open Datasets | Yes | Datasets. We train DUMs on CIFAR-10 and CIFAR100 (Krizhevsky et al., 2014) and evaluate on the corrupted versions of their test set CIFAR10/100-C (Hendrycks & Dietterich, 2019). ... We evaluate on synthetic distributional shifts using a corrupted version of Cityscapes (Cordts et al., 2016) (Cityscapes-C (Michaelis et al., 2019)). |
| Dataset Splits | Yes | We choose the hyperparameter such that it minimizes the validation loss. ... We uniformly sample a validation set. |
| Hardware Specification | Yes | The runtimes were measured on a single V100 using CIFAR10 and a ResNet50 backbone. |
| Software Dependencies | Yes | All methods were re-implemented in Tensorflow 2.0. |
| Experiment Setup | Yes | We used a batch size of 128 samples and trained for 200 epochs. ... We used for the single softmax model the Adam optimizer with learning rate 0.003, and L2 weight regularization 0.0001. ... We trained DUQ with the SGD optimizer with learning rate 0.01, L2 weight regularization 0.0001, and a multi-step learning rate decay policy with decay rate 0.3 and decay steps at the epochs 10, 20. |
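
The Dataset Splits row quotes a uniformly sampled validation set without giving its size. A minimal sketch, assuming a 10% split held out from the CIFAR-10 training set; the fraction and the use of `tf.data` are assumptions, not stated in the paper:

```python
# Sketch: uniformly sample a validation split from the CIFAR-10 training set.
# VAL_FRACTION is an assumed value; the paper only states that a validation
# set is uniformly sampled.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

VAL_FRACTION = 0.1  # assumed split size
num_val = int(len(x_train) * VAL_FRACTION)

# Uniformly sample validation indices without replacement via a random permutation.
perm = tf.random.shuffle(tf.range(len(x_train)), seed=0)
val_idx, train_idx = perm[:num_val], perm[num_val:]

train_ds = tf.data.Dataset.from_tensor_slices(
    (tf.gather(x_train, train_idx), tf.gather(y_train, train_idx)))
val_ds = tf.data.Dataset.from_tensor_slices(
    (tf.gather(x_train, val_idx), tf.gather(y_train, val_idx)))
```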
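The Experiment Setup row lists the concrete hyperparameters. Below is a minimal TensorFlow 2 sketch wiring them together, assuming a ResNet50 backbone on CIFAR-10 as in the runtime measurements; the loss and data pipeline are placeholders, and L2 regularization is shown only on the classification head for brevity, whereas the paper regularizes the whole network.

```python
# Sketch of the quoted training configuration: batch size 128, 200 epochs,
# Adam (lr 0.003, L2 1e-4) for the softmax baseline, SGD (lr 0.01, L2 1e-4)
# with multi-step decay (factor 0.3 at epochs 10 and 20) for DUQ.
import tensorflow as tf

BATCH_SIZE, EPOCHS, L2 = 128, 200, 1e-4

def build_model(num_classes: int = 10) -> tf.keras.Model:
    # Placeholder backbone; the DUQ/SNGP/DUE heads from the referenced
    # implementations would replace the plain Dense classifier.
    base = tf.keras.applications.ResNet50(
        include_top=False, weights=None, input_shape=(32, 32, 3), pooling="avg")
    head = tf.keras.layers.Dense(
        num_classes, kernel_regularizer=tf.keras.regularizers.l2(L2))
    return tf.keras.Sequential([base, head])

# Softmax baseline: Adam with learning rate 0.003.
softmax_model = build_model()
softmax_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.003),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"])

# DUQ: SGD with learning rate 0.01, decayed by 0.3 at epochs 10 and 20.
def duq_lr(epoch: int, lr: float) -> float:
    return 0.01 * (0.3 ** sum(epoch >= e for e in (10, 20)))

duq_optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
duq_callbacks = [tf.keras.callbacks.LearningRateScheduler(duq_lr)]

# Training call (data pipeline from the split sketch above):
# softmax_model.fit(train_ds.batch(BATCH_SIZE), epochs=EPOCHS,
#                   validation_data=val_ds.batch(BATCH_SIZE))
```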
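The paper's central finding concerns calibration under continuous distributional shifts (CIFAR-10/100-C, Cityscapes-C). The sketch below computes a standard expected calibration error (ECE) per corruption severity on CIFAR-10-C; the file layout follows the public CIFAR-10-C release, and `predict_probs` is a hypothetical model-specific helper, not an artifact of the paper.

```python
# Sketch: top-label ECE with equal-width confidence bins, evaluated per
# corruption severity on CIFAR-10-C (5 severities x 10,000 images per file).
import numpy as np

def expected_calibration_error(probs: np.ndarray, labels: np.ndarray,
                               n_bins: int = 15) -> float:
    """Top-label ECE: weighted mean |accuracy - confidence| over bins."""
    confidences = probs.max(axis=1)
    predictions = probs.argmax(axis=1)
    accuracies = (predictions == labels).astype(np.float64)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(accuracies[mask].mean()
                                     - confidences[mask].mean())
    return float(ece)

# Usage per corruption severity (paths and predict_probs are assumptions):
# images = np.load("CIFAR-10-C/gaussian_noise.npy")
# labels = np.load("CIFAR-10-C/labels.npy")
# for s in range(5):
#     sl = slice(s * 10000, (s + 1) * 10000)
#     probs = predict_probs(images[sl])  # model-specific; hypothetical helper
#     print("severity", s + 1, expected_calibration_error(probs, labels[sl]))
```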