On the Practicality of Deterministic Epistemic Uncertainty

Authors: Janis Postels, Mattia Segù, Tao Sun, Luca Daniel Sieber, Luc Van Gool, Fisher Yu, Federico Tombari

ICML 2022

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental. "To this end, we first provide a taxonomy of DUMs, and evaluate their calibration under continuous distributional shifts. Then, we extend them to semantic segmentation. We find that, while DUMs scale to realistic vision tasks and perform well on OOD detection, the practicality of current methods is undermined by poor calibration under distributional shifts."
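Calibration under distributional shift is typically quantified with the expected calibration error (ECE); that specific metric is an assumption here (it is standard in this literature), not something named in this row. A minimal NumPy sketch:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    # Bin predictions by confidence, then accumulate the bin-weighted gap
    # between mean confidence and empirical accuracy within each bin.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap
    return float(ece)
```

A well-calibrated model (e.g. 75% accuracy at 0.75 confidence) yields an ECE near zero; confident but wrong predictions inflate it.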
Researcher Affiliation: Collaboration. The listed affiliations are ETH Zurich, Technical University Munich, and Google.
Pseudocode: No. The paper describes its methods in narrative text and refers to existing implementations, but it contains no clearly labeled pseudocode or algorithm blocks.
Open Source Code: No. The paper states: 'When an implementation was publicly available, we relied on it. This is the case for DUQ (https://github.com/y0ast/deterministic-uncertainty-quantification), SNGP (https://github.com/google/uncertainty-baselines/blob/master/baselines/imagenet/sngp.py) and DUE (https://github.com/y0ast/DUE).' This refers to third-party code the authors used, not their own; there is no explicit statement of, or link to, their own source code for the evaluation framework described in the paper.
Open Datasets: Yes. "Datasets. We train DUMs on CIFAR-10 and CIFAR100 (Krizhevsky et al., 2014) and evaluate on the corrupted versions of their test set CIFAR10/100-C (Hendrycks & Dietterich, 2019). ... We evaluate on synthetic distributional shifts using a corrupted version of Cityscapes (Cordts et al., 2016) (Cityscapes-C (Michaelis et al., 2019))."
Dataset Splits: Yes. "We choose the hyperparameter such that it minimizes the validation loss. ... We uniformly sample a validation set."
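The uniform validation sampling quoted above can be sketched as follows; the 10% split fraction is a hypothetical choice for illustration, not a value reported in the paper (the 50,000-image training-set size is the standard CIFAR figure).

```python
import numpy as np

def uniform_val_split(n_samples, val_fraction=0.1, seed=0):
    # Uniformly sample a validation subset from the training indices.
    # val_fraction is an assumed value; the paper does not report it.
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    n_val = int(round(n_samples * val_fraction))
    return perm[n_val:], perm[:n_val]  # (train indices, val indices)

# CIFAR-10/100 each have 50,000 training images.
train_idx, val_idx = uniform_val_split(50_000)
```

Shuffling all indices before slicing guarantees the validation set is a uniform random sample, as the quoted text describes.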
Hardware Specification: Yes. "The runtimes were measured on a single V100 using CIFAR10 and a ResNet50 backbone."
Software Dependencies: Yes. "All methods were re-implemented in Tensorflow 2.0."
Experiment Setup: Yes. "We used a batch size of 128 samples and trained for 200 epochs. ... We used for the single softmax model the Adam optimizer with learning rate 0.003, and L2 weight regularization 0.0001. ... We trained DUQ with the SGD optimizer with learning rate 0.01, L2 weight regularization 0.0001, and a multi-step learning rate decay policy with decay rate 0.3 and decay steps at the epochs 10, 20."
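The DUQ settings quoted above include a multi-step learning-rate decay (rate 0.3, steps at epochs 10 and 20). A minimal sketch of that schedule; the function name is illustrative, and the paper's actual implementation lives in its TensorFlow 2.0 training code:

```python
def multistep_lr(epoch, base_lr=0.01, decay_rate=0.3, milestones=(10, 20)):
    # Multiply the base learning rate by decay_rate once for every
    # milestone the current epoch has reached (values from the DUQ setup).
    n_decays = sum(epoch >= m for m in milestones)
    return base_lr * decay_rate ** n_decays
```

At epochs 0-9 this yields 0.01, from epoch 10 it yields 0.01 * 0.3, and from epoch 20 onward 0.01 * 0.3^2, matching the quoted policy.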