On the Practicality of Deterministic Epistemic Uncertainty

Authors: Janis Postels, Mattia Segù, Tao Sun, Luca Daniel Sieber, Luc Van Gool, Fisher Yu, Federico Tombari

ICML 2022

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental. "To this end, we first provide a taxonomy of DUMs, and evaluate their calibration under continuous distributional shifts. Then, we extend them to semantic segmentation. We find that, while DUMs scale to realistic vision tasks and perform well on OOD detection, the practicality of current methods is undermined by poor calibration under distributional shifts."
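Calibration under distributional shift is typically quantified with the expected calibration error (ECE); that specific metric is an assumption here (it is standard in this literature), not something named in this row. A minimal NumPy sketch:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    # Bin predictions by confidence, then accumulate the bin-weighted gap
    # between mean confidence and empirical accuracy within each bin.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap
    return float(ece)
```

A well-calibrated model (e.g. 75% accuracy at 0.75 confidence) yields an ECE near zero; confident but wrong predictions inflate it.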
Researcher Affiliation: Collaboration. The listed affiliations are ETH Zurich, Technical University Munich, and Google.
Pseudocode: No. The paper describes its methods in narrative text and refers to existing implementations, but it contains no clearly labeled pseudocode or algorithm blocks.
Open Source Code: No. The paper states: 'When an implementation was publicly available, we relied on it. This is the case for DUQ (https://github.com/y0ast/deterministic-uncertainty-quantification), SNGP (https://github.com/google/uncertainty-baselines/blob/master/baselines/imagenet/sngp.py) and DUE (https://github.com/y0ast/DUE).' This refers to third-party code the authors used, not their own; there is no explicit statement of, or link to, their own source code for the evaluation framework described in the paper.
Open Datasets: Yes. "Datasets. We train DUMs on CIFAR-10 and CIFAR100 (Krizhevsky et al., 2014) and evaluate on the corrupted versions of their test set CIFAR10/100-C (Hendrycks & Dietterich, 2019). ... We evaluate on synthetic distributional shifts using a corrupted version of Cityscapes (Cordts et al., 2016) (Cityscapes-C (Michaelis et al., 2019))."
Dataset Splits: Yes. "We choose the hyperparameter such that it minimizes the validation loss. ... We uniformly sample a validation set."
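The uniform validation sampling quoted above can be sketched as follows; the 10% split fraction is a hypothetical choice for illustration, not a value reported in the paper (the 50,000-image training-set size is the standard CIFAR figure).

```python
import numpy as np

def uniform_val_split(n_samples, val_fraction=0.1, seed=0):
    # Uniformly sample a validation subset from the training indices.
    # val_fraction is an assumed value; the paper does not report it.
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    n_val = int(round(n_samples * val_fraction))
    return perm[n_val:], perm[:n_val]  # (train indices, val indices)

# CIFAR-10/100 each have 50,000 training images.
train_idx, val_idx = uniform_val_split(50_000)
```

Shuffling all indices before slicing guarantees the validation set is a uniform random sample, as the quoted text describes.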
Hardware Specification: Yes. "The runtimes were measured on a single V100 using CIFAR10 and a ResNet50 backbone."
Software Dependencies: Yes. "All methods were re-implemented in Tensorflow 2.0."
Experiment Setup: Yes. "We used a batch size of 128 samples and trained for 200 epochs. ... We used for the single softmax model the Adam optimizer with learning rate 0.003, and L2 weight regularization 0.0001. ... We trained DUQ with the SGD optimizer with learning rate 0.01, L2 weight regularization 0.0001, and a multi-step learning rate decay policy with decay rate 0.3 and decay steps at the epochs 10, 20."
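The DUQ settings quoted above include a multi-step learning-rate decay (rate 0.3, steps at epochs 10 and 20). A minimal sketch of that schedule; the function name is illustrative, and the paper's actual implementation lives in its TensorFlow 2.0 training code:

```python
def multistep_lr(epoch, base_lr=0.01, decay_rate=0.3, milestones=(10, 20)):
    # Multiply the base learning rate by decay_rate once for every
    # milestone the current epoch has reached (values from the DUQ setup).
    n_decays = sum(epoch >= m for m in milestones)
    return base_lr * decay_rate ** n_decays
```

At epochs 0-9 this yields 0.01, from epoch 10 it yields 0.01 * 0.3, and from epoch 20 onward 0.01 * 0.3^2, matching the quoted policy.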