Evaluating Robustness of Predictive Uncertainty Estimation: Are Dirichlet-based Models Reliable?

Authors: Anna-Kathrin Kopetzki, Bertrand Charpentier, Daniel Zügner, Sandhya Giri, Stephan Günnemann

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our results suggest that uncertainty estimates of Dirichlet-based uncertainty (DBU) models are not robust w.r.t. three important tasks: (1) indicating correctly and wrongly classified samples; (2) detecting adversarial examples; and (3) distinguishing between in-distribution (ID) and out-of-distribution (OOD) data. Additionally, we explore the first approaches to make DBU models more robust. While adversarial training has a minor effect, our median-smoothing-based approach significantly increases the robustness of DBU models. Experiments are performed on two image data sets (MNIST (Le Cun & Cortes, 2010) and CIFAR10 (Krizhevsky et al., 2009)), which contain bounded inputs, and two tabular data sets (Segment (Dua & Graff, 2017) and Sensorless drive (Dua & Graff, 2017)), consisting of unbounded inputs.
Researcher Affiliation | Academia | Technical University of Munich, Germany; Department of Informatics.
Pseudocode | No | No pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | The code and further supplementary material are available online (www.daml.in.tum.de/dbu-robustness).
Open Datasets | Yes | Experiments are performed on two image data sets (MNIST (Le Cun & Cortes, 2010) and CIFAR10 (Krizhevsky et al., 2009)), which contain bounded inputs, and two tabular data sets (Segment (Dua & Graff, 2017) and Sensorless drive (Dua & Graff, 2017)), consisting of unbounded inputs. As Prior Net requires OOD training data, we use two further image data sets (Fashion MNIST (Xiao et al., 2017) and CIFAR100 (Krizhevsky et al., 2009)) for training on MNIST and CIFAR10, respectively.
Dataset Splits | No | The paper states that "further details on the experimental setup are provided in the appendix (see Section 6.2)", but the main text does not give the training/validation/test splits as percentages or sample counts.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) are mentioned for running the experiments.
Software Dependencies | No | No specific software dependencies with version numbers are mentioned.
Experiment Setup | No | The paper states that "further details on the experimental setup are provided in the appendix (see Section 6.2)", but these details do not appear in the main text that was assessed.
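The Research Type row credits a median-smoothing-based approach with making DBU uncertainty estimates more robust. The Python (NumPy) sketch below is a rough illustration only. It applies median smoothing to a single scalar Dirichlet uncertainty score: it draws Gaussian perturbations of the input, scores each perturbed copy, and reports the median score. Several pieces are hypothetical assumptions chosen for illustration: the toy model `toy_dirichlet_alphas`, the use of the total concentration alpha_0 as the score, and the defaults `sigma=0.1` and `n_samples=1000`. This is not the authors' implementation, which is linked in the Open Source Code row.

```python
import numpy as np


def toy_dirichlet_alphas(x, W, b):
    """Hypothetical DBU model: a linear map followed by exp, so every alpha_c > 0."""
    return np.exp(W @ x + b)


def precision_score(alphas):
    """Total concentration alpha_0 = sum_c alpha_c (larger = more confident)."""
    return float(np.sum(alphas))


def median_smoothed_score(x, W, b, sigma=0.1, n_samples=1000, seed=0):
    """Median of alpha_0 over n_samples Gaussian perturbations x + eps, eps ~ N(0, sigma^2 I)."""
    rng = np.random.default_rng(seed)
    # One row of noise per perturbed copy of the input.
    noise = rng.normal(0.0, sigma, size=(n_samples, x.shape[0]))
    scores = [precision_score(toy_dirichlet_alphas(x + eps, W, b)) for eps in noise]
    return float(np.median(scores))


# Usage with random hypothetical parameters: 3 classes, 4 input features.
rng = np.random.default_rng(42)
W = rng.normal(size=(3, 4))
b = np.zeros(3)
x = rng.normal(size=4)
print("plain alpha_0:   ", precision_score(toy_dirichlet_alphas(x, W, b)))
print("smoothed alpha_0:", median_smoothed_score(x, W, b))
```

The sketch uses the median rather than the mean for a specific reason. If a perturbation inflates the score on only a minority of the noisy copies, it shifts the median much less than it would shift the mean.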