Evaluating Robustness of Predictive Uncertainty Estimation: Are Dirichlet-based Models Reliable?
Authors: Anna-Kathrin Kopetzki, Bertrand Charpentier, Daniel Zügner, Sandhya Giri, Stephan Günnemann
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results suggest that uncertainty estimates of DBU models are not robust w.r.t. three important tasks: (1) indicating correctly and wrongly classified samples; (2) detecting adversarial examples; and (3) distinguishing between in-distribution (ID) and out-of-distribution (OOD) data. Additionally, we explore the first approaches to make DBU models more robust. While adversarial training has a minor effect, our median smoothing based approach significantly increases robustness of DBU models. Experiments are performed on two image data sets (MNIST (LeCun & Cortes, 2010) and CIFAR10 (Krizhevsky et al., 2009)), which contain bounded inputs and two tabular data sets (Segment (Dua & Graff, 2017) and Sensorless drive (Dua & Graff, 2017)), consisting of unbounded inputs. |
| Researcher Affiliation | Academia | Technical University of Munich, Germany; Department of Informatics. |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | The code and further supplementary material is available online (www.daml.in.tum.de/dbu-robustness). |
| Open Datasets | Yes | Experiments are performed on two image data sets (MNIST (LeCun & Cortes, 2010) and CIFAR10 (Krizhevsky et al., 2009)), which contain bounded inputs and two tabular data sets (Segment (Dua & Graff, 2017) and Sensorless drive (Dua & Graff, 2017)), consisting of unbounded inputs. As PriorNet requires OOD training data, we use two further image data sets (Fashion MNIST (Xiao et al., 2017) and CIFAR100 (Krizhevsky et al., 2009)) for training on MNIST and CIFAR10, respectively. |
| Dataset Splits | No | The paper states, 'Further details on the experimental setup are provided in the appendix (see Section 6.2).', but does not explicitly provide training/validation/test splits with percentages or sample counts in the main text. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) were mentioned for running experiments. |
| Software Dependencies | No | No specific software dependencies with version numbers were mentioned. |
| Experiment Setup | No | The paper states, 'Further details on the experimental setup are provided in the appendix (see Section 6.2).', but these details are not present in the provided main text. |
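
The table mentions a median smoothing based approach but gives no pseudocode. The sketch below illustrates only the general idea: perturb an input with Gaussian noise many times, evaluate a scalar uncertainty score on each perturbed copy, and report the median. It is not the authors' implementation. The model `toy_alpha`, the choice of total concentration as the score, and the values of `sigma` and `n_samples` are all assumptions made for illustration.

```python
import numpy as np

def toy_alpha(batch, W):
    """Hypothetical stand-in for a DBU model: maps inputs to Dirichlet
    concentration parameters alpha > 1 (one per class)."""
    logits = batch @ W
    # Softplus, computed stably, keeps the predicted evidence non-negative.
    return 1.0 + np.logaddexp(0.0, logits)

def total_concentration(alpha):
    """Illustrative scalar score: sum of the concentration parameters."""
    return alpha.sum(axis=-1)

def median_smooth(score_fn, x, sigma=0.25, n_samples=1000, seed=0):
    """Median of score_fn over Gaussian perturbations of a single input x.

    score_fn must accept a batch of shape (n_samples, *x.shape) and
    return one scalar per row.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=(n_samples,) + x.shape)
    scores = score_fn(x[None, ...] + noise)
    return float(np.median(scores))

# Hypothetical weights and input, used only to exercise the functions.
W = np.random.default_rng(1).normal(size=(4, 3))
x = np.array([0.5, -1.0, 0.2, 0.0])
score = lambda batch: total_concentration(toy_alpha(batch, W))

print("unsmoothed score:", float(score(x[None, :])[0]))
print("median-smoothed score:", median_smooth(score, x))
```

The median is used rather than the mean because a few extreme scores among the noisy copies barely move it, which suits uncertainty scores that are unbounded. Any certified bounds derived in the paper are outside the scope of this sketch.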