Quantification of Uncertainty with Adversarial Models
Authors: Kajetan Schweighofer, Lukas Aichberger, Mykyta Ielanskyi, Günter Klambauer, Sepp Hochreiter
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that QUAM excels in capturing epistemic uncertainty for deep learning models and outperforms previous methods on challenging tasks in the vision domain. In this section, we compare previous uncertainty quantification methods and our method QUAM in a set of experiments. First, we assess the considered methods on a synthetic benchmark, on which it is feasible to compute a ground truth epistemic uncertainty. Then, we conduct challenging out-of-distribution (OOD) detection, adversarial example detection, misclassification detection and selective prediction experiments in the vision domain. |
| Researcher Affiliation | Academia | ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria |
| Pseudocode | Yes | Algorithm 1: Adversarial Model Search (used in QUAM) |
| Open Source Code | Yes | Our code is publicly available at https://github.com/ml-jku/quam. |
| Open Datasets | Yes | We evaluated all considered methods on the two-moons dataset, created using the implementation of Pedregosa et al. [2011]. We further used MNIST [LeCun et al., 1998] and its OOD derivatives as the most basic benchmark, and ImageNet-1K [Deng et al., 2009] to demonstrate our method's ability to perform on a larger scale. |
| Dataset Splits | Yes | Furthermore, we evaluated the utility of the uncertainty score for misclassification detection of predictions of the reference model on the ImageNet-1K validation dataset. Additionally, we analyze the calibration of QUAM compared to other baseline methods. To this end, we compute the expected calibration error (ECE) [Guo et al., 2017] on the ImageNet-1K validation dataset using the expected predictive distribution (a generic ECE sketch is given below the table). |
| Hardware Specification | Yes | The example in Sec. C.2 was computed within half an hour on a GTX 1080 Ti. Note that the HMC baseline took approximately 14 hours on 36 CPU cores for the classification task. Executing the experiments on ImageNet (Sec. C.4.2) took about 100 GPU-hours on a mix of A100 and A40 GPUs, corresponding to around 45 GPU-seconds per sample. |
| Software Dependencies | No | The paper mentions software like PyTorch and refers to specific implementations by authors (e.g., 'The HMC implementation of Cobb and Jalaian [2021]'), but it does not provide explicit version numbers for these software dependencies within the text. |
| Experiment Setup | Yes | For all methods, we utilize the same two-layer fully connected neural network with a hidden size of 10; for MC dropout, we additionally added dropout with probability 0.2 after every intermediate layer. For QUAM, the initial penalty parameter found by tuning was c₀ = 6, which was exponentially increased (cₜ₊₁ = η·cₜ) with η = 2 every 14 gradient steps for a total of two epochs through the training dataset. Gradient steps were performed using Adam [Kingma and Ba, 2014] with a learning rate of 5.e-3 and weight decay of 1.e-3, chosen equivalent to the original training parameters of the model. The hyperparameters α = 1.e-3 and η = 1.01 resulted in the overall highest performance and have thus jointly been used for each of the three experiments. A hedged sketch of the adversarial model search with this penalty schedule is given below the table. |
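
The quoted setup specifies the penalty schedule of the adversarial model search (Algorithm 1) but not the objective itself. Below is a minimal PyTorch sketch, assuming the search starts from the reference model, rewards disagreement with the reference prediction on a test point, and penalizes any increase of the training loss, with the penalty coefficient increased exponentially as quoted (c₀ = 6, η = 2, every 14 steps, two epochs, Adam with learning rate 5.e-3 and weight decay 1.e-3). All function and variable names are ours, and the exact loss formulation is an assumption, not the authors' reference implementation from the linked repository.

```python
# Hedged sketch of a penalty-method adversarial model search; the objective
# and slack handling are assumptions, not the authors' reference code.
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F


def make_mlp(in_dim=2, hidden=10, out_dim=2):
    """Fully connected network with two hidden layers of size 10
    (one reading of the quoted 'two-layer' setup)."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )


def adversarial_model_search(ref_model, x_star, train_loader, epochs=2,
                             c0=6.0, eta=2.0, steps_per_increase=14,
                             lr=5e-3, weight_decay=1e-3, slack=0.0):
    """Search for a model that still fits the training data but disagrees
    with the reference model's prediction on x_star."""
    model = copy.deepcopy(ref_model)
    opt = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)

    with torch.no_grad():
        ref_pred = ref_model(x_star).argmax(dim=-1)  # reference class on x_star
        ref_loss = sum(F.cross_entropy(ref_model(xb), yb).item()
                       for xb, yb in train_loader) / len(train_loader)

    c, step = c0, 0
    for _ in range(epochs):
        for xb, yb in train_loader:
            # Minimizing the negative cross-entropy on x_star pushes the model
            # away from the reference prediction, i.e. maximizes disagreement.
            adv_term = -F.cross_entropy(model(x_star), ref_pred)
            # Penalize any increase of the training loss beyond a small slack.
            train_loss = F.cross_entropy(model(xb), yb)
            violation = torch.clamp(train_loss - ref_loss - slack, min=0.0)
            loss = adv_term + c * violation

            opt.zero_grad()
            loss.backward()
            opt.step()

            step += 1
            if step % steps_per_increase == 0:
                c *= eta  # exponential penalty schedule: c_{t+1} = eta * c_t
    return model
```

In QUAM, the epistemic uncertainty of the reference model would then be estimated from the disagreement between the reference model and the adversarial models found by such searches.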
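
The calibration analysis cites the expected calibration error of Guo et al. [2017]. The following is a generic equal-width-binning implementation for reference, not the paper's code; the bin count and binning scheme are our assumptions.

```python
# Generic equal-width-bin ECE (Guo et al., 2017); not the paper's implementation.
import numpy as np


def expected_calibration_error(probs, labels, n_bins=15):
    """probs: (N, C) predicted class probabilities; labels: (N,) true class indices."""
    confidences = probs.max(axis=1)
    predictions = probs.argmax(axis=1)
    accuracies = (predictions == labels).astype(float)

    ece = 0.0
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            # Gap between average accuracy and average confidence in this bin,
            # weighted by the fraction of samples falling into the bin.
            ece += in_bin.mean() * abs(accuracies[in_bin].mean()
                                       - confidences[in_bin].mean())
    return ece
```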