Are Uncertainty Quantification Capabilities of Evidential Deep Learning a Mirage?
Authors: Maohao Shen, Jongha (Jon) Ryu, Soumya Ghosh, Yuheng Bu, Prasanna Sattigeri, Subhro Das, Gregory Wornell
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through all these analyses, we conclude that even when EDL methods are empirically effective on downstream tasks, this occurs despite their poor uncertainty quantification capabilities. Our investigation suggests that incorporating model uncertainty can help EDL methods faithfully quantify uncertainties and further improve performance on representative downstream tasks, albeit at the cost of additional computational complexity. |
| Researcher Affiliation | Collaboration | (1) Department of EECS, MIT, Cambridge, MA 02139; (2) MIT-IBM Watson AI Lab, IBM Research, Cambridge, MA 02142; (3) Department of ECE, University of Florida, Gainesville, FL 32611 |
| Pseudocode | No | No explicit pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | The code to replicate the experiments is available on https://github.com/maohaos2/EDL-Mirage. |
| Open Datasets | Yes | We consider two ID datasets: CIFAR10, and CIFAR100. For the OOD detection task, we select four OOD datasets for each ID dataset: we use SVHN, FMNIST, Tiny Image Net, and corrupted ID data. ... Fashion-MNIST ... SVHN ... Tiny Image Net (TIM) |
| Dataset Splits | Yes (see the data-loading sketch after this table) | For in-distribution datasets CIFAR10 and CIFAR100, we divide the original training data into two subsets: a training set and a validation set, using an 80%/20% split ratio. |
| Hardware Specification | Yes | All experiments are implemented in PyTorch using a Tesla V100 GPU with 32 GB memory. |
| Software Dependencies | No | The paper states 'All experiments are implemented in PyTorch' but does not specify a version number for PyTorch or other software dependencies. |
| Experiment Setup | Yes (see the training-configuration sketch after this table) | The maximum training epochs are set to 50, 100, and 200 for 2-D Gaussian data, CIFAR10 and CIFAR100, respectively. ... The training batch size is set to 64, 64, and 256 for Gaussian data, CIFAR10 and CIFAR100, respectively. We use Adam optimizer without weight decay or learning rate schedule during model optimization. The learning rates of the optimizer are 1e-3, 2.5e-4, 2.5e-4 for Gaussian data, CIFAR10 and CIFAR100, respectively. The default hyper-parameter λ is set to 1e-4 for those EDL methods with regularizer. |
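
The data-loading sketch below illustrates the "Open Datasets" and "Dataset Splits" rows above. It is not the authors' released code (that is linked in the table); it only shows one plausible way to load CIFAR10 as the in-distribution set with torchvision, split the official training set 80%/20% into train and validation subsets as the paper describes, and load SVHN as one of the OOD test sets. The normalization statistics, random seed, and data root are assumptions.

```python
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

# Standard CIFAR10 normalization statistics; the exact preprocessing used in the
# paper is not specified in this summary, so these values are an assumption.
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])

# In-distribution data: CIFAR10 (the paper also uses CIFAR100 the same way).
full_train = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
test_set = datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)

# 80%/20% train/validation split of the original training data, per the paper.
n_train = int(0.8 * len(full_train))
n_val = len(full_train) - n_train
train_set, val_set = random_split(
    full_train,
    [n_train, n_val],
    generator=torch.Generator().manual_seed(0),  # seed is an assumption
)

# One of the four OOD test sets named in the paper (SVHN is already 32x32 RGB).
ood_svhn = datasets.SVHN(root="./data", split="test", download=True, transform=transform)

# Batch size 64 for CIFAR10, per the "Experiment Setup" row.
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
val_loader = DataLoader(val_set, batch_size=64, shuffle=False)
```

The same pattern would apply to CIFAR100 (with batch size 256) and to the remaining OOD sets; Fashion-MNIST and Tiny Image Net would additionally need resizing and channel conversion to match the 32×32 RGB input format.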
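The "Experiment Setup" row maps onto a small training configuration. The following sketch wires those hyper-parameters together for the CIFAR10 case: Adam with learning rate 2.5e-4, no weight decay or learning-rate schedule, 100 epochs, and a regularization weight λ = 1e-4. The backbone model and the loss are placeholders standing in for whichever architecture and evidential objective a given EDL method uses; they are not taken from the paper.

```python
import torch
import torch.nn as nn

# Hyper-parameters from the "Experiment Setup" row (CIFAR10 values of the
# reported triples); the other values apply to the Gaussian and CIFAR100 settings.
MAX_EPOCHS = 100        # 50 / 100 / 200 for Gaussian data, CIFAR10, CIFAR100
LEARNING_RATE = 2.5e-4  # 1e-3 / 2.5e-4 / 2.5e-4
LAMBDA_REG = 1e-4       # default regularizer weight for EDL methods with a regularizer

# Placeholder backbone: any network producing per-class outputs would go here.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Adam without weight decay or a learning-rate schedule, per the paper.
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE, weight_decay=0.0)

def edl_style_loss(logits, targets):
    """Placeholder objective: task loss plus a lambda-weighted regularizer.

    This is NOT any specific EDL loss from the paper; it only illustrates where
    the reported lambda = 1e-4 would enter such an objective.
    """
    task_loss = nn.functional.cross_entropy(logits, targets)
    regularizer = logits.pow(2).mean()  # stand-in for an EDL regularization term
    return task_loss + LAMBDA_REG * regularizer

def train(train_loader):
    model.train()
    for epoch in range(MAX_EPOCHS):
        for images, targets in train_loader:
            images, targets = images.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = edl_style_loss(model(images), targets)
            loss.backward()
            optimizer.step()
```

Calling `train(train_loader)` with the loader from the previous sketch reproduces the reported schedule shape (epochs, batch size, optimizer settings), though not any particular EDL method from the paper.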