Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Uncertainty Estimation by Flexible Evidential Deep Learning
Authors: Taeseong Yoon, Heeyoung Kim
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate F-EDL across a wide range of UQ-related downstream tasks, including classification, misclassification detection, OOD detection, and distribution shift detection. Notably, F-EDL consistently achieves state-of-the-art performance across diverse settings, including classical, long-tailed, and noisy ID scenarios, highlighting its robustness and generalizability. In addition, qualitative analyses show that F-EDL captures interpretable multimodal uncertainty reflecting ambiguity across plausible classes, while demonstrating faithful epistemic behavior that decreases with more data. |
| Researcher Affiliation | Academia | Taeseong Yoon Heeyoung Kim Department of Industrial and Systems Engineering, KAIST EMAIL |
| Pseudocode | Yes | Algorithm 1: F-EDL Training; Algorithm 2: F-EDL Prediction and Uncertainty Quantification |
| Open Source Code | Yes | The code for our model is publicly available at https://github.com/TaeseongYoon/F-EDL. |
| Open Datasets | Yes | In the classical setting, CIFAR-10 and CIFAR-100 [45] were used as the primary ID datasets. For the long-tailed setting, we used CIFAR-10-LT [46], an artificially imbalanced version of CIFAR-10, as the ID dataset. For the noisy setting, DMNIST, a variant of MNIST containing ambiguous data points, was used as the ID dataset. For OOD detection, SVHN [47] and CIFAR-100 served as OOD datasets when CIFAR-10 or CIFAR-10-LT was used as ID, while SVHN and Tiny ImageNet (TIN) were used as the OOD datasets for CIFAR-100. FMNIST was used as the OOD dataset for DMNIST. For distribution shift detection, we utilized CIFAR-10-C [48], a dataset created by applying continuous distribution shifts to CIFAR-10, as the OOD dataset. |
| Dataset Splits | Yes | The dataset contains 50,000 training images and 10,000 testing images... For our experiments, the training set is further divided into training and validation subsets with a ratio of 0.95 : 0.05. ... DMNIST contains 120,000 training images (60,000 from MNIST and 60,000 from AMNIST) and 70,000 testing images (10,000 from MNIST and 60,000 from AMNIST). |
| Hardware Specification | Yes | Depending on availability, we used either an RTX 4060 GPU with 8GB memory or a TITAN V GPU with 12GB memory. |
| Software Dependencies | No | All experiments were implemented in PyTorch. |
| Experiment Setup | Yes | The detailed experimental setups and hyperparameter configurations are provided in Table 10. All experiments were implemented in PyTorch. Depending on availability, we used either an RTX 4060 GPU with 8GB memory or a TITAN V GPU with 12GB memory. |
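
The 0.95 : 0.05 training/validation division quoted in the Dataset Splits row can be illustrated with a minimal sketch. This is not the authors' code: the helper name `split_indices`, the fixed seed, and the use of a uniform random shuffle are assumptions made only for illustration, since the quoted text gives the ratio but not how the subsets were drawn.

```python
import random


def split_indices(n_samples, val_fraction=0.05, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into train/validation lists.

    Hypothetical helper: the seed and the shuffling strategy are assumptions,
    not details reported in the summary above.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_val = round(n_samples * val_fraction)
    # The first n_val shuffled indices form the validation subset.
    return indices[n_val:], indices[:n_val]


# 50,000 training images, as quoted for the CIFAR training set.
train_idx, val_idx = split_indices(50_000)
print(len(train_idx), len(val_idx))  # 47500 2500
```

Under these assumptions, the 50,000 training images yield 47,500 training and 2,500 validation examples, with the 10,000 test images left untouched.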