Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Transitional Uncertainty with Layered Intermediate Predictions
Authors: Ryan Benkert, Mohit Prabhushankar, Ghassan Alregib
ICML 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that TULIP matches or outperforms current single-pass methods on standard benchmarks and in practical settings where these methods are less reliable (imbalances, complex architectures, medical modalities). |
| Researcher Affiliation | Academia | 1School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, USA. |
| Pseudocode | Yes | In addition to our description in the main paper, we provide implementation details and algorithm pseudo code in Appendix D. |
| Open Source Code | No | When an implementation was publicly available, we heavily relied on it in our own code. This is the case for DUQ (https://github.com/y0ast/deterministic-uncertainty-quantification), and SNGP (https://github.com/google/uncertainty-baselines/blob/master/baselines/imagenet/sngp.py, as well as https://github.com/y0ast/DUE). |
| Open Datasets | Yes | The following combinations are evaluated: CIFAR10 vs. CIFAR10-C/CIFAR100-C/SVHN and CIFAR100 vs. CIFAR10-C/CIFAR100-C/SVHN (Krizhevsky et al., 2009; Netzer et al., 2011; Hendrycks & Dietterich, 2019). |
| Dataset Splits | Yes | During training, the shallow-deep network exits are trained jointly with the feed-forward component, while the combination head is fitted after optimization on a validation set extracted from the training data XID. |
| Hardware Specification | Yes | For all of our experiments we use a single NVIDIA GeForce GTX 1080 Ti. |
| Software Dependencies | No | All experiments are implemented with pytorch. |
| Experiment Setup | Yes | In all experiments, we train a resnet-18 architecture (He et al., 2016) over 200 epochs and optimize with stochastic gradient descent with a learning rate of 0.01. We further decrease the learning rate by a factor of 0.2 in epochs 100, 125, 150, and 175 respectively, and use the data augmentations random crop, random horizontal flip, and cutout to increase the generalization performance. |
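
The learning-rate schedule quoted in the Experiment Setup row can be made concrete with a short sketch. The base rate (0.01), decay factor (0.2), milestone epochs (100, 125, 150, 175), and epoch count (200) come from the quoted text; the function name `lr_at_epoch`, the zero-based epoch indexing, and the choice to apply each decay at the start of its milestone epoch are assumptions made for illustration, not details confirmed by the paper.

```python
import bisect

# Values taken from the quoted experiment setup.
BASE_LR = 0.01
DECAY_FACTOR = 0.2
MILESTONES = (100, 125, 150, 175)
TOTAL_EPOCHS = 200


def lr_at_epoch(epoch: int) -> float:
    """Return the learning rate used during a given epoch.

    Assumption: epochs are zero-based and the decay takes effect
    at the start of each milestone epoch.
    """
    if not 0 <= epoch < TOTAL_EPOCHS:
        raise ValueError(f"epoch must be in [0, {TOTAL_EPOCHS}), got {epoch}")
    # Count how many milestones have been reached by this epoch.
    num_decays = bisect.bisect_right(MILESTONES, epoch)
    return BASE_LR * DECAY_FACTOR ** num_decays


if __name__ == "__main__":
    # Print the rate just before and at each milestone.
    for epoch in (0, 99, 100, 125, 150, 175, 199):
        print(f"epoch {epoch:3d}: lr = {lr_at_epoch(epoch):.6g}")
```

Under these assumptions the rate falls from 0.01 to 0.002 at epoch 100 and ends at 0.01 × 0.2⁴ = 1.6 × 10⁻⁵ for the last 25 epochs. In PyTorch, the same step pattern would typically be expressed with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100, 125, 150, 175], gamma=0.2)`, though the source does not state which scheduler the authors used.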