Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Vicinal Label Supervision for Reliable Aleatoric and Epistemic Uncertainty Estimation

Authors: Linye Li, Yufei Chen, Xiaodong Yue

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We begin by analyzing and comparing the estimated uncertainties estimated by our method and baseline approaches on a toy dataset. Subsequently, we conduct extensive experiments on three main tasks: OOD detection, selective classification, and OOD generalization. For the OOD detection task, we evaluate the ability of different methods to distinguish between in-distribution (ID) and out-of-distribution (OOD) samples based on their estimated epistemic uncertainty. For selective classification, we assess the model's capability to differentiate correctly classified samples from misclassified ones using aleatoric uncertainty. For the OOD generalization task, we examine the classification performance of models when exposed to covariate-shifted OOD samples.
Researcher Affiliation	Academia	Linye Li School of Computer Science and Technology Tongji University, Shanghai, China EMAIL Yufei Chen School of Computer Science and Technology Tongji University, Shanghai, China EMAIL Xiaodong Yue Artificial Intelligence Institute Shanghai University, Shanghai, China EMAIL
Pseudocode	No	The paper describes the methodology using mathematical formulations and descriptive text, but it does not contain any explicitly labeled "Pseudocode" or "Algorithm" blocks or figures.
Open Source Code	Yes	5. Open access to data and code Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: See Appendix and supplemental material.
Open Datasets	Yes	Datasets. Following prior EDL works, we conduct OOD detection using CIFAR-10 and CIFAR-100 [24] (32 × 32 resolution). When using CIFAR-10 (or CIFAR-100) as the ID dataset, the OOD datasets include CIFAR-100 (or CIFAR-10), Tiny ImageNet [27], MNIST [28], SVHN [39], Textures [25], and Places365 [55]. For OOD generalization, we evaluate on CIFAR-10-C and CIFAR-100-C [17], which contain 15 corruption types (e.g., snow, fog) at 5 severity levels.
Dataset Splits	Yes	Datasets. Following prior EDL works, we conduct OOD detection using CIFAR-10 and CIFAR-100 [24] (32 × 32 resolution). When using CIFAR-10 (or CIFAR-100) as the ID dataset, the OOD datasets include CIFAR-100 (or CIFAR-10), Tiny ImageNet [27], MNIST [28], SVHN [39], Textures [25], and Places365 [55]. For OOD generalization, we evaluate on CIFAR-10-C and CIFAR-100-C [17], which contain 15 corruption types (e.g., snow, fog) at 5 severity levels.
Hardware Specification	Yes	Following OpenOOD [52], we train a ResNet-18 model [16] implemented in PyTorch [41] for 100 epochs on a single NVIDIA A100 GPU.
Software Dependencies	No	Following OpenOOD [52], we train a ResNet-18 model [16] implemented in PyTorch [41] for 100 epochs on a single NVIDIA A100 GPU.
Experiment Setup	Yes	Following OpenOOD [52], we train a ResNet-18 model [16] implemented in PyTorch [41] for 100 epochs on a single NVIDIA A100 GPU. We use the SGD optimizer with a cosine annealing schedule, an initial learning rate of 0.1, and a batch size of 128. We set the hyperparameters β = 10 (Eq. 11) and β+ noise = β noise = 1.0.
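
As an illustration of the schedule quoted in the Experiment Setup row, the following is a minimal sketch of a cosine annealing learning rate using the stated initial rate of 0.1 over 100 epochs. The function name `cosine_annealing_lr`, the per-epoch update granularity, and the minimum learning rate of 0 are assumptions made for this sketch; the quoted text does not specify them, and this is not the authors' code.

```python
import math


def cosine_annealing_lr(epoch, total_epochs=100, base_lr=0.1, min_lr=0.0):
    """Return the learning rate at `epoch` under cosine annealing.

    base_lr=0.1 and total_epochs=100 come from the quoted setup;
    min_lr=0.0 and per-epoch stepping are assumptions of this sketch.
    """
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Learning rate at the start of each epoch, plus the final value after epoch 100.
schedule = [cosine_annealing_lr(e) for e in range(101)]
```

Under these assumptions the rate starts at 0.1, passes through half that value at the midpoint, and decays to the minimum at the end of training.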