Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
DEUP: Direct Epistemic Uncertainty Prediction
Authors: Salem Lahlou, Moksh Jain, Hadi Nekoei, Victor I Butoi, Paul Bertin, Jarrid Rector-Brooks, Maksym Korablyov, Yoshua Bengio
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through a wide set of experiments, we illustrate how existing methods in sequential model optimization can be improved with epistemic uncertainty estimates from DEUP, and how DEUP can be used to drive exploration in reinforcement learning. We also evaluate the quality of uncertainty estimates from DEUP for probabilistic image classification and predicting synergies of drug combinations. ... In Sec. 5, we experimentally validate that EU estimates from DEUP can improve upon existing SMO methods, drive exploration in RL, and evaluate the quality of these uncertainty estimates in probabilistic image classification and in a regression task predicting synergies of drug combinations. |
| Researcher Affiliation | Academia | Salem Lahlou, Mila Quebec AI Institute, Université de Montréal; Moksh Jain, Mila Quebec AI Institute, Université de Montréal; Hadi Nekoei, Mila Quebec AI Institute, Université de Montréal; Victor Ion Butoi, Massachusetts Institute of Technology; Paul Bertin, Mila Quebec AI Institute, Université de Montréal; Jarrid Rector-Brooks, Mila Quebec AI Institute, Université de Montréal; Maksym Korablyov, Mila Quebec AI Institute, Université de Montréal; Yoshua Bengio, Mila Quebec AI Institute, Université de Montréal, CIFAR Fellow |
| Pseudocode | Yes | Algorithm 1: DEUP with a fixed training set; Algorithm 2: Pre-filling the uncertainty estimator training dataset De; Algorithm 3: Training procedure for DEUP in an Interactive Learning setting; Algorithm 4: DEUP-DQN; Algorithm 5: DEUP for Drug Combinations |
| Open Source Code | No | The paper does not provide an explicit link to its own source code for the DEUP methodology. It references `botorch.org` as a base framework for experiments and provides GitHub links for baseline implementations (e.g., `https://github.com/google/uncertainty-baselines`, `https://github.com/y0ast/deterministicuncertainty-quantification`, `https://github.com/y0ast/DUE`) but not for the code related to DEUP itself. |
| Open Datasets | Yes | We used the DrugComb and LINCS L1000 datasets (Zagidullin et al., 2019; Subramanian et al., 2017). ...train a ResNet (He et al., 2016) model for CIFAR-10 classification (Krizhevsky, 2009) and reject OOD examples using the estimated uncertainty in the prediction. ... We use examples from SVHN (Netzer et al., 2011) as the OOD examples. |
| Dataset Splits | Yes | We evaluated the uncertainty methods using a train, validation, test split of 40%, 30%, and 30%, respectively. ...For training DEUP, the CIFAR-10 training set is divided into 5 folds, with each fold containing 8 unique classes. |
| Hardware Specification | Yes | One complete training run of DEUP-DQN with 5 seeds takes about 0.04-0.05 GPU days on a V100 GPU. In total, the RL experiments took about 0.15 GPU days on an Nvidia V100 GPU. ...One complete training run for DEUP takes about 1.5-2 GPU days on a V100 GPU. In total, this set of experiments took about 31 GPU days on an Nvidia V100 GPU. |
| Software Dependencies | No | The paper mentions several software components, such as BoTorch (Balandat et al., 2020), Adam (Kingma & Ba, 2015), Masked Autoregressive Flows (Papamakarios et al., 2017), and DUE (van Amersfoort et al., 2021). However, it does not specify concrete version numbers for these components, which is required for a reproducible description of ancillary software. |
| Experiment Setup | Yes | We used a 3-hidden layer neural network, with 128 neurons per layer and a ReLU activation function, with Adam (Kingma & Ba, 2015) and a learning rate of 10^-3 (and default values for the other hyperparameters) to train the main predictor for DEUP-EI... We used 3 networks for the Ensemble baseline, and a dropout probability of 0.3 for the Dropout baseline, with 100 test-time forward passes to compute uncertainty estimates. ...The hyperparameters are presented in Table 3 and Table 4. ... For all models, we train the main predictor for 75 and 125 epochs for ResNet-18 and ResNet-50 respectively. We use SGD with Momentum (set to 0.9), with a multi-step learning schedule with a decay of 0.2 at epochs [25, 50] and [45, 90] for ResNet-18 and ResNet-50 respectively. |
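The 40%/30%/30% train/validation/test split quoted under Dataset Splits can be sketched as follows. This is not the authors' code; the dataset size (`n = 1000`) and the fixed seed are illustrative assumptions.

```python
import numpy as np

def split_indices(n, fracs=(0.4, 0.3, 0.3), seed=0):
    """Shuffle indices and partition them into train/val/test by fraction."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(fracs[0] * n)
    n_val = int(fracs[1] * n)
    # Remainder goes to the test split
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = split_indices(1000)
```

Splitting by index rather than by copying arrays keeps the same partition reusable across the multiple uncertainty methods being compared.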
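The main-predictor architecture described in the Experiment Setup row (3 hidden layers of 128 ReLU units, trained with Adam at learning rate 10^-3 and otherwise default hyperparameters) can be sketched in PyTorch. The input and output dimensions below are assumptions for illustration; the paper does not tie them to this snippet.

```python
import torch
import torch.nn as nn

def make_predictor(in_dim, out_dim=1, hidden=128, n_hidden=3):
    """MLP with n_hidden ReLU layers of `hidden` units, per the paper's
    description of the DEUP-EI main predictor."""
    layers, d = [], in_dim
    for _ in range(n_hidden):
        layers += [nn.Linear(d, hidden), nn.ReLU()]
        d = hidden
    layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)

model = make_predictor(in_dim=2)
# Adam with lr=1e-3 and default values for the other hyperparameters
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```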
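The ResNet-18 training schedule quoted above (SGD with momentum 0.9 and a multi-step decay of 0.2 at epochs 25 and 50, over 75 epochs) maps directly onto PyTorch's `MultiStepLR`. The base learning rate of 0.1 and the stand-in parameter are assumptions, not values from the paper.

```python
import torch

# Stand-in for the model parameters of a ResNet-18
params = [torch.nn.Parameter(torch.zeros(1))]

# SGD with momentum 0.9, per the paper; base lr of 0.1 is an assumption
opt = torch.optim.SGD(params, lr=0.1, momentum=0.9)

# Decay the learning rate by a factor of 0.2 at epochs 25 and 50
sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[25, 50], gamma=0.2)

for epoch in range(75):
    # ... one epoch of training would go here ...
    opt.step()
    sched.step()
```

After both milestones the learning rate has been multiplied by 0.2 twice (0.1 → 0.02 → 0.004); the ResNet-50 variant would instead use `milestones=[45, 90]` over 125 epochs.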