Uncertainty Quantification and Deep Ensembles

Authors: Rahul Rahaman, Alexandre H. Thiery

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our simulations indicate that this simple strategy can halve the Expected Calibration Error (ECE) on a range of benchmark classification problems compared to standard deep-ensembles in the low data regime. For our experiments, we use standard neural architectures. For CIFAR10/100 [21] we use ResNet18, ResNet34 [17] for Imagenette/Imagewoof [18], and for the Diabetic Retinopathy [7], similar to [26] we use the architecture (not containing any residual connection) from the 5th place solution of the associated Kaggle challenge.
Researcher Affiliation | Academia | Rahul Rahaman, Department of Statistics and Data Science, National University of Singapore (rahul.rahaman@u.nus.edu); Alexandre H. Thiery, Department of Statistics and Data Science, National University of Singapore (a.h.thiery@nus.edu.sg)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described, nor does it state that code will be released.
Open Datasets | Yes | For CIFAR10/100 [21] we use ResNet18, ResNet34 [17] for Imagenette/Imagewoof [18], and for the Diabetic Retinopathy [7]... We also include the results for LeNet [23] trained on the MNIST [24] dataset in the supplementary.
Dataset Splits | Yes | The validation dataset is chosen from the leftover training dataset. All the experiments are executed 50 times, on the same training set, but with 50 different validation sets of size Nval = 50 for CIFAR10, Imagenette, Imagewoof and Nval = 300 for CIFAR100, and Nval = 500 for the Diabetic Retinopathy dataset.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. It only mentions using 'standard neural architectures'.
Software Dependencies | No | The paper does not provide specific software dependency details, such as library names with version numbers, needed to replicate the experiment.
Experiment Setup | Yes | For our experiments, we use standard neural architectures. A very low number of training examples (CIFAR10: 1000, CIFAR100: 5000, Image{nette, woof}: 5000, MNIST: 500) was used for all the datasets. For this experiment, an ensemble of K = 30 networks is considered. In all our experiments, we minimized the negative log-likelihood (i.e., cross-entropy). In all our experiments, we have found it computationally more efficient and robust to use a simple grid search for finding the optimal temperature; we used n = 100 temperatures equally spaced on a logarithmic scale in between τmin = 10^-2 and τmax = 10. All the experiments are executed 50 times, on the same training set, but with 50 different validation sets of size Nval = 50 for CIFAR10, Imagenette, Imagewoof and Nval = 300 for CIFAR100, and Nval = 500 for the Diabetic Retinopathy dataset.
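The temperature grid search quoted in the Experiment Setup row is simple enough to sketch. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: the names grid_search_temperature, val_logits, and val_labels are hypothetical, the inputs are assumed to be NumPy arrays of pooled ensemble logits and integer labels for the held-out validation set, and how the K = 30 members are pooled into those logits is left out here and should follow the paper.

```python
import numpy as np

def nll(logits, labels, temperature):
    """Mean negative log-likelihood of the labels under temperature-scaled softmax."""
    scaled = logits / temperature
    # Log-softmax with a max-shift for numerical stability.
    scaled = scaled - scaled.max(axis=1, keepdims=True)
    log_probs = scaled - np.log(np.exp(scaled).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def grid_search_temperature(val_logits, val_labels, t_min=1e-2, t_max=10.0, n=100):
    """Return the temperature from a log-spaced grid that minimizes validation NLL."""
    # n = 100 temperatures equally spaced on a log scale between 10^-2 and 10,
    # matching the setup quoted above.
    grid = np.logspace(np.log10(t_min), np.log10(t_max), n)
    losses = [nll(val_logits, val_labels, t) for t in grid]
    return grid[int(np.argmin(losses))]
```

A typical call would be tau = grid_search_temperature(val_logits, val_labels) on one of the small validation sets described in the Dataset Splits row, with the resulting temperature then applied to the test-set logits.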
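For completeness, the Expected Calibration Error cited in the Research Type row can be estimated with a standard binning scheme. The sketch below assumes 15 equal-width confidence bins (a common default; the bin count is not stated in the quoted excerpt) and array inputs analogous to the previous snippet.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=15):
    """ECE: bin-weighted gap between accuracy and mean confidence.

    probs  : (N, C) predicted class probabilities
    labels : (N,)   integer class labels
    """
    confidences = probs.max(axis=1)
    predictions = probs.argmax(axis=1)
    accuracies = (predictions == labels).astype(float)

    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(accuracies[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # weight the gap by the fraction of samples in the bin
    return ece
```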