Uncertainty Quantification and Deep Ensembles

Authors: Rahul Rahaman, Alexandre H. Thiery

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our simulations indicate that this simple strategy can halve the Expected Calibration Error (ECE) on a range of benchmark classification problems compared to standard deep-ensembles in the low data regime. For our experiments, we use standard neural architectures. For CIFAR10/100 [21] we use ResNet18, ResNet34 [17] for Imagenette/Imagewoof [18], and for the Diabetic Retinopathy [7], similar to [26] we use the architecture (not containing any residual connection) from the 5th place solution of the associated Kaggle challenge.
Researcher Affiliation | Academia | Rahul Rahaman, Department of Statistics and Data Science, National University of Singapore (rahul.rahaman@u.nus.edu); Alexandre H. Thiery, Department of Statistics and Data Science, National University of Singapore (a.h.thiery@nus.edu.sg)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described, nor does it state that code will be released.
Open Datasets | Yes | For CIFAR10/100 [21] we use ResNet18, ResNet34 [17] for Imagenette/Imagewoof [18], and for the Diabetic Retinopathy [7]... We also include the results for LeNet [23] trained on the MNIST [24] dataset in the supplementary.
Dataset Splits | Yes | The validation dataset is chosen from the leftover training dataset. All the experiments are executed 50 times, on the same training set, but with 50 different validation sets of size Nval = 50 for CIFAR10, Imagenette, Imagewoof and Nval = 300 for CIFAR100, and Nval = 500 for the Diabetic Retinopathy dataset.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. It only mentions using 'standard neural architectures'.
Software Dependencies | No | The paper does not provide specific software dependency details, such as library names with version numbers, needed to replicate the experiment.
Experiment Setup | Yes | For our experiments, we use standard neural architectures. A very low number of training examples (CIFAR10: 1000, CIFAR100: 5000, Image{nette, woof}: 5000, MNIST: 500) was used for all the datasets. For this experiment, an ensemble of K = 30 networks is considered. In all our experiments, we minimized the negative log-likelihood (i.e., cross-entropy). In all our experiments, we have found it computationally more efficient and robust to use a simple grid search for finding the optimal temperature; we used n = 100 temperatures equally spaced on a logarithmic scale in between τmin = 10^-2 and τmax = 10. All the experiments are executed 50 times, on the same training set, but with 50 different validation sets of size Nval = 50 for CIFAR10, Imagenette, Imagewoof and Nval = 300 for CIFAR100, and Nval = 500 for the Diabetic Retinopathy dataset.
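The temperature grid search quoted in the Experiment Setup row is simple enough to sketch. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: the names grid_search_temperature, val_logits, and val_labels are hypothetical, the inputs are assumed to be NumPy arrays of pooled ensemble logits and integer labels for the held-out validation set, and how the K = 30 members are pooled into those logits is left out here and should follow the paper.

```python
import numpy as np

def nll(logits, labels, temperature):
    """Mean negative log-likelihood of the labels under temperature-scaled softmax."""
    scaled = logits / temperature
    # Log-softmax with a max-shift for numerical stability.
    scaled = scaled - scaled.max(axis=1, keepdims=True)
    log_probs = scaled - np.log(np.exp(scaled).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def grid_search_temperature(val_logits, val_labels, t_min=1e-2, t_max=10.0, n=100):
    """Return the temperature from a log-spaced grid that minimizes validation NLL."""
    # n = 100 temperatures equally spaced on a log scale between 10^-2 and 10,
    # matching the setup quoted above.
    grid = np.logspace(np.log10(t_min), np.log10(t_max), n)
    losses = [nll(val_logits, val_labels, t) for t in grid]
    return grid[int(np.argmin(losses))]
```

A typical call would be tau = grid_search_temperature(val_logits, val_labels) on one of the small validation sets described in the Dataset Splits row, with the resulting temperature then applied to the test-set logits.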
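For completeness, the Expected Calibration Error cited in the Research Type row can be estimated with a standard binning scheme. The sketch below assumes 15 equal-width confidence bins (a common default; the bin count is not stated in the quoted excerpt) and array inputs analogous to the previous snippet.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=15):
    """ECE: bin-weighted gap between accuracy and mean confidence.

    probs  : (N, C) predicted class probabilities
    labels : (N,)   integer class labels
    """
    confidences = probs.max(axis=1)
    predictions = probs.argmax(axis=1)
    accuracies = (predictions == labels).astype(float)

    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(accuracies[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # weight the gap by the fraction of samples in the bin
    return ece
```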