Mix-n-Match: Ensemble and Compositional Methods for Uncertainty Calibration in Deep Learning

Authors: Jize Zhang, Bhavya Kailkhura, T. Yong-Jin Han

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our approaches outperform state-of-the-art solutions on both the calibration as well as the evaluation tasks in most of the experimental settings. Our codes are available at https://github.com/zhang64llnl/Mix-n-Match-Calibration.
Researcher Affiliation | Academia | Jize Zhang¹, Bhavya Kailkhura¹, T. Yong-Jin Han¹. ¹Lawrence Livermore National Laboratory, Livermore, CA 94550. Correspondence to: Jize Zhang <zhang64@llnl.gov>.
Pseudocode | No | The paper describes procedural steps for algorithms (e.g., IRM in Section 3.3.2) but does not present them in a structured pseudocode block or algorithm figure.
Open Source Code | Yes | Our codes are available at https://github.com/zhang64llnl/Mix-n-Match-Calibration.
Open Datasets | Yes | We calibrate various deep neural network classifiers on popular computer vision datasets: CIFAR-10/100 (Krizhevsky, 2009) with 10/100 classes and ImageNet (Deng et al., 2009) with 1000 classes.
Dataset Splits | Yes | We use 45000 images for training and hold out 15000 images for calibration and evaluation. For ImageNet, we acquired 4 pretrained models from (Paszke et al., 2019) which were trained with 1.3 million images, and 50000 images are held out for calibration and evaluation. We randomly split the hold-out dataset into n_c = 5000, n_e = 10000 for CIFAR-10/100 and n_c = n_e = 25000 for ImageNet.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory used for running experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. While it references the PyTorch paper, it does not state the PyTorch version or any other software versions used.
Experiment Setup | No | The paper states that "The training detail is described in Sec. S6.", indicating that specific experimental setup details such as hyperparameters are not present in the main text.
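
As a rough illustration of the random hold-out split quoted under Dataset Splits, the sketch below partitions hold-out indices into a calibration set and an evaluation set using the sizes stated there. The function name `split_holdout`, the fixed seed, and the use of Python's standard `random` module are assumptions made for this example; the authors' released code may perform the split differently.

```python
import random


def split_holdout(n_holdout, n_cal, n_eval, seed=0):
    """Randomly partition hold-out indices into calibration and evaluation sets.

    Hypothetical helper for illustration only; not taken from the paper's code.
    """
    if n_cal + n_eval > n_holdout:
        raise ValueError("n_cal + n_eval exceeds the hold-out size")
    rng = random.Random(seed)          # fixed seed (an assumption) for repeatability
    indices = list(range(n_holdout))
    rng.shuffle(indices)               # random permutation of the hold-out indices
    cal_idx = indices[:n_cal]          # first n_cal indices: calibration set
    eval_idx = indices[n_cal:n_cal + n_eval]  # next n_eval indices: evaluation set
    return cal_idx, eval_idx


# Sizes quoted in the Dataset Splits entry.
cifar_cal, cifar_eval = split_holdout(15000, 5000, 10000)       # CIFAR-10/100
imagenet_cal, imagenet_eval = split_holdout(50000, 25000, 25000)  # ImageNet
```

Because the two index lists come from disjoint slices of one permutation, no image is used for both fitting the calibrator and evaluating it.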