Class Probability Matching with Calibrated Networks for Label Shift Adaption

Authors: Hongwei Wen, Annika Betken, Hanyuan Hang

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "From the experimental perspective, real data comparisons show that CPMCN outperforms existing matching-based and EM-based algorithms."
Researcher Affiliation | Academia | "Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, Enschede, The Netherlands"
Pseudocode | Yes | "Algorithm 1: Class Probability Matching with Calibrated Networks (CPMCN)"
Open Source Code | Yes | "The training code and parameters can be found in the obtaining predictions folder at https://github.com/kundajelab/labelshiftexperiments/tree/master/notebooks/obtaining_predictions. The calibration method Bias-Corrected Temperature Scaling (BCTS) is implemented based on the code in https://github.com/kundajelab/abstention/blob/master/abstention/calibration.py."
Open Datasets | Yes | "In this section, we present experimental results on three benchmark datasets (MNIST (LeCun et al., 2010), CIFAR10 (Krizhevsky et al., 2009), and CIFAR100 (Krizhevsky et al., 2009))."
Dataset Splits | Yes | "We take the training set of the benchmark datasets as the data for the source domain Dp and reserve 10000 samples out of Dp as a hold-out validation set, which is used to tune the hyper-parameters of the calibrated BCTS model (Alexandari et al., 2020)."
Hardware Specification | No | The paper does not specify the hardware used to run the experiments (e.g., GPU/CPU models or cloud instance types).
Software Dependencies | No | The paper mentions Python and the L-BFGS-B optimizer (Zhu et al., 1997), but provides no version numbers for Python or for key libraries and frameworks (e.g., PyTorch, TensorFlow).
Experiment Setup | Yes | "For repeating experiments of each method, we use the code from Geifman & El-Yaniv (2017) to train ten different network models with different random seeds. For each model, we perform 10 trials, where each trial consists of a different sampling of the validation set and a different sampling of the label-shifted target domain data. The total number of repetitions is 100."
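Since the assessment above points to the BCTS calibration code, a minimal sketch of the transform BCTS applies may help the reader: calibrated probabilities are softmax(z / T + b), with a scalar temperature T and per-class biases b fit on the held-out validation set (the paper cites L-BFGS-B for such optimization; the fitting step is not shown). This is a pure-Python illustration, not the repository's implementation, and the function name `bcts_probs` is hypothetical:

```python
import math

def bcts_probs(logits, temperature, biases):
    """Apply Bias-Corrected Temperature Scaling: softmax(z / T + b).

    Illustrative sketch only. In practice the scalar temperature and the
    per-class bias vector are fit on a hold-out validation set by
    minimizing negative log-likelihood; that fitting step is omitted here.
    """
    scaled = [z / temperature + b for z, b in zip(logits, biases)]
    m = max(scaled)  # subtract the max before exponentiating, for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# With T = 1 and zero biases, BCTS reduces to a plain softmax.
probs = bcts_probs([1.0, 2.0, 3.0], temperature=1.0, biases=[0.0, 0.0, 0.0])
```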
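The class probability matching idea behind Algorithm 1 can also be sketched generically: choose target class priors w so that the average calibrated prediction on the target domain equals the w-weighted mixture of class-conditional average predictions from the source domain. The sketch below implements that generic matching estimator under these assumptions; it is not the authors' exact objective, the function names are hypothetical, and it presumes every class appears in the source sample:

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting
    (adequate for the small, well-conditioned systems used here)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def estimate_target_priors(source_probs, source_labels, target_probs, n_classes):
    """Matching-based label-shift estimation (generic sketch).

    Finds priors w with  mu = C w,  where C[k][c] is the mean predicted
    probability of class k over source samples whose true label is c, and
    mu[k] is the mean predicted probability of class k on the target data.
    """
    C = [[0.0] * n_classes for _ in range(n_classes)]
    counts = [0] * n_classes
    for p, y in zip(source_probs, source_labels):
        counts[y] += 1
        for k in range(n_classes):
            C[k][y] += p[k]
    for c in range(n_classes):
        for k in range(n_classes):
            C[k][c] /= counts[c]  # class-conditional averages on the source
    n_t = len(target_probs)
    mu = [sum(p[k] for p in target_probs) / n_t for k in range(n_classes)]
    w = solve_linear(C, mu)
    # Crude projection back onto the simplex: clip negatives, renormalize.
    w = [max(v, 0.0) for v in w]
    s = sum(w)
    return [v / s for v in w]
```

For a perfectly calibrated, perfectly confident classifier the matrix C is the identity, so the estimate reduces to the empirical class frequencies of the target predictions; calibration matters precisely because real networks are neither.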