
Class Probability Matching with Calibrated Networks for Label Shift Adaption

Authors: Hongwei Wen, Annika Betken, Hanyuan Hang

ICLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	From the experimental perspective, real data comparisons show that CPMCN outperforms existing matching-based and EM-based algorithms.
Researcher Affiliation	Academia	Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, Enschede, The Netherlands
Pseudocode	Yes	Algorithm 1 Class Probability Matching with Calibrated Networks (CPMCN)
Open Source Code	Yes	The training code and parameters can be found in the obtaining predictions folder at https://github.com/kundajelab/labelshiftexperiments/tree/master/notebooks/obtaining_predictions. The calibrated method Bias-Corrected Temperature Scaling (BCTS) is implemented based on the code in https://github.com/kundajelab/abstention/blob/master/abstention/calibration.py.
Open Datasets	Yes	In this section, we present experimental results on three benchmark datasets (MNIST LeCun et al. (2010), CIFAR10 Krizhevsky et al. (2009), and CIFAR100 Krizhevsky et al. (2009))
Dataset Splits	Yes	We take the training set of the benchmark datasets as the data for the source domain Dp and reserve 10000 samples out of Dp as a hold-out validation set, which is used to tune the hyper-parameters of the calibrated BCTS model (Alexandari et al., 2020).
Hardware Specification	No	The paper does not specify the exact hardware used for running the experiments (e.g., specific GPU/CPU models, cloud instances with specs).
Software Dependencies	No	The paper mentions 'Python' and 'L-BFGS-B (Zhu et al., 1997)' as an optimization method, but does not provide specific version numbers for Python or other key software libraries/frameworks (e.g., PyTorch, TensorFlow).
Experiment Setup	Yes	For repeating experiments of each method, we use the code from Geifman & El-Yaniv (2017) to train ten different network models with different random seeds. For each model, we perform 10 trials, where each trial consists of a different sampling of the validation set and a different sampling of the label-shifted target domain data. The total number of repetitions is 100.
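
The repetition protocol in the Dataset Splits and Experiment Setup rows (ten seeds, ten trials per model, a 10000-sample validation set resampled in each trial) can be sketched roughly as below. This is only an illustration of the bookkeeping, not the authors' code. The function names `sample_trial` and `run_protocol`, the `target_size` parameter, the use of a separate labelled pool for the target domain, and the flat Dirichlet prior used to generate the label shift are all assumptions; the table does not say how the shifted target sample is drawn.

```python
import numpy as np

N_MODELS = 10      # networks trained with different random seeds
N_TRIALS = 10      # resamplings per trained network
VAL_SIZE = 10_000  # hold-out validation samples reserved from the source data


def sample_trial(source_labels, target_labels, target_size, rng):
    """Draw one validation split and one label-shifted target sample (hypothetical helper)."""
    # Validation indices are drawn without replacement from the source data.
    val_idx = rng.choice(len(source_labels), size=VAL_SIZE, replace=False)

    # Assumption: target class priors come from a flat Dirichlet distribution.
    classes = np.unique(target_labels)
    priors = rng.dirichlet(np.ones(len(classes)))

    # Weight each example so that class c is drawn with probability priors[c].
    counts = np.array([(target_labels == c).sum() for c in classes])
    weights = (priors / counts)[np.searchsorted(classes, target_labels)]
    weights /= weights.sum()
    target_idx = rng.choice(len(target_labels), size=target_size,
                            replace=True, p=weights)
    return val_idx, target_idx, priors


def run_protocol(source_labels, target_labels, target_size=2000, base_seed=0):
    """Enumerate all N_MODELS * N_TRIALS repetitions with reproducible seeds."""
    runs = []
    for model_seed in range(N_MODELS):
        for trial in range(N_TRIALS):
            # A distinct, reproducible generator per (model, trial) pair.
            rng = np.random.default_rng([base_seed, model_seed, trial])
            val_idx, target_idx, priors = sample_trial(
                source_labels, target_labels, target_size, rng)
            runs.append({"model_seed": model_seed, "trial": trial,
                         "val_idx": val_idx, "target_idx": target_idx,
                         "priors": priors})
    return runs
```

In an actual run, each entry of `runs` would be paired with the predictions of the network trained under `model_seed`; the calibration and label shift estimation steps would then be applied to the selected indices.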