Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Class Probability Matching with Calibrated Networks for Label Shift Adaption
Authors: Hongwei Wen, Annika Betken, Hanyuan Hang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | From the experimental perspective, real data comparisons show that CPMCN outperforms existing matching-based and EM-based algorithms. |
| Researcher Affiliation | Academia | Faculty of Electrical Engineering, Mathematics and Computer Science University of Twente Enschede, The Netherlands |
| Pseudocode | Yes | Algorithm 1 Class Probability Matching with Calibrated Networks (CPMCN) |
| Open Source Code | Yes | The training code and parameters can be found in the obtaining predictions folder at https://github.com/kundajelab/labelshiftexperiments/tree/master/notebooks/obtaining_predictions. The calibrated method Bias-Corrected Temperature Scaling (BCTS) is implemented based on the code in https://github.com/kundajelab/abstention/blob/master/abstention/calibration.py. |
| Open Datasets | Yes | In this section, we present experimental results on three benchmark datasets (MNIST LeCun et al. (2010), CIFAR10 Krizhevsky et al. (2009), and CIFAR100 Krizhevsky et al. (2009)) |
| Dataset Splits | Yes | We take the training set of the benchmark datasets as the data for the source domain Dp and reserve 10000 samples out of Dp as a hold-out validation set, which is used to tune the hyper-parameters of the calibrated BCTS model (Alexandari et al., 2020). |
| Hardware Specification | No | The paper does not specify the exact hardware used for running the experiments (e.g., specific GPU/CPU models, cloud instances with specs). |
| Software Dependencies | No | The paper mentions 'Python' and 'L-BFGS-B (Zhu et al., 1997)' as an optimization method, but does not provide specific version numbers for Python or other key software libraries/frameworks (e.g., PyTorch, TensorFlow). |
| Experiment Setup | Yes | For repeating experiments of each method, we use the code from Geifman & El-Yaniv (2017) to train ten different network models with different random seeds. For each model, we perform 10 trials, where each trial consists of a different sampling of the validation set and a different sampling of the label-shifted target domain data. The total number of repetitions is 100. |
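
The repetition protocol quoted in the Dataset Splits and Experiment Setup rows (ten models, ten trials per model, a 10000-sample hold-out validation set redrawn in each trial) can be sketched as below. This is a minimal illustration, not the authors' code: the function name `repetition_plan`, the seeding scheme, and the source-set size of 60000 (the MNIST training set) are assumptions made for the example, and the sampling of the label-shifted target data is omitted because the quoted text does not describe it.

```python
import numpy as np

N_MODELS = 10          # networks trained with different random seeds
N_TRIALS = 10          # trials per trained network
N_VALIDATION = 10_000  # hold-out validation samples reserved from the source data


def repetition_plan(n_source, n_models=N_MODELS, n_trials=N_TRIALS,
                    n_validation=N_VALIDATION):
    """Yield (model_seed, trial, validation_indices) for every repetition.

    Hypothetical helper: it only enumerates repetitions and draws the
    validation indices; model training and target sampling are left out.
    """
    for model_seed in range(n_models):
        for trial in range(n_trials):
            # Assumed seeding scheme: one independent generator per pair.
            rng = np.random.default_rng([model_seed, trial])
            # Draw the hold-out validation set without replacement.
            val_idx = rng.choice(n_source, size=n_validation, replace=False)
            yield model_seed, trial, val_idx


# Assumed source-set size: the 60000-sample MNIST training set.
plan = list(repetition_plan(n_source=60_000))
print(len(plan))  # total number of repetitions
```

Seeding each (model, trial) pair separately keeps any single repetition reproducible without rerunning the others; this is a design choice of the sketch, not something the paper states.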