LaSCal: Label-Shift Calibration without target labels

Authors: Teodora Popordanoska, Gorjan Radevski, Tinne Tuytelaars, Matthew Blaschko

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our thorough empirical analysis demonstrates the effectiveness and reliability of the proposed approach across different modalities, model architectures and label shift intensities.
Researcher Affiliation | Academia | Teodora Popordanoska, Gorjan Radevski, Tinne Tuytelaars, Matthew B. Blaschko; ESAT-PSI, KU Leuven; firstname.lastname@kuleuven.be
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our codebase is released at the following repository: https://github.com/tpopordanoska/label-shift-calibration
Open Datasets | Yes | In particular, we use the CIFAR-10/100 Long Tail (LT) datasets [Cao et al., 2019], which are simulated from CIFAR [Krizhevsky et al., 2009] with an imbalance factor (IF)... We additionally use WILDS [Koh et al., 2021] with different modalities: Camelyon17 [Bandi et al., 2018] and iWildCam [Beery et al., 2021] with images, and Amazon [Ni et al., 2019] with text.
Dataset Splits | Yes | On iWildCam, we select the 20 most frequent classes from the target dataset. On both iWildCam and Amazon, we obtain a uniform target distribution by subsampling each class, based on the frequency of the least frequent class. In Appendix A.1: From the training dataset, we allocate a validation dataset with the same size as the testing dataset. Then, we subsample the validation dataset the same way as we subsample the training dataset, so that both are effectively drawn from the same (source) distribution (e.g., used in the ablation studies in Section 4.3). (see the data-subsampling sketch below)
Hardware Specification | Yes | We conduct all experiments on consumer-grade GPUs, that is, all experiments can be conducted on a single Nvidia 3090.
Software Dependencies | No | We use PyTorch [Paszke et al., 2019] for all deep-learning-based implementations. For all ImageNet pre-trained models we use timm [Wightman et al., 2019] (Camelyon17 and iWildCam), while for all pre-trained language models, we use Hugging Face transformers [Wolf et al., 2019]. (see the dependency-loading sketch below)
Experiment Setup | Yes | We keep the same training procedure for both CIFAR-10/100 and their long-tail variants. Namely, we train all models with stochastic gradient descent (SGD) for 200 epochs, with a peak learning rate of 0.1, linearly warmed up for the first 10% of the training, and then decreased to 0.0 until the end. We apply weight decay of 0.0005, and clip the gradients when their norm exceeds 5.0. (see the training-loop sketch below)
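The Open Datasets and Dataset Splits rows describe a data-construction protocol: an exponentially imbalanced long-tail version of CIFAR (Cao et al., 2019), per-class subsampling of iWildCam/Amazon down to the least frequent class to get a uniform target distribution, and a source-distributed validation split carved out of the training data. The sketch below is a minimal NumPy rendering of that protocol, not the authors' released code; the function names and the exponential imbalance profile are assumptions made for illustration.

```python
import numpy as np

def longtail_counts(n_max, num_classes, imbalance_factor):
    """Exponentially decaying class sizes, following the CIFAR-LT convention of
    Cao et al. (2019): class 0 keeps n_max samples, the last class keeps
    roughly n_max / imbalance_factor."""
    return [int(n_max * imbalance_factor ** (-c / (num_classes - 1)))
            for c in range(num_classes)]

def subsample_per_class(labels, counts, rng):
    """Keep counts[c] randomly chosen indices from every class c."""
    keep = []
    for c, n_c in enumerate(counts):
        idx = np.flatnonzero(labels == c)
        keep.append(rng.choice(idx, size=min(n_c, len(idx)), replace=False))
    return np.concatenate(keep)

def uniform_target_subsample(labels, rng):
    """Subsample every class to the size of the least frequent class, yielding a
    uniform target label distribution (as described for iWildCam and Amazon)."""
    counts = np.bincount(labels)
    n_min = int(counts[counts > 0].min())
    return subsample_per_class(labels, [n_min] * len(counts), rng)

# Illustrative usage with placeholder labels: a CIFAR-10-LT train set with IF = 100.
rng = np.random.default_rng(0)
train_labels = rng.integers(0, 10, size=50_000)   # placeholder, not real CIFAR labels
lt_counts = longtail_counts(5_000, num_classes=10, imbalance_factor=100)
train_idx = subsample_per_class(train_labels, lt_counts, rng)
target_idx = uniform_target_subsample(train_labels, rng)
# Per Appendix A.1, a validation split of the same size as the test set would be
# carved out of the training data and subsampled with the same per-class counts,
# so that it follows the same (source) distribution.
```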
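The Software Dependencies row names the libraries (PyTorch, timm, Hugging Face transformers) but no versions. Below is a minimal sketch of how such a stack is typically wired together; the concrete model identifiers (ResNet-50, DistilBERT) are illustrative placeholders and not taken from the paper, while num_classes=20 mirrors the 20 most frequent iWildCam classes quoted above.

```python
# Assumed stack: PyTorch + timm for ImageNet-pretrained vision backbones
# (Camelyon17, iWildCam) and Hugging Face transformers for the text model
# (Amazon). Model names below are placeholders, not the paper's choices.
import timm
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Vision backbone; num_classes=20 matches the 20 most frequent iWildCam classes.
vision_model = timm.create_model("resnet50", pretrained=True, num_classes=20)

# Text model; 5 labels assumed for the star-rating classes of WILDS-Amazon.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
text_model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=5
)

with torch.no_grad():
    batch = tokenizer(["great product", "arrived broken"],
                      padding=True, return_tensors="pt")
    logits = text_model(**batch).logits        # shape: (2, 5)
```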
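The training recipe quoted in the Experiment Setup row maps directly onto a standard PyTorch loop. The sketch below is an assumed rendering of that recipe, not the authors' code: the batch size, SGD momentum, dummy data, and stand-in model are placeholders; only the epoch count, peak learning rate, warmup/decay schedule, weight decay, and clipping threshold come from the quoted text.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import LambdaLR

# Quoted hyperparameters: 200 epochs, peak LR 0.1, linear warmup over the first
# 10% of training then linear decay to 0.0, weight decay 5e-4, grad clip at 5.0.
EPOCHS, BATCHES_PER_EPOCH = 200, 391          # 391 ~ 50_000 / 128 (placeholder batch size)
TOTAL_STEPS = EPOCHS * BATCHES_PER_EPOCH
WARMUP_STEPS = int(0.1 * TOTAL_STEPS)

def lr_lambda(step):
    """Linear warmup to the peak LR, then linear decay to 0.0."""
    if step < WARMUP_STEPS:
        return step / max(1, WARMUP_STEPS)
    return max(0.0, (TOTAL_STEPS - step) / max(1, TOTAL_STEPS - WARMUP_STEPS))

# Tiny stand-in model; a CIFAR network would be used in practice.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)  # momentum assumed
scheduler = LambdaLR(optimizer, lr_lambda)
criterion = torch.nn.CrossEntropyLoss()

# Dummy single-batch "loader" so the sketch runs; a CIFAR DataLoader in practice.
train_loader = [(torch.randn(128, 3, 32, 32), torch.randint(0, 10, (128,)))]

for epoch in range(EPOCHS):
    for images, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), targets)
        loss.backward()
        # Clip gradients when their norm exceeds 5.0, as stated in the paper.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
        optimizer.step()
        scheduler.step()
```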