LaSCal: Label-Shift Calibration without target labels
Authors: Teodora Popordanoska, Gorjan Radevski, Tinne Tuytelaars, Matthew Blaschko
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our thorough empirical analysis demonstrates the effectiveness and reliability of the proposed approach across different modalities, model architectures and label shift intensities. |
| Researcher Affiliation | Academia | Teodora Popordanoska, Gorjan Radevski, Tinne Tuytelaars, Matthew B. Blaschko, ESAT-PSI, KU Leuven, firstname.lastname@kuleuven.be |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our codebase is released at the following repository: https://github.com/tpopordanoska/label-shift-calibration. |
| Open Datasets | Yes | In particular, we use the CIFAR-10/100 Long Tail (LT) datasets [Cao et al., 2019], which are simulated from CIFAR [Krizhevsky et al., 2009] with an imbalance factor (IF)... We additionally use WILDS [Koh et al., 2021] with different modalities: Camelyon17 [Bandi et al., 2018] and iWildCam [Beery et al., 2021] with images, and Amazon [Ni et al., 2019] with text. (See the long-tail construction sketch after the table.) |
| Dataset Splits | Yes | On iWildCam, we select the 20 most frequent classes from the target dataset. On both iWildCam and Amazon, we obtain a uniform target distribution by subsampling each class, based on the frequency of the least frequent class. In Appendix A.1: From the training dataset, we allocate a validation dataset with the same size as the testing dataset. Then, we subsample the validation dataset the same way as we subsample the training dataset, so that both are effectively drawn from the same (source) distribution (e.g., used in the ablation studies in Section 4.3). (See the class-uniform subsampling sketch after the table.) |
| Hardware Specification | Yes | We conduct all experiments on consumer-grade GPUs, that is, all experiments can be conducted on a single Nvidia 3090. |
| Software Dependencies | No | We use PyTorch [Paszke et al., 2019] for all deep-learning-based implementations. For all ImageNet pre-trained models we use Timm [Wightman et al., 2019] (Camelyon17 and iWildCam), while for all pre-trained language models, we use Hugging Face transformers [Wolf et al., 2019]. |
| Experiment Setup | Yes | We keep the same training procedure for both CIFAR-10/100 and their long-tail variants. Namely, we train all models with stochastic gradient descent (SGD) for 200 epochs, with a peak learning rate of 0.1, linearly warmed up for the first 10% of the training, and then decreased to 0.0 until the end. We apply weight decay of 0.0005, and clip the gradients when their norm exceeds 5.0. (See the schedule sketch after the table.) |
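The CIFAR-LT construction quoted in the Open Datasets row follows Cao et al. [2019], where per-class sample counts decay exponentially so that the most frequent class is IF times larger than the least frequent one. A minimal sketch of that profile, assuming the standard exponential formulation; the function name and the printed example are illustrative, not taken from the released code:

```python
import numpy as np

def long_tail_counts(n_per_class: int, num_classes: int, imbalance_factor: float) -> list[int]:
    """Per-class sample counts for a long-tail split (Cao et al., 2019 style).

    Class 0 keeps all n_per_class samples; class C-1 keeps
    n_per_class / imbalance_factor; counts in between decay exponentially.
    """
    mu = imbalance_factor ** (-1.0 / (num_classes - 1))
    return [int(n_per_class * mu**i) for i in range(num_classes)]

# Example: CIFAR-10 (5000 images/class) with imbalance factor 100
counts = long_tail_counts(5000, 10, 100.0)
print(counts)  # [5000, 2997, ..., 50]: head class is 100x the tail class
```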
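The Dataset Splits row describes obtaining a uniform target distribution by subsampling each class to the frequency of the least frequent class. A minimal NumPy sketch of that operation; the function name, label-array interface, and seed are assumptions for illustration:

```python
import numpy as np

def subsample_uniform(labels: np.ndarray, seed: int = 0) -> np.ndarray:
    """Return indices of a class-uniform subset.

    Every class is subsampled to the frequency of the least frequent
    class, so the resulting label distribution is uniform.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(labels, return_counts=True)
    n_min = counts.min()
    keep = [
        rng.choice(np.flatnonzero(labels == c), size=n_min, replace=False)
        for c in classes
    ]
    return np.sort(np.concatenate(keep))

# Example: labels with counts {0: 5, 1: 3, 2: 7} -> 3 samples per class
labels = np.array([0] * 5 + [1] * 3 + [2] * 7)
idx = subsample_uniform(labels)
print(np.bincount(labels[idx]))  # [3 3 3]
```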
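The Experiment Setup row pins down an SGD recipe: peak learning rate 0.1, linear warmup over the first 10% of training, linear decay to 0.0, weight decay 0.0005, and gradient clipping at norm 5.0. A minimal PyTorch sketch of that schedule; the placeholder model, step count (assuming CIFAR with batch size 128), and loop placement are assumptions, not the authors' training script:

```python
import torch

def warmup_linear_decay(step: int, total_steps: int, warmup_frac: float = 0.1) -> float:
    """LR multiplier: ramps 0 -> 1 over the warmup, then decays 1 -> 0 linearly."""
    warmup_steps = int(warmup_frac * total_steps)
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

model = torch.nn.Linear(10, 10)  # placeholder for the actual network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=5e-4)
total_steps = 200 * 391  # 200 epochs x ~391 batches (assumed batch size 128)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda s: warmup_linear_decay(s, total_steps)
)

# Inside the training loop, after loss.backward():
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
optimizer.step()
scheduler.step()
```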