PAC Prediction Sets Under Label Shift

Authors: Wenwen Si, Sangdon Park, Insup Lee, Edgar Dobriban, Osbert Bastani

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically evaluate our approach on four datasets: the CIFAR-10 and Chest X-Ray image datasets, the tabular CDC Heart Dataset, and the AGNews text dataset. Our algorithm satisfies the PAC guarantee while producing smaller prediction set sizes compared to several baselines."
Researcher Affiliation | Academia | "¹Department of Computer & Information Science, University of Pennsylvania; ²Graduate School of AI and Computer Science & Engineering, POSTECH; ³Department of Statistics and Data Science, University of Pennsylvania"
Pseudocode | Yes | "Algorithm 1 PS-W: PAC prediction sets in the label shift setting." (A simplified calibration sketch in the spirit of this algorithm appears after the table.)
Open Source Code | Yes | "Reproducibility statement. Our code is available at https://github.com/averysi224/pac-ps-label-shift for reproducing our experiments."
Open Datasets | Yes | "We empirically evaluate our approach on four datasets: the CIFAR-10 (Krizhevsky et al., 2009) and Chest X-Ray (National Institutes of Health and others, 2022) image datasets, the tabular CDC Heart Dataset (Centers for Disease Control and Prevention (CDC), 1984), and the AGNews text dataset (Zhang et al., 2015)."
Dataset Splits | Yes | "First, we split the full dataset into training data and base source and target datasets. We use the training dataset to fit a score function. Given label distributions P_Y and Q_Y, we generate the source dataset S_m, the unlabeled target dataset T_n^X, and a labeled target test dataset of size o (sampled from Q) by sampling with replacement from the corresponding base datasets." (A resampling sketch appears after the table.)
Hardware Specification | No | "No specific hardware details (e.g., GPU/CPU models, memory) are provided for running experiments."
Software Dependencies | No | "No specific software dependencies with version numbers (e.g., library versions, programming language versions) are mentioned."
Experiment Setup | Yes | "We use a two-layer MLP for the CDC Heart data with an SGD optimizer having a learning rate of 0.03 and a momentum of 0.9, using a batch size of 64 for 30 epochs. For CIFAR-10, we fine-tune a pretrained ResNet-50 (He et al., 2016) with a learning rate of 0.01 for 56 epochs." (A training-loop sketch appears after the table.)
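Regarding the Pseudocode row: the paper's Algorithm 1 (PS-W) additionally accounts for uncertainty in the estimated importance weights, so the following is only a rough sketch of the underlying idea, not the authors' exact procedure. It assumes the true per-class weights w(y) = Q_Y(y)/P_Y(y) are known, rejection-samples the source calibration data toward the target label distribution, and then selects the threshold with a standard Clopper-Pearson binomial bound; all function names are ours.

```python
import numpy as np
from scipy.stats import beta

def clopper_pearson_ucb(k, n, delta):
    """Upper 1 - delta confidence bound on a binomial proportion,
    given k failures out of n trials (Clopper-Pearson)."""
    if n == 0 or k >= n:
        return 1.0
    return beta.ppf(1 - delta, k + 1, n - k)

def calibrate_threshold(scores, labels, w, eps, delta, rng):
    """Largest threshold tau such that the target miscoverage
    Q(score of true label < tau) is at most eps with confidence
    1 - delta. `scores[i]` is the score of the true label of
    calibration example i; `w[y] = Q_Y(y) / P_Y(y)` are per-class
    importance weights (assumed known in this sketch)."""
    # Rejection sampling: keep example i with probability
    # w[y_i] / max(w), so accepted examples are distributed as Q.
    keep = rng.random(len(labels)) < w[labels] / w.max()
    s = np.sort(scores[keep])
    n = len(s)
    # With distinct scores, tau = s[k] incurs exactly k errors (the
    # k smallest true-label scores fall below it); the bound is
    # increasing in k, so take the largest k that still passes.
    k_star = -1
    for k in range(n):
        if clopper_pearson_ucb(k, n, delta) > eps:
            break
        k_star = k
    return s[k_star] if k_star >= 0 else 0.0

def prediction_set(label_scores, tau):
    """Prediction set at test time: all labels scoring at least tau."""
    return np.flatnonzero(label_scores >= tau)
```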
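For the Dataset Splits row, a minimal sketch of sampling with replacement under a prescribed label marginal; the array names `X`, `y`, and `label_dist` are placeholders, not names from the paper's code:

```python
import numpy as np

def sample_with_label_dist(X, y, label_dist, size, rng):
    """Draw `size` examples with replacement from a base dataset so
    that labels follow `label_dist` (a probability vector over
    classes), mirroring how the source and target datasets are
    generated from the base datasets."""
    drawn = rng.choice(len(label_dist), size=size, p=label_dist)
    # For each drawn label, pick a uniform base example of that label.
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=count, replace=True)
        for c, count in zip(*np.unique(drawn, return_counts=True))
    ])
    rng.shuffle(idx)
    return X[idx], y[idx]

# e.g., source S_m from P_Y, unlabeled target T_n^X (labels discarded),
# and a labeled target test set of size o from Q_Y:
#   Xs, ys = sample_with_label_dist(X_base_src, y_base_src, P_Y, m, rng)
#   Xt, _  = sample_with_label_dist(X_base_tgt, y_base_tgt, Q_Y, n, rng)
```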
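And for the Experiment Setup row, a PyTorch sketch of the reported CDC Heart configuration. Only the optimizer settings (SGD, lr 0.03, momentum 0.9), batch size 64, and 30 epochs come from the excerpt; the hidden width, input/output dimensions, and stand-in data are our assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

num_features, num_classes = 21, 2       # placeholders; not stated in the excerpt
X = torch.randn(1024, num_features)     # stand-in data for illustration
y = torch.randint(0, num_classes, (1024,))
loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)

# Two-layer MLP; the hidden width (128) is an assumption.
model = nn.Sequential(
    nn.Linear(num_features, 128),
    nn.ReLU(),
    nn.Linear(128, num_classes),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.03, momentum=0.9)
criterion = nn.CrossEntropyLoss()

for epoch in range(30):                 # 30 epochs, as reported
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
```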