Sequential Covariate Shift Detection Using Classifier Two-Sample Tests

Authors: Sooyong Jang, Sangdon Park, Insup Lee, Osbert Bastani

ICML 2022

Reproducibility assessment (variable, result, and the LLM's supporting response):
Research Type: Experimental
"We evaluate our approach on both synthetic and natural shifts on ImageNet (Russakovsky et al., 2015), and natural shifts on two datasets from the WILDS datasets (Koh et al., 2021). We demonstrate that our approach achieves better sample efficiency than baseline algorithms; furthermore, it satisfies the desired false positive rate. Thus, our algorithm is an effective strategy for sequential covariate shift detection."

Researcher Affiliation: Academia
"PRECISE Center, University of Pennsylvania, USA. School of Cybersecurity and Privacy, Georgia Institute of Technology, USA. Correspondence to: Sooyong Jang <sooyong@seas.upenn.edu>."
Pseudocode: Yes
"Algorithm 1: Sequential Calibrated Classifier Two-Sample Test"
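The paper's Algorithm 1 is a sequential, calibrated variant of the classifier two-sample test. For reference only, here is a minimal sketch of the basic (non-sequential, uncalibrated) classifier two-sample test idea that such an algorithm builds on; the function name, the logistic-regression classifier, and the 50/50 split are illustrative assumptions, not the paper's implementation:

```python
import numpy as np
from scipy.stats import binomtest
from sklearn.linear_model import LogisticRegression

def classifier_two_sample_test(X_source, X_target, alpha=0.05, seed=0):
    """Sketch of a basic classifier two-sample test (not the paper's Algorithm 1).

    Train a source-vs-target classifier on half the pooled data, then test
    whether its held-out accuracy is significantly above chance (0.5).
    """
    X = np.vstack([X_source, X_target])
    y = np.concatenate([np.zeros(len(X_source)), np.ones(len(X_target))])

    # Shuffle and split the pooled data into classifier train/test halves.
    idx = np.random.default_rng(seed).permutation(len(X))
    train, test = idx[: len(X) // 2], idx[len(X) // 2 :]

    clf = LogisticRegression(max_iter=1000).fit(X[train], y[train])
    correct = int((clf.predict(X[test]) == y[test]).sum())

    # Under H0 (same distribution), the held-out accuracy is Binomial(n, 0.5).
    p_value = binomtest(correct, len(test), p=0.5, alternative="greater").pvalue
    return p_value < alpha, p_value
```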
Open Source Code: Yes
"We have released our code for these experiments." (https://github.com/sooyongj/sequential_covariate_shift_detection)

Open Datasets: Yes
"We evaluate our approach on both synthetic and natural shifts on ImageNet (Russakovsky et al., 2015), and natural shifts on two datasets from the WILDS datasets (Koh et al., 2021)."
Dataset Splits: No
The paper describes setting up source and target datasets for the covariate shift detection problem (e.g., "split the original ImageNet validation set into equal sized source and target datasets"), but it does not provide traditional train/validation/test splits for its own model's training or hyperparameter tuning.
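For concreteness, a minimal sketch of the kind of equal source/target split the quote describes; the validation-set size of 50,000 and the fixed seed are assumptions for illustration, not details taken from the paper:

```python
import numpy as np

# Shuffle the ImageNet validation indices and split them into two
# equal halves: one serves as the source set, the other as the target set.
n_val = 50_000  # assumed size of the ImageNet validation set
perm = np.random.default_rng(0).permutation(n_val)
source_idx, target_idx = perm[: n_val // 2], perm[n_val // 2 :]
```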
Hardware Specification: No
The paper does not mention any specific hardware used to run the experiments, such as CPU or GPU models.

Software Dependencies: No
The paper mentions software components such as an "SGD optimizer", a "neural network", a "ResNet152 model", a "ResNet50 model", and a "CodeGPT model", but it does not specify version numbers for these or any other software dependencies.
Experiment Setup: Yes
"We use a fully-connected neural network with a single hidden layer (with 128 hidden units) and with the ReLU activation functions as the source-target classifier ĝ_t. We use a binary cross-entropy loss for training in conjunction with an SGD optimizer with a learning rate of 0.01 (for natural shift experiments) and 0.001 (for synthetic shift experiments)."
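The quoted setup maps directly onto a small model definition. A minimal PyTorch sketch, where the input dimensionality and the per-batch training step are assumptions not given in the quote (swap the learning rate to 0.001 for the synthetic-shift configuration):

```python
import torch
import torch.nn as nn

input_dim = 2048  # assumed feature dimensionality; not specified in the quote

# Source-target classifier as described: one hidden layer of 128 ReLU units,
# ending in a single logit for the binary source-vs-target decision.
classifier = nn.Sequential(
    nn.Linear(input_dim, 128),
    nn.ReLU(),
    nn.Linear(128, 1),
)
criterion = nn.BCEWithLogitsLoss()  # binary cross-entropy on the logit
optimizer = torch.optim.SGD(classifier.parameters(), lr=0.01)  # natural shifts

def train_step(x, y):
    """One SGD step on a batch of features x with 0/1 source/target labels y."""
    optimizer.zero_grad()
    loss = criterion(classifier(x).squeeze(1), y.float())
    loss.backward()
    optimizer.step()
    return loss.item()
```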