A Learning Based Hypothesis Test for Harmful Covariate Shift
Authors: Tom Ginsberg, Zhongyuan Liang, Rahul G. Krishnan
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we demonstrate the ability of our method to detect harmful covariate shift with statistical certainty on a variety of high-dimensional datasets. Across numerous domains and modalities, we show state-of-the-art performance compared to existing methods, particularly when the number of observed test samples is small. |
| Researcher Affiliation | Academia | Tom Ginsberg, Zhongyuan Liang, Rahul G. Krishnan; Department of Computer Science, University of Toronto, Toronto, ON M5S 1A1; {tomginsberg,zhongyuan,rahulgk}@cs.toronto.edu |
| Pseudocode | Yes | Algorithm 1: The Detectron algorithm for detecting harmful covariate shift (a hedged sketch of the underlying test appears after this table) |
| Open Source Code | Yes | Code available at https://github.com/rgklab/detectron |
| Open Datasets | Yes | We use the CIFAR-10.1 dataset [Recht et al., 2019] where shift comes from subtle changes in the dataset creation processes, the Camelyon17 dataset [Veeling et al., 2018] for metastases detection in histopathological slides from multiple hospitals, as well as the UCI heart disease dataset [Janosi et al., 1988] which contains tabular features collected across international health systems and indicators of heart disease. |
| Dataset Splits | Yes | `P_train, P_val, P* ← Partition(P)` [...] For the neural network model, we use a simple MLP with an input dimension of 9, 3 hidden layers of size 16 with ReLU activation followed by a 30% dropout layer and a linear layer to 2 outputs (heart disease present or not). We use 358 samples for training and 120 for validation. We train for a maximum of 1000 epochs and select the model with the highest AUC on the validation set, performing early stopping if the validation AUC has not increased in over 100 epochs. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU models, CPU specifications, or memory, beyond general mentions like 'computational cost' or 'training multiple deep models'. |
| Software Dependencies | No | The paper mentions software like 'torchvision library', 'ADAM optimizer', 'XGBoost library', and 'Wolfram Mathematica', but does not provide specific version numbers for these components to ensure a reproducible software environment. |
| Experiment Setup | Yes | For the neural network model, we use a simple MLP with an input dimension of 9, 3 hidden layers of size 16 with ReLU activation followed by a 30% dropout layer and a linear layer to 2 outputs (heart disease present or not). We use 358 samples for training and 120 for validation. We train for a maximum of 1000 epochs and select the model with the highest AUC on the validation set, performing early stopping if the validation AUC has not increased in over 100 epochs. [...] We use stochastic gradient descent (SGD) with a base learning rate of 0.1, L2 regularization of 5×10⁻⁴, momentum of 0.9, a batch size of 128 and a cosine annealing learning rate schedule with a maximum of 200 iterations stepped once per epoch for a total of 200 epochs. (Both setups are sketched in the code after this table.) |
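
The Pseudocode row reproduces only the caption of Algorithm 1. For orientation, here is a minimal sketch of the flavor of test Detectron performs: a one-sided empirical comparison of a disagreement statistic on the unlabeled test set against a null distribution of the same statistic computed from held-out in-distribution samples. The function name, arguments, and the +1-corrected p-value are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def detectron_style_test(null_stats: np.ndarray, test_stat: float,
                         alpha: float = 0.05) -> tuple[float, bool]:
    """One-sided empirical test in the spirit of Algorithm 1.

    null_stats: disagreement statistics from constrained-disagreement
        ensembles run on held-out *in-distribution* samples.
    test_stat: the same statistic on the unlabeled test set; larger
        disagreement suggests harmful covariate shift.
    Returns an empirical p-value and whether to reject at level alpha.
    """
    # Empirical p-value with the standard +1 correction so it is never 0
    p_value = (1 + np.sum(null_stats >= test_stat)) / (1 + len(null_stats))
    return float(p_value), p_value < alpha
```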
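
The heart-disease MLP quoted in the Dataset Splits and Experiment Setup rows translates directly to PyTorch. A minimal sketch, assuming the single 30% dropout layer sits after the hidden stack (the quoted description is ambiguous about its placement) and using a class name of our own:

```python
import torch
import torch.nn as nn

class HeartDiseaseMLP(nn.Module):
    """Input dimension 9, three hidden layers of size 16 with ReLU,
    30% dropout, and a linear head to 2 outputs."""

    def __init__(self, in_dim: int = 9, hidden: int = 16, n_out: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Dropout(p=0.3),  # dropout placement is one reading of the text
            nn.Linear(hidden, n_out),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)
```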
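
Model selection for that MLP, as quoted, keeps the checkpoint with the highest validation AUC over at most 1000 epochs and stops early after 100 epochs without improvement. A sketch under those numbers; `train_one_epoch` is a caller-supplied hypothetical, not a function from the paper's code:

```python
import copy
import torch
from sklearn.metrics import roc_auc_score

def fit_with_early_stopping(model, train_one_epoch, x_val, y_val,
                            max_epochs=1000, patience=100):
    """Keep the state dict with the best validation AUC; stop if the AUC
    has not improved in over `patience` epochs."""
    best_auc, stale = 0.0, 0
    best_state = copy.deepcopy(model.state_dict())
    for _ in range(max_epochs):
        train_one_epoch(model)  # hypothetical: runs one training epoch
        model.eval()
        with torch.no_grad():
            scores = torch.softmax(model(x_val), dim=1)[:, 1]
        model.train()
        auc = roc_auc_score(y_val.numpy(), scores.numpy())
        if auc > best_auc:
            best_auc, stale = auc, 0
            best_state = copy.deepcopy(model.state_dict())
        else:
            stale += 1
            if stale > patience:
                break
    model.load_state_dict(best_state)
    return model
```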
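
The SGD recipe quoted for the image experiments (base learning rate 0.1, momentum 0.9, weight decay 5×10⁻⁴, cosine annealing with a maximum of 200 iterations stepped once per epoch for 200 epochs) maps onto standard PyTorch; the placeholder model below stands in for whatever backbone the paper trains:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # placeholder; the paper trains larger models
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
# Cosine annealing over a maximum of 200 iterations, stepped once per epoch
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)

for epoch in range(200):
    # ... one epoch of minibatch training with batch size 128 ...
    scheduler.step()
```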