A Learning Based Hypothesis Test for Harmful Covariate Shift

Authors: Tom Ginsberg, Zhongyuan Liang, Rahul G. Krishnan

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we demonstrate the ability of our method to detect harmful covariate shift with statistical certainty on a variety of high-dimensional datasets. Across numerous domains and modalities, we show state-of-the-art performance compared to existing methods, particularly when the number of observed test samples is small.
Researcher Affiliation | Academia | Tom Ginsberg, Zhongyuan Liang & Rahul G. Krishnan, Department of Computer Science, University of Toronto, Toronto, ON M5S 1A1, {tomginsberg,zhongyuan,rahulgk}@cs.toronto.edu
Pseudocode | Yes | Algorithm 1: The Detectron algorithm for detecting harmful covariate shift (a hedged sketch of the test's calibration step appears after the table)
Open Source Code | Yes | Code available at https://github.com/rgklab/detectron
Open Datasets | Yes | We use the CIFAR-10.1 dataset [Recht et al., 2019] where shift comes from subtle changes in the dataset creation processes, the Camelyon17 dataset [Veeling et al., 2018] for metastases detection in histopathological slides from multiple hospitals, as well as the UCI heart disease dataset [Janosi et al., 1988] which contains tabular features collected across international health systems and indicators of heart disease.
Dataset Splits | Yes | Ptrain, Pval, P* ← Partition(P) [...] For the neural network model, we use a simple MLP with an input dimension of 9, 3 hidden layers of size 16 with ReLU activation followed by a 30% dropout layer and a linear layer to 2 outputs (heart disease present or not). We use 358 samples for training and 120 for validation. We train for a maximum of 1000 epochs and select the model with the highest AUC on the validation set, performing early stopping if the validation AUC has not increased in over 100 epochs. (a PyTorch sketch of this MLP appears after the table)
Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU models, CPU specifications, or memory, beyond general mentions like 'computational cost' or 'training multiple deep models'.
Software Dependencies | No | The paper mentions software like the 'torchvision library', 'ADAM optimizer', 'XGBoost library', and 'Wolfram Mathematica', but does not provide version numbers for these components to ensure a reproducible software environment.
Experiment Setup | Yes | For the neural network model, we use a simple MLP with an input dimension of 9, 3 hidden layers of size 16 with ReLU activation followed by a 30% dropout layer and a linear layer to 2 outputs (heart disease present or not). We use 358 samples for training and 120 for validation. We train for a maximum of 1000 epochs and select the model with the highest AUC on the validation set, performing early stopping if the validation AUC has not increased in over 100 epochs. [...] We use stochastic gradient descent (SGD) with a base learning rate of 0.1, L2 regularization of 5×10⁻⁴, momentum of 0.9, a batch size of 128, and a cosine annealing learning rate schedule with a maximum of 200 iterations, stepped once per epoch for a total of 200 epochs. (the training configuration is sketched after the table)
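
A hedged sketch of the calibration behind Algorithm 1: Detectron trains ensembles of constrained disagreement classifiers (CDCs) that are encouraged to agree with a base model on training data but to disagree on a given unlabeled batch, then flags harmful shift when disagreement on the test batch is significantly higher than on held-out in-distribution batches. The sketch below covers only the final significance test as a permutation-style p-value; the function name detectron_p_value and the toy counts are illustrative, not the authors' implementation.

```python
import numpy as np

def detectron_p_value(null_counts, observed_count):
    """One-sided permutation-style p-value: the fraction of calibration
    runs whose disagreement count is at least the count observed on the
    suspect test batch (with the usual +1 smoothing)."""
    null_counts = np.asarray(null_counts)
    return (1 + np.sum(null_counts >= observed_count)) / (1 + len(null_counts))

# Toy usage: counts from calibration runs on held-out in-distribution
# batches vs. one run on the unlabeled test batch (illustrative numbers).
null_counts = [3, 5, 4, 6, 2, 5, 4, 3, 5, 4]
print(detectron_p_value(null_counts, observed_count=12))  # small p => flag harmful shift
```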
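
A minimal PyTorch sketch of the quoted heart-disease MLP, assuming the 30% dropout sits just before the output head (the quote leaves the exact placement ambiguous, and the released code may differ):

```python
import torch.nn as nn

heart_mlp = nn.Sequential(
    nn.Linear(9, 16), nn.ReLU(),   # input dimension 9 -> hidden width 16
    nn.Linear(16, 16), nn.ReLU(),
    nn.Linear(16, 16), nn.ReLU(),
    nn.Dropout(p=0.3),             # 30% dropout, per the quoted setup
    nn.Linear(16, 2),              # heart disease present / absent
)
```

Model selection (keeping the checkpoint with the highest validation AUC, with early stopping after 100 epochs without improvement) would wrap a standard training loop and is omitted here.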
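
A hedged sketch of the quoted training configuration in PyTorch: SGD with a base learning rate of 0.1, weight decay (L2) of 5e-4, momentum 0.9, batch size 128, and cosine annealing with T_max = 200 stepped once per epoch for 200 epochs. The linear stand-in model and random tensors are placeholders; the table does not say which network this recipe trains.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

model = torch.nn.Linear(3 * 32 * 32, 10)  # placeholder model, not the paper's network
train_loader = DataLoader(
    TensorDataset(torch.randn(512, 3 * 32 * 32), torch.randint(0, 10, (512,))),
    batch_size=128, shuffle=True,         # batch size 128, per the quote
)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)  # L2 = 5e-4
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)

for epoch in range(200):                  # 200 epochs total
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()                      # cosine schedule stepped once per epoch
```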