Sequential Harmful Shift Detection Without Labels

Authors: Salim I. Amoukou, Tom Bewley, Saumitra Mishra, Freddy Lecue, Daniele Magazzeni, Manuela Veloso

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments show that our method has high power and false alarm control under various distribution shifts, including covariate and label shifts and natural shifts over geography and time." Section 5 demonstrates the empirical efficacy of our method, showcasing its strong detection capabilities and controlled false alarm rates across various types of harmful shift.
Researcher Affiliation | Industry | Salim I. Amoukou, Tom Bewley, Saumitra Mishra, Freddy Lecue, Daniele Magazzeni, Manuela Veloso; J.P. Morgan AI Research. Correspondence to: Salim I. Amoukou <salim.ibrahimamoukou@jpmorgan.com>
Pseudocode | No | The paper describes its methods using prose and mathematical equations but does not include any structured pseudocode blocks or algorithms.
Open Source Code | No | "We will also release the code with a proper readme to use the methods."
Open Datasets | Yes | "using the California house prices [Dua and Graff, 2017], Bike sharing demand [Fanaee-T, 2013], HELOC [FICO, 2018] and Nhanesi [CDC, 1999-2022] datasets." "We partition each dataset into training (60%), test (20%) and calibration (20%) sets and use the training data to train random forests (RFs) as the primary models."
Dataset Splits | Yes | "We partition each dataset into training (60%), test (20%) and calibration (20%) sets and use the training data to train random forests (RFs) as the primary models." "We split this dataset into a training set (60%), test set (20%) and calibration set (20%), and train a ResNet50 on the training set. Using half of the calibration set, we train another ResNet50 (with a regression head) as an error estimator. The remaining half is employed to determine the empirical quantiles p ∈ [0.5, 1), p̂ ∈ (0, 1) at which we achieve maximum power while keeping the FDP below 0.2."
Hardware Specification | Yes | "We run all our experiments on an Amazon EC2 instance (c5.4xlarge) that consists of 16 vCPUs and 32 GB of RAM."
Software Dependencies | No | The paper mentions software components such as random forests (RFs) and a ResNet50 model but does not provide specific version numbers for any libraries or frameworks (e.g., a scikit-learn version, a PyTorch/TensorFlow version).
Experiment Setup | Yes | "For continuous features, we exclude 80% of observations with values either above or below the median. For categorical features, we exclude data from one category." "We use half of the calibration sets to train RF regressors as the error estimators, then use the remainder to calibrate true and estimated error thresholds using the grid search process described above." "We consider a shift to be harmful if the model's error in production exceeds the error on the calibration dataset plus ϵtol = 0."
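The 60/20/20 partitioning and error-estimator setup quoted above can be sketched as follows. This is a minimal illustration, not the authors' released code: the synthetic data, random seeds, and the use of scikit-learn random forests are all assumptions, and only the split proportions and the "train an error estimator on half of the calibration set" step come from the paper.

```python
# Hypothetical sketch of the 60/20/20 split and error-estimator training;
# data and model choices are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.normal(size=(1000, 5))
y = X[:, 0] + rng.normal(scale=0.1, size=1000)

# 60% train; split the remaining 40% evenly into test and calibration.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, train_size=0.6, random_state=0)
X_test, X_cal, y_test, y_cal = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Primary model trained on the training split.
model = RandomForestRegressor(random_state=0).fit(X_train, y_train)

# Error estimator: a second RF trained on half of the calibration set
# to predict the primary model's absolute error; the other half would
# be used to calibrate the detection thresholds.
X_cal1, X_cal2, y_cal1, y_cal2 = train_test_split(X_cal, y_cal, test_size=0.5, random_state=0)
err1 = np.abs(model.predict(X_cal1) - y_cal1)
error_estimator = RandomForestRegressor(random_state=0).fit(X_cal1, err1)

print(len(X_train), len(X_test), len(X_cal))  # 600 200 200
```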
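The covariate-shift simulation and the harmful-shift criterion in the Experiment Setup row can likewise be sketched in a few lines of NumPy. This is an assumed reading of the quoted setup: which side of the median is dropped, the synthetic data, and the helper names are illustrative; only the "exclude 80% of observations above or below the median" rule and the "production error exceeds calibration error plus ϵtol" criterion come from the paper.

```python
# Illustrative shift simulation (assumed details): for one continuous
# feature, drop 80% of the rows whose value lies above the median,
# producing a covariate-shifted "production" sample.
import numpy as np

rng = np.random.RandomState(0)
X = rng.normal(size=(1000, 3))

feature = 0
median = np.median(X[:, feature])
above = np.where(X[:, feature] > median)[0]
drop = rng.choice(above, size=int(0.8 * len(above)), replace=False)
X_shifted = np.delete(X, drop, axis=0)

# Harmful-shift criterion from the paper: the production error must
# exceed the calibration error plus a tolerance eps_tol (0 here).
EPS_TOL = 0.0

def is_harmful(prod_error, cal_error, eps_tol=EPS_TOL):
    return prod_error > cal_error + eps_tol
```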