Sequential Harmful Shift Detection Without Labels
Authors: Salim I. Amoukou, Tom Bewley, Saumitra Mishra, Freddy Lecue, Daniele Magazzeni, Manuela Veloso
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our method has high power and false alarm control under various distribution shifts, including covariate and label shifts and natural shifts over geography and time. Section 5 demonstrates the empirical efficacy of our method, showcasing its strong detection capabilities and controlled false alarm rates across various types of harmful shift. |
| Researcher Affiliation | Industry | Salim I. Amoukou, Tom Bewley, Saumitra Mishra, Freddy Lecue, Daniele Magazzeni, Manuela Veloso. J.P. Morgan AI Research. Correspondence to: Salim I. Amoukou <salim.ibrahimamoukou@jpmorgan.com> |
| Pseudocode | No | The paper describes its methods using prose and mathematical equations but does not include any structured pseudocode blocks or algorithms. |
| Open Source Code | No | We will also release the code with a proper readme to use the methods. |
| Open Datasets | Yes | using the California house prices [Dua and Graff, 2017], Bike sharing demand [Fanaee-T, 2013], HELOC [FICO, 2018] and NHANES [CDC, 1999-2022] datasets. We partition each dataset into training (60%), test (20%) and calibration (20%) sets and use the training data to train random forests (RFs) as the primary models. |
| Dataset Splits | Yes | We partition each dataset into training (60%), test (20%) and calibration (20%) sets and use the training data to train random forests (RFs) as the primary models. We split this dataset into a training set (60%), test set (20%) and calibration set (20%), and train a ResNet50 on the training set. Using half of the calibration set, we train another ResNet50 (with a regression head) as an error estimator. The remaining half is employed to determine the empirical quantiles p ∈ [0.5, 1), p̂ ∈ (0, 1) at which we achieve maximum power while keeping the FDP below 0.2. |
| Hardware Specification | Yes | We run all our experiments on an Amazon EC2 instance (c5.4xlarge) that consists of 16 vCPUs and 32 GB of RAM. |
| Software Dependencies | No | The paper mentions software components like 'random forests (RFs)' and 'ResNet50 model' but does not provide specific version numbers for any libraries or frameworks (e.g., scikit-learn version, PyTorch/TensorFlow version). |
| Experiment Setup | Yes | For continuous features, we exclude 80% of observations with values either above or below the median. For categorical features, we exclude data from one category. We use half of the calibration sets to train RF regressors as the error estimators, then use the remainder to calibrate true and estimated error thresholds using the grid search process described above. We consider a shift to be harmful if the model's error in production exceeds the error on the calibration dataset plus ϵtol = 0. |
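The data-splitting and covariate-shift simulation described in the table can be sketched as follows. This is a minimal illustration assuming NumPy arrays; the function names (`split_dataset`, `simulate_covariate_shift`) and the `seed`/`frac` parameters are hypothetical, not from the paper, which only specifies the 60/20/20 split and the rule of excluding 80% of observations above (or below) the median of a continuous feature.

```python
import numpy as np

def split_dataset(n, seed=0):
    """Random 60% train / 20% test / 20% calibration index split,
    matching the partition proportions quoted in the paper."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train, n_test = int(0.6 * n), int(0.2 * n)
    return (idx[:n_train],
            idx[n_train:n_train + n_test],
            idx[n_train + n_test:])

def simulate_covariate_shift(X, feature, rng, upper=True, frac=0.8):
    """Simulate a covariate shift on a continuous feature by dropping
    `frac` (80% in the paper) of the rows whose value for `feature`
    lies above (or below) the median."""
    median = np.median(X[:, feature])
    mask = X[:, feature] > median if upper else X[:, feature] < median
    candidates = np.where(mask)[0]
    drop = rng.choice(candidates, size=int(frac * len(candidates)),
                      replace=False)
    keep = np.setdiff1d(np.arange(len(X)), drop)
    return X[keep]
```

For categorical features, the analogous shift would instead drop all rows belonging to one category.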