Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift
Authors: Stephan Rabanser, Stephan Günnemann, Zachary Lipton
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper explores the problem of building ML systems that fail loudly, investigating methods for detecting dataset shift, identifying exemplars that most typify the shift, and quantifying shift malignancy. We focus on several datasets and various perturbations to both covariates and label distributions with varying magnitudes and fractions of data affected. |
| Researcher Affiliation | Collaboration | Stephan Rabanser (AWS AI Labs, rabans@amazon.com); Stephan Günnemann (Technical University of Munich, guennemann@in.tum.de); Zachary C. Lipton (Carnegie Mellon University, zlipton@cmu.edu) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We provide a sample implementation of our experiments pipeline written in Python, making use of sklearn [36] and Keras [11], located at: https://github.com/steverab/failing-loudly. |
| Open Datasets | Yes | Our main experiments were carried out on the MNIST (Ntr = 50000; Nval = 10000; Nte = 10000; D = 28×28×1; C = 10 classes) [25] and CIFAR-10 (Ntr = 40000; Nval = 10000; Nte = 10000; D = 32×32×3; C = 10 classes) [23] image datasets. |
| Dataset Splits | Yes | Our main experiments were carried out on the MNIST (Ntr = 50000; Nval = 10000; Nte = 10000; D = 28×28×1; C = 10 classes) [25] and CIFAR-10 (Ntr = 40000; Nval = 10000; Nte = 10000; D = 32×32×3; C = 10 classes) [23] image datasets. (See the split-loading sketch below the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, or memory amounts) used to run the experiments; the software environment is only implied by its use of libraries such as Keras. |
| Software Dependencies | No | The paper mentions "Python, making use of sklearn [36] and Keras [11]" but does not specify version numbers for these software components, which are needed for exact reproducibility. |
| Experiment Setup | Yes | We train all networks (TAE, BBSDs, BBSDh, Classif) using stochastic gradient descent with momentum in batches of 128 examples over 200 epochs with early stopping. |
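The dataset splits quoted in the "Open Datasets" and "Dataset Splits" rows can be approximated with the stock Keras dataset loaders. Below is a minimal sketch, assuming the 10,000-example validation set is carved from the tail of each official training set; the paper does not state how the validation split was drawn, so the `load_split` helper and its carve-out rule are illustrative assumptions, not the authors' procedure.

```python
# Minimal sketch of reproducing the reported split sizes with Keras.
# Assumption: the validation set is taken from the tail of the official
# training set; the paper does not specify how it was carved out.
from tensorflow.keras.datasets import mnist, cifar10

def load_split(dataset, n_val=10_000):
    """Return (train, val, test) splits matching the reported sizes."""
    (x_tr, y_tr), (x_te, y_te) = dataset.load_data()
    x_val, y_val = x_tr[-n_val:], y_tr[-n_val:]   # hypothetical carve-out rule
    x_tr, y_tr = x_tr[:-n_val], y_tr[:-n_val]
    return (x_tr, y_tr), (x_val, y_val), (x_te, y_te)

# MNIST: Ntr = 50000, Nval = 10000, Nte = 10000, D = 28x28x1
mnist_train, mnist_val, mnist_test = load_split(mnist)
# CIFAR-10: Ntr = 40000, Nval = 10000, Nte = 10000, D = 32x32x3
cifar_train, cifar_val, cifar_test = load_split(cifar10)
```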
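The "Experiment Setup" row quotes SGD with momentum, batches of 128 examples, 200 epochs, and early stopping. A minimal Keras sketch of that configuration follows, reusing the MNIST arrays from the previous sketch. The learning rate, momentum value, early-stopping patience, and the toy CNN architecture are assumptions; the paper does not report them here.

```python
# Minimal sketch of the quoted training setup (SGD with momentum,
# batch size 128, 200 epochs, early stopping) in Keras. The learning
# rate, momentum, patience, and the toy CNN below are assumptions.
from tensorflow.keras import layers, models, optimizers, callbacks

# Reuse the MNIST arrays from the split sketch above; add the channel
# axis and scale pixels to [0, 1].
(x_tr, y_tr), (x_val, y_val) = mnist_train, mnist_val
x_tr = x_tr[..., None] / 255.0
x_val = x_val[..., None] / 255.0

model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])
model.compile(
    optimizer=optimizers.SGD(learning_rate=0.01, momentum=0.9),  # assumed hyperparameters
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
early_stop = callbacks.EarlyStopping(
    monitor="val_loss", patience=10, restore_best_weights=True  # assumed patience
)
model.fit(
    x_tr, y_tr,
    validation_data=(x_val, y_val),
    batch_size=128,   # quoted in the paper
    epochs=200,       # quoted in the paper
    callbacks=[early_stop],
)
```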