Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift

Authors: Stephan Rabanser, Stephan Günnemann, Zachary Lipton

NeurIPS 2019

Each entry below pairs a reproducibility variable with its assessed result and the supporting LLM response.
Research Type: Experimental
LLM Response: This paper explores the problem of building ML systems that fail loudly, investigating methods for detecting dataset shift, identifying exemplars that most typify the shift, and quantifying shift malignancy. We focus on several datasets and various perturbations to both covariates and label distributions with varying magnitudes and fractions of data affected.
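The perturbation protocol described above sweeps noise magnitude and the fraction of data affected. A minimal sketch of one such covariate perturbation, assuming Gaussian noise; the function name, noise scale, and defaults are illustrative, not the paper's exact settings:

```python
import numpy as np

def perturb_covariates(x_test, noise_std=1.0, fraction=0.5, seed=0):
    """Apply Gaussian noise to a random fraction of test inputs.

    A hypothetical example of the covariate perturbations the study
    sweeps over: `noise_std` controls the magnitude of the shift and
    `fraction` controls how much of the data is affected.
    """
    rng = np.random.default_rng(seed)
    x_shifted = x_test.copy().astype(np.float32)
    n_affected = int(fraction * len(x_shifted))
    idx = rng.choice(len(x_shifted), size=n_affected, replace=False)
    x_shifted[idx] += rng.normal(0.0, noise_std, size=x_shifted[idx].shape)
    return x_shifted
```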
Researcher Affiliation: Collaboration
LLM Response: Stephan Rabanser (AWS AI Labs, rabans@amazon.com); Stephan Günnemann (Technical University of Munich, guennemann@in.tum.de); Zachary C. Lipton (Carnegie Mellon University, zlipton@cmu.edu)
Pseudocode: No
LLM Response: The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code: Yes
LLM Response: We provide a sample implementation of our experiments-pipeline written in Python, making use of sklearn [36] and Keras [11], located at: https://github.com/steverab/failing-loudly.
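The released pipeline combines dimensionality reduction with statistical two-sample tests to flag shift. A minimal sketch of that reduce-then-test pattern, using sklearn PCA and per-dimension Kolmogorov-Smirnov tests with a Bonferroni correction; the choice of reduction, component count, and significance level here are illustrative assumptions:

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.decomposition import PCA

def detect_shift(x_source, x_target, n_components=32, alpha=0.05):
    """Flag dataset shift via per-dimension KS tests on reduced features.

    Expects flattened inputs of shape (n_samples, n_features). Fits a
    shared reduction (PCA here) on the source sample, projects both
    samples, runs a two-sample KS test per latent dimension, and
    Bonferroni-corrects for multiple testing.
    """
    pca = PCA(n_components=n_components).fit(x_source)
    z_s, z_t = pca.transform(x_source), pca.transform(x_target)
    p_values = [ks_2samp(z_s[:, d], z_t[:, d]).pvalue
                for d in range(n_components)]
    # Bonferroni: shift is detected if any p-value falls below alpha / K.
    return min(p_values) < alpha / n_components, p_values
```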
Open Datasets: Yes
LLM Response: Our main experiments were carried out on the MNIST (Ntr = 50000; Nval = 10000; Nte = 10000; D = 28×28×1; C = 10 classes) [25] and CIFAR-10 (Ntr = 40000; Nval = 10000; Nte = 10000; D = 32×32×3; C = 10 classes) [23] image datasets.
Dataset Splits: Yes
LLM Response: Our main experiments were carried out on the MNIST (Ntr = 50000; Nval = 10000; Nte = 10000; D = 28×28×1; C = 10 classes) [25] and CIFAR-10 (Ntr = 40000; Nval = 10000; Nte = 10000; D = 32×32×3; C = 10 classes) [23] image datasets.
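The quoted split sizes are consistent with the standard Keras dataset loaders once a validation set is held out of the training portion. A sketch under that assumption (splitting by index order is itself an assumption):

```python
from tensorflow.keras.datasets import mnist, cifar10

# MNIST ships as 60,000 train / 10,000 test; holding out the last 10,000
# training images as validation yields the quoted 50,000/10,000/10,000 split.
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_tr, y_tr = x_train[:50000], y_train[:50000]
x_val, y_val = x_train[50000:], y_train[50000:]

# CIFAR-10 ships as 50,000 train / 10,000 test, matching the quoted
# 40,000/10,000/10,000 split after the same hold-out.
(x_train_c, y_train_c), (x_test_c, y_test_c) = cifar10.load_data()
x_tr_c, y_tr_c = x_train_c[:40000], y_train_c[:40000]
x_val_c, y_val_c = x_train_c[40000:], y_train_c[40000:]
```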
Hardware Specification: No
LLM Response: The paper does not provide specific hardware details (e.g., GPU models, CPU types, or memory amounts) used for running the experiments; it only implies the software environment through its use of libraries such as Keras.
Software Dependencies: No
LLM Response: The paper mentions "Python, making use of sklearn [36] and Keras [11]" but does not specify version numbers for these software components, which are required for reproducibility.
Experiment Setup: Yes
LLM Response: We train all networks (TAE, BBSDs, BBSDh, Classif) using stochastic gradient descent with momentum in batches of 128 examples over 200 epochs with early stopping.
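The quoted optimization settings map directly onto a Keras training call. A minimal sketch reusing the MNIST split from the loader sketch above; the architecture, learning rate, momentum value, and early-stopping patience are assumed, as the quote does not specify them:

```python
from tensorflow.keras import layers, models, optimizers, callbacks

# A placeholder classifier; the paper trains several networks (TAE, BBSDs,
# BBSDh, Classif) whose architectures are not given in the quote above.
model = models.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(256, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

# SGD with momentum, batches of 128, up to 200 epochs with early stopping,
# matching the quoted setup; learning_rate=0.01 and momentum=0.9 are
# assumed values, not taken from the paper.
model.compile(
    optimizer=optimizers.SGD(learning_rate=0.01, momentum=0.9),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
early_stop = callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                     restore_best_weights=True)
model.fit(x_tr / 255.0, y_tr,
          validation_data=(x_val / 255.0, y_val),
          batch_size=128, epochs=200, callbacks=[early_stop])
```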