Tracking the risk of a deployed model and detecting harmful distribution shifts

Authors: Aleksandr Podkopaev, Aaditya Ramdas

ICLR 2022

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We demonstrate the efficacy of the proposed framework through an extensive empirical study on a collection of simulated and real datasets." |
| Researcher Affiliation | Academia | "Department of Statistics & Data Science, Machine Learning Department, Carnegie Mellon University. {podkopaev,aramdas}@cmu.edu" |
| Pseudocode | Yes | "Algorithm 1: Sequential testing for an absolute increase in the risk." (A hedged sketch of this test appears below the table.) |
| Open Source Code | Yes | "In order to ensure reproducibility of the results in this paper, we include the following to the supplementary materials: (a) relevant source code for all simulations that have been performed." |
| Open Datasets | Yes | "We focus on two image classification datasets with induced corruptions: MNIST-C (Mu & Gilmer, 2019) and CIFAR-10-C (Krizhevsky, 2009; Hendrycks & Dietterich, 2019)." |
| Dataset Splits | Yes | "The network is trained on original (clean) MNIST data, which is split into two folds with 10% of data used for validation purposes." |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments. It mentions training CNNs and ResNet models but gives no hardware specifications. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | "Details regarding the network architecture and the training process are given in Appendix G.1... We train a shallow CNN with two convolutional layers (each with 3 × 3 kernel matrices), each followed by max-pooling layers. Subsequently, the result is flattened and followed by a dropout layer (p = 0.5), a fully-connected layer with 128 neurons and an output layer. ... The model underlying a set-valued predictor is a standard ResNet-32. It is trained for 50 epochs on the original (clean) CIFAR-10 dataset, without data augmentation, using 10% of data for validation purposes." (A hedged PyTorch sketch of the MNIST CNN appears below.) |
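
The paper's Algorithm 1 sequentially tests for an absolute increase in risk: an alarm is raised the first time a lower confidence bound on the deployed model's risk exceeds the source risk plus a tolerance. The sketch below illustrates that control flow only; it substitutes a simple union-bound Hoeffding confidence sequence for the paper's tighter betting-based bounds, and all names (`monitor_risk`, `eps_tol`, and so on) are illustrative rather than taken from the released code.

```python
# A minimal sketch of sequential testing for an absolute increase in risk,
# in the spirit of the paper's Algorithm 1. In place of the paper's tighter
# betting-style confidence sequences, this uses a conservative time-uniform
# Hoeffding bound obtained by a union bound over time steps.
import math

def monitor_risk(losses, source_risk, eps_tol=0.05, alpha=0.05):
    """Return the first time t at which the lower confidence bound on the
    target risk exceeds source_risk + eps_tol, or None if no alarm fires.

    losses      -- iterable of per-example losses in [0, 1] from deployment
    source_risk -- risk estimated on the source (clean) distribution
    eps_tol     -- tolerated absolute increase in risk
    alpha       -- overall false-alarm probability, spread over all t
    """
    running_sum = 0.0
    for t, loss in enumerate(losses, start=1):
        running_sum += loss
        mean_t = running_sum / t
        # Per-step error budget alpha_t = 6*alpha / (pi^2 * t^2), which sums
        # to alpha over t = 1, 2, ..., giving a time-uniform guarantee.
        alpha_t = 6.0 * alpha / (math.pi ** 2 * t ** 2)
        # One-sided Hoeffding lower confidence bound for losses in [0, 1].
        lcb = mean_t - math.sqrt(math.log(1.0 / alpha_t) / (2.0 * t))
        if lcb > source_risk + eps_tol:
            return t  # harmful shift detected at time t
    return None
```

For example, `monitor_risk(losses, source_risk=0.08, eps_tol=0.05)` scans a stream of deployment losses and returns the first step at which a harmful shift is declared. The union-bound widths are looser than the paper's confidence sequences, so this sketch will detect shifts later than the actual method.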
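The quoted experiment setup is specific enough to reconstruct the shape of the MNIST classifier. Below is a hedged PyTorch sketch of that shallow CNN; the excerpt does not state channel counts or activation functions, so the 32/64 filters and the ReLU nonlinearities are assumptions.

```python
# A hedged reconstruction of the shallow MNIST CNN described in the quoted
# setup: two 3x3 convolutional layers, each followed by max-pooling, then
# flatten, dropout (p = 0.5), a 128-unit fully-connected layer, and an
# output layer. Channel counts (32, 64) and ReLU activations are assumed.
import torch
import torch.nn as nn

class ShallowCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3),   # 28x28 -> 26x26
            nn.ReLU(),
            nn.MaxPool2d(2),                   # 26x26 -> 13x13
            nn.Conv2d(32, 64, kernel_size=3),  # 13x13 -> 11x11
            nn.ReLU(),
            nn.MaxPool2d(2),                   # 11x11 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(p=0.5),
            nn.Linear(64 * 5 * 5, 128),        # 128-unit hidden layer
            nn.ReLU(),
            nn.Linear(128, num_classes),       # output layer
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```

Training it on clean MNIST with 10% of the data held out for validation, as the Dataset Splits row describes, matches the quoted protocol; the CIFAR-10-C experiments instead use a standard ResNet-32 trained for 50 epochs without augmentation.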