Tracking the risk of a deployed model and detecting harmful distribution shifts

Authors: Aleksandr Podkopaev, Aaditya Ramdas

ICLR 2022

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We demonstrate the efficacy of the proposed framework through an extensive empirical study on a collection of simulated and real datasets." |
| Researcher Affiliation | Academia | "Department of Statistics & Data Science, Machine Learning Department, Carnegie Mellon University. {podkopaev,aramdas}@cmu.edu" |
| Pseudocode | Yes | "Algorithm 1: Sequential testing for an absolute increase in the risk." (A hedged sketch of this test appears below the table.) |
| Open Source Code | Yes | "In order to ensure reproducibility of the results in this paper, we include the following to the supplementary materials: (a) relevant source code for all simulations that have been performed." |
| Open Datasets | Yes | "We focus on two image classification datasets with induced corruptions: MNIST-C (Mu & Gilmer, 2019) and CIFAR-10-C (Krizhevsky, 2009; Hendrycks & Dietterich, 2019)." |
| Dataset Splits | Yes | "The network is trained on original (clean) MNIST data, which is split into two folds with 10% of data used for validation purposes." |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments. It mentions training CNNs and ResNet models but gives no hardware specifications. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | "Details regarding the network architecture and the training process are given in Appendix G.1... We train a shallow CNN with two convolutional layers (each with 3 × 3 kernel matrices), each followed by max-pooling layers. Subsequently, the result is flattened and followed by a dropout layer (p = 0.5), a fully-connected layer with 128 neurons and an output layer. ... The model underlying a set-valued predictor is a standard ResNet-32. It is trained for 50 epochs on the original (clean) CIFAR-10 dataset, without data augmentation, using 10% of data for validation purposes." (A hedged PyTorch sketch of the MNIST CNN appears below.) |
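
The paper's Algorithm 1 sequentially tests for an absolute increase in risk: an alarm is raised the first time a lower confidence bound on the deployed model's risk exceeds the source risk plus a tolerance. The sketch below illustrates that control flow only; it substitutes a simple union-bound Hoeffding confidence sequence for the paper's tighter betting-based bounds, and all names (`monitor_risk`, `eps_tol`, and so on) are illustrative rather than taken from the released code.

```python
# A minimal sketch of sequential testing for an absolute increase in risk,
# in the spirit of the paper's Algorithm 1. In place of the paper's tighter
# betting-style confidence sequences, this uses a conservative time-uniform
# Hoeffding bound obtained by a union bound over time steps.
import math

def monitor_risk(losses, source_risk, eps_tol=0.05, alpha=0.05):
    """Return the first time t at which the lower confidence bound on the
    target risk exceeds source_risk + eps_tol, or None if no alarm fires.

    losses      -- iterable of per-example losses in [0, 1] from deployment
    source_risk -- risk estimated on the source (clean) distribution
    eps_tol     -- tolerated absolute increase in risk
    alpha       -- overall false-alarm probability, spread over all t
    """
    running_sum = 0.0
    for t, loss in enumerate(losses, start=1):
        running_sum += loss
        mean_t = running_sum / t
        # Per-step error budget alpha_t = 6*alpha / (pi^2 * t^2), which sums
        # to alpha over t = 1, 2, ..., giving a time-uniform guarantee.
        alpha_t = 6.0 * alpha / (math.pi ** 2 * t ** 2)
        # One-sided Hoeffding lower confidence bound for losses in [0, 1].
        lcb = mean_t - math.sqrt(math.log(1.0 / alpha_t) / (2.0 * t))
        if lcb > source_risk + eps_tol:
            return t  # harmful shift detected at time t
    return None
```

For example, `monitor_risk(losses, source_risk=0.08, eps_tol=0.05)` scans a stream of deployment losses and returns the first step at which a harmful shift is declared. The union-bound widths are looser than the paper's confidence sequences, so this sketch will detect shifts later than the actual method.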
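The quoted experiment setup is specific enough to reconstruct the shape of the MNIST classifier. Below is a hedged PyTorch sketch of that shallow CNN; the excerpt does not state channel counts or activation functions, so the 32/64 filters and the ReLU nonlinearities are assumptions.

```python
# A hedged reconstruction of the shallow MNIST CNN described in the quoted
# setup: two 3x3 convolutional layers, each followed by max-pooling, then
# flatten, dropout (p = 0.5), a 128-unit fully-connected layer, and an
# output layer. Channel counts (32, 64) and ReLU activations are assumed.
import torch
import torch.nn as nn

class ShallowCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3),   # 28x28 -> 26x26
            nn.ReLU(),
            nn.MaxPool2d(2),                   # 26x26 -> 13x13
            nn.Conv2d(32, 64, kernel_size=3),  # 13x13 -> 11x11
            nn.ReLU(),
            nn.MaxPool2d(2),                   # 11x11 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(p=0.5),
            nn.Linear(64 * 5 * 5, 128),        # 128-unit hidden layer
            nn.ReLU(),
            nn.Linear(128, num_classes),       # output layer
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```

Training it on clean MNIST with 10% of the data held out for validation, as the Dataset Splits row describes, matches the quoted protocol; the CIFAR-10-C experiments instead use a standard ResNet-32 trained for 50 epochs without augmentation.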