Tracking the risk of a deployed model and detecting harmful distribution shifts
Authors: Aleksandr Podkopaev, Aaditya Ramdas
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the efficacy of the proposed framework through an extensive empirical study on a collection of simulated and real datasets. |
| Researcher Affiliation | Academia | Department of Statistics & Data Science and Machine Learning Department, Carnegie Mellon University. {podkopaev,aramdas}@cmu.edu |
| Pseudocode | Yes | Algorithm 1 Sequential testing for an absolute increase in the risk. |
| Open Source Code | Yes | In order to ensure reproducibility of the results in this paper, we include the following to the supplementary materials: (a) relevant source code for all simulations that have been performed |
| Open Datasets | Yes | We focus on two image classification datasets with induced corruptions: MNIST-C (Mu & Gilmer, 2019) and CIFAR-10-C (Krizhevsky, 2009; Hendrycks & Dietterich, 2019). |
| Dataset Splits | Yes | The network is trained on original (clean) MNIST data, which is split into two folds with 10% of data used for validation purposes. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments. It mentions training CNNs and ResNet models but without hardware specifications. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | Details regarding the network architecture and the training process are given in Appendix G.1. ... We train a shallow CNN with two convolutional layers (each with 3 × 3 kernel matrices), each followed by max-pooling layers. Subsequently, the result is flattened and followed by a dropout layer (p = 0.5), a fully-connected layer with 128 neurons and an output layer. ... The model underlying a set-valued predictor is a standard ResNet-32. It is trained for 50 epochs on the original (clean) CIFAR-10 dataset, without data augmentation, using 10% of data for validation purposes. |
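
The Pseudocode row above refers to Algorithm 1, sequential testing for an absolute increase in the risk. As a rough illustration of the idea (not the paper's construction), the Python sketch below monitors a stream of bounded losses on deployment data and raises an alarm once an anytime-valid lower confidence bound on the running risk exceeds the source risk plus a tolerance. It substitutes a simple Hoeffding-plus-union-bound confidence sequence for the paper's confidence-sequence machinery, and the names `losses`, `source_risk`, and `eps_tol` are illustrative assumptions, not identifiers from the paper's code.

```python
import math
from typing import Iterable, Optional


def detect_risk_increase(
    losses: Iterable[float],   # bounded losses in [0, 1] observed on deployment data
    source_risk: float,        # estimated risk on source data (assumed known here)
    eps_tol: float,            # tolerated absolute increase in risk
    alpha: float = 0.05,       # false-alarm probability over the whole stream
) -> Optional[int]:
    """Return the first time step at which an alarm is raised, or None.

    Uses a one-sided Hoeffding bound with a union bound over time
    (alpha_t = alpha / (t * (t + 1))), a conservative stand-in for the
    paper's betting-based confidence sequences.
    """
    running_sum = 0.0
    for t, loss in enumerate(losses, start=1):
        running_sum += loss
        mean_t = running_sum / t
        # One-sided Hoeffding radius at level alpha_t = alpha / (t * (t + 1)).
        radius = math.sqrt(math.log(t * (t + 1) / alpha) / (2 * t))
        lower_bound = mean_t - radius
        # Alarm once the deployment risk is confidently above source_risk + eps_tol.
        if lower_bound > source_risk + eps_tol:
            return t
    return None
```

For example, `detect_risk_increase(loss_stream, source_risk=0.08, eps_tol=0.02)` would return the first time step at which the procedure is confident, at level 0.95 over the entire stream, that the deployed risk has risen by more than 0.02; the specific numbers are hypothetical.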
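
The Experiment Setup row quotes a shallow CNN for MNIST: two convolutional layers with 3 × 3 kernels, each followed by max-pooling, then a flatten, a dropout layer with p = 0.5, a 128-neuron fully-connected layer, and an output layer. The PyTorch sketch below mirrors that description; the channel counts (32 and 64) and the ReLU activations are placeholders, since the quoted text does not specify them.

```python
import torch
import torch.nn as nn


class ShallowMNISTCNN(nn.Module):
    """Shallow CNN matching the quoted description: two 3x3 conv layers,
    each followed by max-pooling, then flatten, dropout(0.5), a 128-unit
    fully-connected layer, and a 10-class output layer.
    Channel counts (32, 64) and ReLU activations are assumptions."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(p=0.5),
            # 28x28 input -> 26 (conv) -> 13 (pool) -> 11 (conv) -> 5 (pool)
            nn.Linear(64 * 5 * 5, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))
```

Per the Dataset Splits row, such a model would be trained on the clean MNIST data split into two folds with 10% held out for validation; that split and the training loop are omitted from this sketch.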