Data-SUITE: Data-centric identification of in-distribution incongruous examples

Authors: Nabeel Seedat, Jonathan Crabbé, Mihaela van der Schaar

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically validate Data-SUITE's performance and coverage guarantees and demonstrate, on cross-site medical data, biased data, and data with concept drift, that Data-SUITE best identifies ID regions where a downstream model may be reliable (independent of said model).
Researcher Affiliation | Academia | 1 Department of Applied Mathematics and Theoretical Physics, University of Cambridge, UK; 2 The Alan Turing Institute, London, UK; 3 University of California, Los Angeles, USA.
Pseudocode | Yes | Algorithm 1: General Inductive Conformal Prediction
Open Source Code | Yes | https://github.com/seedatnabeel/Data-SUITE and https://github.com/vanderschaarlab/mlforhealthlabpub/tree/main/alg/Data-SUITE
Open Datasets | Yes | SEER Dataset: consists of 240,486 patients enrolled in the American SEER program (Duggan et al., 2016). CUTRACT Dataset: a private dataset consisting of 10,086 patients enrolled in the British Prostate Cancer UK program (Prostate Cancer UK). ADULT Dataset: the ADULT dataset (Asuncion & Newman, 2007) has 32,561 instances... ELECTRICITY Dataset: the Electricity dataset (Harries & Wales, 1999) represents energy pricing in Australia...
Dataset Splits | Yes | We train a downstream regression model using D_train^synth, where features X1, X2 are used to predict X3. We first compute a baseline mean squared error (MSE) on a held-out validation set of D_train^synth and the complete test set D_test^synth (X̂ = X + Z). Practically, we split the training set (|D+_train| = n) into two disjoint sets, namely the proper training set and the calibration set: D+_train = D+_train2 ∪ D+_cal, where |D+_train2| = m and |D+_cal| = n − m.
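The proper-training/calibration split quoted above is the standard inductive (split) conformal recipe from Algorithm 1. A minimal sketch on toy data follows; the synthetic data, decision-tree model, split sizes, and alpha = 0.1 here are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Toy regression data (illustrative only, not the paper's datasets).
X = rng.normal(size=(1000, 1))
y = 2 * X[:, 0] + rng.normal(scale=0.5, size=1000)

# Split into proper training, calibration, and test sets (disjoint).
X_train, y_train = X[:600], y[:600]
X_cal, y_cal = X[600:800], y[600:800]
X_test, y_test = X[800:], y[800:]

# Fit the underlying model on the proper training set only.
model = DecisionTreeRegressor(min_samples_leaf=5).fit(X_train, y_train)

# Nonconformity scores on the calibration set: absolute residuals.
scores = np.abs(y_cal - model.predict(X_cal))

# Calibration quantile for miscoverage level alpha, with the usual
# finite-sample correction ceil((n+1)(1-alpha)) / n.
alpha = 0.1
n_cal = len(scores)
q = np.quantile(scores, np.ceil((n_cal + 1) * (1 - alpha)) / n_cal)

# Conformal prediction intervals with ~90% marginal coverage.
preds = model.predict(X_test)
lower, upper = preds - q, preds + q
coverage = np.mean((y_test >= lower) & (y_test <= upper))
```

The key property is that the coverage guarantee holds for any choice of underlying model, because calibration uses held-out residuals rather than training fit.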
Hardware Specification | Yes | All experiments were run on CPU on a MacBook Pro with an Intel Core i5 and 16 GB RAM.
Software Dependencies | No | The paper mentions software such as scikit-learn and refers to implementations from IBM/UQ360, but does not provide specific version numbers for these libraries as used in the experiments. It briefly mentions Python 3.8, PyTorch 1.9, and CUDA 11.1 in the context of describing BNNs in general, without explicitly stating that these are the versions used for the paper's own experiments.
Experiment Setup | Yes | We train a 5-layer MLP model. (BNN) We have 5 models in the ensemble, all randomly initialized. Each model is a 3-layer MLP, which we train for 10 epochs; the learning rate was empirically determined based on a validation set. (ENS) We train a 3-layer MLP with dropout (p = 0.1) for 10 epochs; the learning rate was empirically determined based on a validation set. (MCD) The base model for predicting the quantiles is a Gradient Boosting Regressor, with 10 estimators, max depth = 5, minimum samples per leaf = 5, minimum samples per split = 10. (QR) We ultimately selected a Decision Tree to serve as the base model for all n = |dX| feature-wise regressors, with parameters max depth = None, min samples split = 2, min samples leaf = 5. (Data-SUITE)
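The (QR) baseline's configuration can be sketched with scikit-learn's GradientBoostingRegressor using its quantile loss. Only the tree hyperparameters below mirror the quoted setup; the toy data and the 5%/95% quantile levels are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Toy data (illustrative assumption, not the paper's datasets).
X = rng.normal(size=(500, 2))
y = X[:, 0] + 0.5 * rng.normal(size=500)

# Hyperparameters quoted for the (QR) baseline.
params = dict(
    n_estimators=10,
    max_depth=5,
    min_samples_leaf=5,
    min_samples_split=10,
)

# One regressor per quantile: loss="quantile" with the target level alpha.
lower_model = GradientBoostingRegressor(loss="quantile", alpha=0.05, **params).fit(X, y)
upper_model = GradientBoostingRegressor(loss="quantile", alpha=0.95, **params).fit(X, y)

lower = lower_model.predict(X)
upper = upper_model.predict(X)

# Empirical (in-sample) coverage of the resulting interval.
coverage = np.mean((y >= lower) & (y <= upper))
```

Unlike the conformal intervals, these quantile-regression intervals carry no finite-sample coverage guarantee, which is the contrast the paper's baselines are meant to probe.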