Learning Prediction Intervals for Model Performance

Authors: Benjamin Elder, Matthew Arnold, Anupama Murthi, Jiří Navrátil

AAAI 2021, pp. 7305-7313 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach across a wide range of drift conditions and show substantial improvement over competitive baselines.
Researcher Affiliation | Industry | Benjamin Elder, Matthew Arnold, Anupama Murthi, Jiri Navratil; IBM T.J. Watson Research Center; benjamin.elder@ibm.com, marnold@us.ibm.com, anupama.murthi@ibm.com, jiri@us.ibm.com
Pseudocode | Yes | Algorithm 1: Algorithm to create linear-skew drift scenarios
Open Source Code | No | The paper mentions that further implementation details are provided in the supplementary material, but it does not explicitly state that source code for the described methodology is publicly available, nor does it provide a link to a code repository.
Open Datasets | Yes | For our experiments we use a set of fifteen publicly available tabular datasets, sourced from Kaggle, OpenML, and Lending Club: Artificial Character, Bach Choral, Bank Marketing, BNG Zoo, BNG Ionosphere, Churn Modeling, Creditcard Default, Forest Cover Type, Higgs Boson, Lending Club (2016 Q1, 2017 Q1), Network Attack, Phishing, Pulsar, SDSS, and Waveform.
Dataset Splits | Yes | Each dataset was chosen in turn as the target, and its UM was trained on the remaining training datasets. All results are averaged over these fifteen different UMs. ... Randomly split X_tt into X_tr, X_te with proportions p_tr/(p_tr+p_te) and p_te/(p_tr+p_te). ... The UM prediction intervals using an 80%/20% train/test split of the drift scenarios. (A hedged sketch of this proportional split appears after the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, or memory) used to run the experiments.
Software Dependencies | No | The paper mentions software components such as Python, Gradient Boosting Machine (GBM), and XGBoost, but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | In the linear-skew simulations we chose two features per dataset, and performed Alg. 1 for each feature with fifteen values of the sampling ratio R = 0, 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99, 100. This was repeated using five random seeds, giving a total of 300 drift scenarios per dataset. (A hedged sketch of this sweep follows the table.)
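
The split and skew steps referenced in the Dataset Splits and Experiment Setup rows can be illustrated with a minimal Python sketch. It assumes a simple reading of Algorithm 1: the rows are divided on the median of a chosen feature, each half is split into train and test portions with proportions p_tr/(p_tr+p_te) and p_te/(p_tr+p_te), and the sampling ratio R (0 to 100) controls how much of the test set is drawn from the upper half. The function name, the median-based division, and the default proportions are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
import pandas as pd

def make_linear_skew_scenario(df, feature, R, p_tr=0.8, p_te=0.2, seed=0):
    """Hypothetical sketch of one linear-skew drift scenario."""
    rng = np.random.default_rng(seed)

    # Divide the rows into two groups on the median feature value.
    median = df[feature].median()
    lower, upper = df[df[feature] <= median], df[df[feature] > median]

    # Proportional random split of each group into train/test portions,
    # i.e. p_tr/(p_tr+p_te) of each group goes to train.
    frac_tr = p_tr / (p_tr + p_te)

    def split(group):
        idx = rng.permutation(len(group))
        cut = int(frac_tr * len(group))
        return group.iloc[idx[:cut]], group.iloc[idx[cut:]]

    lower_tr, lower_te = split(lower)
    upper_tr, upper_te = split(upper)

    # Training set: an unskewed mixture of both groups.
    train = pd.concat([lower_tr, upper_tr])

    # Test set: a mixture skewed by R (R = 50 -> no skew,
    # R = 0 or 100 -> test data drawn entirely from one half).
    n_te = min(len(lower_te), len(upper_te))
    n_upper = int(round(n_te * R / 100))
    test = pd.concat([
        lower_te.sample(n_te - n_upper, random_state=seed),
        upper_te.sample(n_upper, random_state=seed),
    ])
    return train, test
```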
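
Building on the sketch above, the parameter sweep quoted in the Experiment Setup row could look roughly like the following: two chosen features per dataset, the fifteen listed values of R, and five random seeds, one drift scenario per combination. How the two features are chosen is not specified in the quoted text, so the `features` argument is assumed to be given.

```python
# Hypothetical sweep over the parameters quoted in the Experiment Setup row.
R_VALUES = [0, 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99, 100]
SEEDS = range(5)

def build_scenarios(df, features):
    """Generate one drift scenario per (feature, R, seed) combination."""
    scenarios = []
    for feature in features:      # two chosen features per dataset
        for R in R_VALUES:        # fifteen sampling ratios
            for seed in SEEDS:    # five random seeds
                train, test = make_linear_skew_scenario(df, feature, R, seed=seed)
                scenarios.append({"feature": feature, "R": R, "seed": seed,
                                  "train": train, "test": test})
    return scenarios
```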