Learning Prediction Intervals for Model Performance
Authors: Benjamin Elder, Matthew Arnold, Anupama Murthi, Jiří Navrátil | pp. 7305–7313
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach across a wide range of drift conditions and show substantial improvement over competitive baselines. |
| Researcher Affiliation | Industry | Benjamin Elder, Matthew Arnold, Anupama Murthi, Jiri Navratil IBM T.J. Watson Research Center benjamin.elder@ibm.com, marnold@us.ibm.com, anupama.murthi@ibm.com, jiri@us.ibm.com |
| Pseudocode | Yes | Algorithm 1 Algorithm to create linear-skew drift scenarios |
| Open Source Code | No | The paper mentions that further implementation details are provided in the supplementary material, but it does not explicitly state that source code for the methodology described is publicly available, nor does it provide a link to a code repository. |
| Open Datasets | Yes | For our experiments we use a set of fifteen publicly available tabular datasets, sourced from Kaggle, Open ML, and Lending Club: Artificial Character, Bach Choral, Bank Marketing, BNG Zoo, BNG Ionosphere, Churn Modeling, Creditcard Default, Forest Cover Type, Higgs Boson, Lending Club (2016 Q1, 2017 Q1), Network Attack, Phishing, Pulsar, SDSS, and Waveform. |
| Dataset Splits | Yes | Each dataset was chosen in turn as the target, and its UM was trained on the remaining training datasets. All results are averaged over these fifteen different UMs. ... Randomly split X_tt into X_tr, X_te with proportions p_tr/(p_tr+p_te) and p_te/(p_tr+p_te). ... The UM prediction intervals using an 80%/20% train/test split of the drift scenarios. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, or memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions software components such as 'Python', 'Gradient Boosting Machine (GBM)', and 'XGBoost', but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | In the linear-skew simulations we chose two features per dataset, and performed Alg. 1 for each feature with fifteen values of the sampling ratio R = 0, 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99, 100. This was repeated using five random seeds, giving a total of 300 drift scenarios per dataset. |
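The split step and scenario grid quoted above can be sketched as follows. This is a minimal illustration, not the paper's Algorithm 1: the function name `random_split` is hypothetical, and the proportions are assumed to be passed as raw weights (e.g. 80 and 20) that are normalized internally, matching the p_tr/(p_tr+p_te) expression.

```python
import numpy as np

def random_split(X, p_tr, p_te, rng):
    """Randomly split X into (X_tr, X_te) with proportions
    p_tr/(p_tr+p_te) and p_te/(p_tr+p_te).  Hypothetical helper
    illustrating the split step quoted in the Dataset Splits row."""
    n = len(X)
    idx = rng.permutation(n)  # shuffle indices before cutting
    n_tr = int(round(n * p_tr / (p_tr + p_te)))
    return X[idx[:n_tr]], X[idx[n_tr:]]

# Scenario grid from the Experiment Setup row: fifteen sampling
# ratios R, repeated over five random seeds (the two per-dataset
# feature choices are dataset-specific and omitted here).
R_VALUES = [0, 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99, 100]
SEEDS = range(5)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_tr, X_te = random_split(np.arange(100), 80, 20, rng)
    print(len(X_tr), len(X_te))  # an 80%/20% split, as in the paper
```

Looping `random_split` over the `R_VALUES` × `SEEDS` grid reproduces the shape of the drift-scenario sweep described in the setup, with the biased linear-skew sampling of Alg. 1 left to the paper's supplementary material.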