Learning Prediction Intervals for Model Performance
Authors: Benjamin Elder, Matthew Arnold, Anupama Murthi, Jiří Navrátil | pp. 7305–7313
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach across a wide range of drift conditions and show substantial improvement over competitive baselines. |
| Researcher Affiliation | Industry | Benjamin Elder, Matthew Arnold, Anupama Murthi, Jiri Navratil IBM T.J. Watson Research Center benjamin.elder@ibm.com, marnold@us.ibm.com, anupama.murthi@ibm.com, jiri@us.ibm.com |
| Pseudocode | Yes | Algorithm 1 Algorithm to create linear-skew drift scenarios |
| Open Source Code | No | The paper mentions that further implementation details are provided in the supplementary material, but it does not explicitly state that source code for the methodology described is publicly available, nor does it provide a link to a code repository. |
| Open Datasets | Yes | For our experiments we use a set of fifteen publicly available tabular datasets, sourced from Kaggle, Open ML, and Lending Club: Artificial Character, Bach Choral, Bank Marketing, BNG Zoo, BNG Ionosphere, Churn Modeling, Creditcard Default, Forest Cover Type, Higgs Boson, Lending Club (2016 Q1, 2017 Q1), Network Attack, Phishing, Pulsar, SDSS, and Waveform. |
| Dataset Splits | Yes | Each dataset was chosen in turn as the target, and its UM was trained on the remaining training datasets. All results are averaged over these fifteen different UMs. ... Randomly split X_tt into X_tr, X_te with proportions p_tr/(p_tr+p_te) and p_te/(p_tr+p_te). ... The UM prediction intervals using an 80%/20% train/test split of the drift scenarios. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, or memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions software components such as 'Python', 'Gradient Boosting Machine (GBM)', and 'XGBoost', but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | In the linear-skew simulations we chose two features per dataset, and performed Alg. 1 for each feature with fifteen values of the sampling ratio R = 0, 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99, 100. This was repeated using five random seeds, giving a total of 300 drift scenarios per dataset. |
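The split step and scenario grid quoted above can be sketched as follows. This is a minimal illustration, not the paper's Algorithm 1: the function name `random_split` is hypothetical, and the proportions are assumed to be passed as raw weights (e.g. 80 and 20) that are normalized internally, matching the p_tr/(p_tr+p_te) expression.

```python
import numpy as np

def random_split(X, p_tr, p_te, rng):
    """Randomly split X into (X_tr, X_te) with proportions
    p_tr/(p_tr+p_te) and p_te/(p_tr+p_te).  Hypothetical helper
    illustrating the split step quoted in the Dataset Splits row."""
    n = len(X)
    idx = rng.permutation(n)  # shuffle indices before cutting
    n_tr = int(round(n * p_tr / (p_tr + p_te)))
    return X[idx[:n_tr]], X[idx[n_tr:]]

# Scenario grid from the Experiment Setup row: fifteen sampling
# ratios R, repeated over five random seeds (the two per-dataset
# feature choices are dataset-specific and omitted here).
R_VALUES = [0, 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99, 100]
SEEDS = range(5)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_tr, X_te = random_split(np.arange(100), 80, 20, rng)
    print(len(X_tr), len(X_te))  # an 80%/20% split, as in the paper
```

Looping `random_split` over the `R_VALUES` × `SEEDS` grid reproduces the shape of the drift-scenario sweep described in the setup, with the biased linear-skew sampling of Alg. 1 left to the paper's supplementary material.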