Improving Model Selection by Employing the Test Data

Authors: Max Westphal, Werner Brannath

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our numerical experiments involve training common machine learning algorithms (EN, CART, SVM, XGB) on various artificial classification tasks. At its core, our proposed approach improves model selection in terms of the expected final model performance without introducing overoptimism. We furthermore observed a higher probability for a successful evaluation study, making it easier in practice to empirically demonstrate a sufficiently high predictive performance.
Researcher Affiliation | Academia | Institute for Statistics, Faculty 3: Mathematics and Computer Science, University of Bremen, Bremen, Germany. Correspondence to: Max Westphal <mwestphal@uni-bremen.de>.
Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | In addition, two newly developed packages were used: SEPM (Statistical Evaluation of Prediction Models) provides the selection and statistical inference framework. SEPM.MLE provides all functions used to conduct the numerical experiments presented in this work. [Footnotes link to GitHub repositories for SEPM (https://github.com/maxwestphal/SEPM) and SEPM.MLE (https://github.com/maxwestphal/SEPM.MLE). See the package-setup sketch after this table.]
Open Datasets | No | The paper describes how the data was generated for the simulation study (e.g., "sampled from a multivariate standard normal distribution") rather than using a pre-existing, publicly available dataset with concrete access information. [See the data-generation sketch after this table.]
Dataset Splits | Yes | The validation data size was set to n_V = n_L/4 in all cases. [See the split sketch after this table.]
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. It only mentions generic computing contexts without specifications.
Software Dependencies | No | All numerical experiments have been conducted in R (R Core Team, 2013). We used many existing packages, most importantly the batchtools (Lang et al., 2017) package for processing batch jobs and the mvtnorm (Genz et al., 2018) package for computations concerning the multivariate normal distribution. For the machine learning part, we employed the caret package as a wrapper for methods from glmnet, rpart, LiblineaR, and xgboost. In addition, two newly developed packages were used: SEPM (Statistical Evaluation of Prediction Models) provides the selection and statistical inference framework. SEPM.MLE provides all functions used to conduct the numerical experiments presented in this work. While software is listed, no specific version numbers are provided for any of the mentioned packages (e.g., batchtools, mvtnorm, caret, glmnet, rpart, LiblineaR, xgboost, SEPM, SEPM.MLE). [See the package-setup sketch after this table.]
Experiment Setup | No | For every data instance, we train M = 200 models with randomly sampled hyperparameters on the training data T. The paper lists the number of hyperparameters for each algorithm (e.g., EN (2), CART (2), SVM (3), XGB (7)), but it does not provide the specific values or ranges of these hyperparameters, nor other training configuration details. [See the training-loop sketch after this table.]
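
Data-generation sketch. The Open Datasets row notes that the classification tasks are simulated rather than drawn from public datasets, with features "sampled from a multivariate standard normal distribution". A minimal R sketch of such a generator is given below; the feature dimension, the logistic label mechanism, the coefficient values, and the helper name simulate_task are all illustrative assumptions, not details taken from the paper.

    library(mvtnorm)  # rmvnorm() for multivariate normal draws

    # Hypothetical generator: n observations, p independent standard normal features,
    # binary labels from an assumed logistic model (coefficients chosen for illustration only).
    simulate_task <- function(n, p = 5, beta = rep(0.5, p), seed = 1) {
      set.seed(seed)
      X <- rmvnorm(n, mean = rep(0, p), sigma = diag(p))   # multivariate standard normal features
      prob <- plogis(X %*% beta)                           # assumed label mechanism
      y <- factor(rbinom(n, size = 1, prob = prob),
                  levels = c(0, 1), labels = c("neg", "pos"))
      data.frame(y = y, X)
    }

    learn <- simulate_task(n = 400)   # learning data (later split into training and validation)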
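
Split sketch. The Dataset Splits row quotes only one split detail, n_V = n_L/4, i.e. a quarter of the learning data is held out for validation. The sketch below assumes a simple random, unstratified partition; the helper name split_learning_data is hypothetical, and the usage line reuses the learn object from the data-generation sketch above.

    # Hypothetical split: validation size n_V = n_L / 4, remainder used for training.
    split_learning_data <- function(learn, seed = 1) {
      set.seed(seed)
      n_L <- nrow(learn)
      n_V <- floor(n_L / 4)                      # validation size as quoted in the paper
      idx_V <- sample(seq_len(n_L), size = n_V)
      list(train = learn[-idx_V, ], valid = learn[idx_V, ])
    }

    parts <- split_learning_data(learn)          # 'learn' from the data-generation sketch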
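
Package-setup sketch. The Open Source Code and Software Dependencies rows point to two GitHub repositories but give no package versions. The sketch below shows one way to install the authors' packages from those repositories and to record the versions actually loaded; using the remotes package and logging sessionInfo() are suggestions on our part, not steps described in the paper.

    # Install the authors' packages from the GitHub repositories named in the footnotes.
    install.packages("remotes")
    remotes::install_github("maxwestphal/SEPM")      # selection and statistical inference framework
    remotes::install_github("maxwestphal/SEPM.MLE")  # simulation-study functions

    # Record the exact package versions, since the paper omits them.
    pkgs <- c("batchtools", "mvtnorm", "caret", "glmnet", "rpart",
              "LiblineaR", "xgboost", "SEPM", "SEPM.MLE")
    sapply(pkgs, function(p) as.character(packageVersion(p)))   # errors if a package is missing
    writeLines(capture.output(sessionInfo()), "sessionInfo.txt")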
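
Training-loop sketch. The Experiment Setup row states that M = 200 candidate models are trained per data instance with randomly sampled hyperparameters, but the sampling ranges are not reported. The loop below sketches that idea for a CART learner only; fitting rpart directly (the paper wraps its learners via caret), the two sampling ranges, and the use of validation accuracy are all assumptions for illustration. It reuses parts from the split sketch above; the actual grids and the EN/SVM/XGB learners live in SEPM.MLE.

    library(rpart)   # CART; the paper's CART learner has 2 hyperparameters

    # Hypothetical candidate-model loop: M models with randomly sampled hyperparameters,
    # trained on the training part and kept for later selection on the validation part.
    M <- 200
    models <- lapply(seq_len(M), function(m) {
      cp       <- 10^runif(1, min = -4, max = -1)   # complexity parameter (assumed range)
      maxdepth <- sample(2:10, 1)                   # tree depth (assumed range)
      rpart(y ~ ., data = parts$train, method = "class",
            control = rpart.control(cp = cp, maxdepth = maxdepth))
    })

    # Validation accuracy of each candidate, the basis for model selection.
    val_acc <- sapply(models, function(fit) {
      pred <- predict(fit, newdata = parts$valid, type = "class")
      mean(pred == parts$valid$y)
    })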