Improving Model Selection by Employing the Test Data

Authors: Max Westphal, Werner Brannath

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our numerical experiments involve training common machine learning algorithms (EN, CART, SVM, XGB) on various artificial classification tasks. At its core, our proposed approach improves model selection in terms of the expected final model performance without introducing overoptimism. We furthermore observed a higher probability for a successful evaluation study, making it easier in practice to empirically demonstrate a sufficiently high predictive performance.
Researcher Affiliation | Academia | Institute for Statistics, Faculty 3: Mathematics and Computer Science, University of Bremen, Bremen, Germany. Correspondence to: Max Westphal <mwestphal@uni-bremen.de>.
Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | In addition, two newly developed packages were used: SEPM (Statistical Evaluation of Prediction Models) provides the selection and statistical inference framework. SEPM.MLE provides all functions used to conduct the numerical experiments presented in this work. [Footnotes link to GitHub repositories for SEPM (https://github.com/maxwestphal/SEPM) and SEPM.MLE (https://github.com/maxwestphal/SEPM.MLE). See the package-setup sketch after this table.]
Open Datasets | No | The paper describes how the data was generated for the simulation study (e.g., "sampled from a multivariate standard normal distribution") rather than using a pre-existing, publicly available dataset with concrete access information. [See the data-generation sketch after this table.]
Dataset Splits | Yes | The validation data size was set to n_V = n_L/4 in all cases. [See the split sketch after this table.]
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. It only mentions generic computing contexts without specifications.
Software Dependencies | No | All numerical experiments have been conducted in R (R Core Team, 2013). We used many existing packages, most importantly the batchtools (Lang et al., 2017) package for processing batch jobs and the mvtnorm (Genz et al., 2018) package for computations concerning the multivariate normal distribution. For the machine learning part, we employed the caret package as a wrapper for methods from glmnet, rpart, LiblineaR, and xgboost. In addition, two newly developed packages were used: SEPM (Statistical Evaluation of Prediction Models) provides the selection and statistical inference framework. SEPM.MLE provides all functions used to conduct the numerical experiments presented in this work. While software is listed, no specific version numbers are provided for any of the mentioned packages (e.g., batchtools, mvtnorm, caret, glmnet, rpart, LiblineaR, xgboost, SEPM, SEPM.MLE). [See the package-setup sketch after this table.]
Experiment Setup | No | For every data instance, we train M = 200 models with randomly sampled hyperparameters on the training data T. The paper lists the number of hyperparameters for each algorithm (e.g., EN (2), CART (2), SVM (3), XGB (7)), but it does not provide the specific values or ranges of these hyperparameters, nor other training configuration details. [See the training-loop sketch after this table.]
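
Data-generation sketch. The Open Datasets row notes that the classification tasks are simulated rather than drawn from public datasets, with features "sampled from a multivariate standard normal distribution". A minimal R sketch of such a generator is given below; the feature dimension, the logistic label mechanism, the coefficient values, and the helper name simulate_task are all illustrative assumptions, not details taken from the paper.

    library(mvtnorm)  # rmvnorm() for multivariate normal draws

    # Hypothetical generator: n observations, p independent standard normal features,
    # binary labels from an assumed logistic model (coefficients chosen for illustration only).
    simulate_task <- function(n, p = 5, beta = rep(0.5, p), seed = 1) {
      set.seed(seed)
      X <- rmvnorm(n, mean = rep(0, p), sigma = diag(p))   # multivariate standard normal features
      prob <- plogis(X %*% beta)                           # assumed label mechanism
      y <- factor(rbinom(n, size = 1, prob = prob),
                  levels = c(0, 1), labels = c("neg", "pos"))
      data.frame(y = y, X)
    }

    learn <- simulate_task(n = 400)   # learning data (later split into training and validation)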
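
Split sketch. The Dataset Splits row quotes only one split detail, n_V = n_L/4, i.e. a quarter of the learning data is held out for validation. The sketch below assumes a simple random, unstratified partition; the helper name split_learning_data is hypothetical, and the usage line reuses the learn object from the data-generation sketch above.

    # Hypothetical split: validation size n_V = n_L / 4, remainder used for training.
    split_learning_data <- function(learn, seed = 1) {
      set.seed(seed)
      n_L <- nrow(learn)
      n_V <- floor(n_L / 4)                      # validation size as quoted in the paper
      idx_V <- sample(seq_len(n_L), size = n_V)
      list(train = learn[-idx_V, ], valid = learn[idx_V, ])
    }

    parts <- split_learning_data(learn)          # 'learn' from the data-generation sketch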
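
Package-setup sketch. The Open Source Code and Software Dependencies rows point to two GitHub repositories but give no package versions. The sketch below shows one way to install the authors' packages from those repositories and to record the versions actually loaded; using the remotes package and logging sessionInfo() are suggestions on our part, not steps described in the paper.

    # Install the authors' packages from the GitHub repositories named in the footnotes.
    install.packages("remotes")
    remotes::install_github("maxwestphal/SEPM")      # selection and statistical inference framework
    remotes::install_github("maxwestphal/SEPM.MLE")  # simulation-study functions

    # Record the exact package versions, since the paper omits them.
    pkgs <- c("batchtools", "mvtnorm", "caret", "glmnet", "rpart",
              "LiblineaR", "xgboost", "SEPM", "SEPM.MLE")
    sapply(pkgs, function(p) as.character(packageVersion(p)))   # errors if a package is missing
    writeLines(capture.output(sessionInfo()), "sessionInfo.txt")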
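
Training-loop sketch. The Experiment Setup row states that M = 200 candidate models are trained per data instance with randomly sampled hyperparameters, but the sampling ranges are not reported. The loop below sketches that idea for a CART learner only; fitting rpart directly (the paper wraps its learners via caret), the two sampling ranges, and the use of validation accuracy are all assumptions for illustration. It reuses parts from the split sketch above; the actual grids and the EN/SVM/XGB learners live in SEPM.MLE.

    library(rpart)   # CART; the paper's CART learner has 2 hyperparameters

    # Hypothetical candidate-model loop: M models with randomly sampled hyperparameters,
    # trained on the training part and kept for later selection on the validation part.
    M <- 200
    models <- lapply(seq_len(M), function(m) {
      cp       <- 10^runif(1, min = -4, max = -1)   # complexity parameter (assumed range)
      maxdepth <- sample(2:10, 1)                   # tree depth (assumed range)
      rpart(y ~ ., data = parts$train, method = "class",
            control = rpart.control(cp = cp, maxdepth = maxdepth))
    })

    # Validation accuracy of each candidate, the basis for model selection.
    val_acc <- sapply(models, function(fit) {
      pred <- predict(fit, newdata = parts$valid, type = "class")
      mean(pred == parts$valid$y)
    })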