Statistical Multicriteria Benchmarking via the GSD-Front
Authors: Christoph Jansen, Georg Schollmeyer, Julian Rodemann, Hannah Blocher, Thomas Augustin
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate our concepts on the benchmark suite PMLB and on the platform OpenML. |
| Researcher Affiliation | Academia | Christoph Jansen¹, c.jansen@lancaster.ac.uk; Georg Schollmeyer², georg.schollmeyer@stat.uni-muenchen.de; Julian Rodemann², julian@stat.uni-muenchen.de; Hannah Blocher², hannah.blocher@stat.uni-muenchen.de; Thomas Augustin², thomas.augustin@stat.uni-muenchen.de. ¹School of Computing & Communications, Lancaster University Leipzig, Leipzig, Germany. ²Department of Statistics, Ludwig-Maximilians-Universität München, Munich, Germany. |
| Pseudocode | No | The paper describes testing schemes with numbered steps but does not include formal pseudocode blocks or sections explicitly labeled 'Algorithm' or 'Pseudocode'. |
| Open Source Code | Yes | Implementations of all methods and scripts to reproduce the experiments: https://github.com/hannahblo/Statistical-Multicriteria-Benchmarking-via-the-GSD-Front. |
| Open Datasets | Yes | We illustrate our concepts on two well-established benchmark suites: OpenML [82, 11] and PMLB [64]. |
| Dataset Splits | Yes | We then tune the six classifiers' hyperparameters on a (multivariate) grid of size 10 following [49] for each of the 62 datasets and eventually compute i) to iii) through 10-fold cross-validation. |
| Hardware Specification | No | The paper does not provide specific details on the hardware used, such as GPU/CPU models, memory, or specific computing environments. |
| Software Dependencies | No | The paper lists several software libraries with associated publication years (e.g., 'Package xgboost. [Accessed: 13.05.2024]. 2023.'), but does not provide specific version numbers (e.g., 'xgboost 1.7.0') for the software dependencies used in the experiments. |
| Experiment Setup | Yes | We select 80 binary classification datasets (according to criteria detailed in Appendix C.1) from OpenML [82] to compare the performance of Support Vector Machine (SVM) with Random Forest (RF), Decision Tree (CART), Logistic Regression (LR), Generalized Linear Model with Elastic net (GLMNet), Extreme Gradient Boosting (XGBoost), and k-Nearest Neighbors (kNN). Our multidimensional quality metric is composed of predictive accuracy, computation time on the test data, and computation time on the training data. ... We then tune the six classifiers' hyperparameters on a (multivariate) grid of size 10 following [49] for each of the 62 datasets and eventually compute i) to iii) through 10-fold cross-validation. |
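The experiment setup above can be sketched in code. This is a hedged illustration, not the authors' pipeline: the dataset, classifier, and hyperparameter grid below are placeholder assumptions, and scikit-learn stands in for whatever software the paper actually used. It shows how one would collect, per fold of a 10-fold cross-validation, the three quality criteria named in the paper (predictive accuracy, computation time on the training data, and computation time on the test data) after a grid-based tuning step.

```python
# Hedged sketch: per-dataset collection of the three quality criteria via
# 10-fold cross-validation. Dataset, classifier, and grid are illustrative
# placeholders, not the paper's actual choices.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.tree import DecisionTreeClassifier

# Stand-in for one of the 80 OpenML binary classification datasets.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Tuning step: the paper uses a multivariate grid of size 10; here a small
# univariate grid serves as a placeholder.
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 4, 6, 8, 10]},
    cv=5,
)
grid.fit(X, y)

# 10-fold CV yields, per fold: accuracy, fit time (train) and score time (test).
cv = cross_validate(grid.best_estimator_, X, y, cv=10, scoring="accuracy")
results = {
    "accuracy": cv["test_score"].mean(),
    "train_time": cv["fit_time"].mean(),
    "test_time": cv["score_time"].mean(),
}
print(results)
```

In the paper's setting, such a triple would be computed for each classifier on each dataset, and the resulting multidimensional performance profiles then compared via the GSD-front rather than by aggregating the three criteria into a single score.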