Statistical Multicriteria Benchmarking via the GSD-Front
Authors: Christoph Jansen, Georg Schollmeyer, Julian Rodemann, Hannah Blocher, Thomas Augustin
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate our concepts on the benchmark suite PMLB and on the platform OpenML. |
| Researcher Affiliation | Academia | Christoph Jansen¹, c.jansen@lancaster.ac.uk; Georg Schollmeyer², georg.schollmeyer@stat.uni-muenchen.de; Julian Rodemann², julian@stat.uni-muenchen.de; Hannah Blocher², hannah.blocher@stat.uni-muenchen.de; Thomas Augustin², thomas.augustin@stat.uni-muenchen.de. ¹School of Computing & Communications, Lancaster University Leipzig, Leipzig, Germany. ²Department of Statistics, Ludwig-Maximilians-Universität München, Munich, Germany. |
| Pseudocode | No | The paper describes testing schemes with numbered steps but does not include formal pseudocode blocks or sections explicitly labeled 'Algorithm' or 'Pseudocode'. |
| Open Source Code | Yes | Implementations of all methods and scripts to reproduce the experiments: https://github.com/hannahblo/Statistical-Multicriteria-Benchmarking-via-the-GSD-Front. |
| Open Datasets | Yes | We illustrate our concepts on two well-established benchmark suites: OpenML [82, 11] and PMLB [64]. |
| Dataset Splits | Yes | We then tune the six classifiers' hyperparameters on a (multivariate) grid of size 10 following [49] for each of the 62 datasets and eventually compute i) to iii) through 10-fold cross-validation. |
| Hardware Specification | No | The paper does not provide specific details on the hardware used, such as GPU/CPU models, memory, or specific computing environments. |
| Software Dependencies | No | The paper lists several software libraries with associated publication years (e.g., 'Package xgboost. [Accessed: 13.05.2024]. 2023.'), but does not provide specific version numbers (e.g., 'xgboost 1.7.0') for the software dependencies used in the experiments. |
| Experiment Setup | Yes | We select 80 binary classification datasets (according to criteria detailed in Appendix C.1) from OpenML [82] to compare the performance of Support Vector Machine (SVM) with Random Forest (RF), Decision Tree (CART), Logistic Regression (LR), Generalized Linear Model with Elastic net (GLMNet), Extreme Gradient Boosting (XGBoost), and k-Nearest Neighbors (kNN). Our multidimensional quality metric is composed of predictive accuracy, computation time on the test data, and computation time on the training data. ... We then tune the six classifiers' hyperparameters on a (multivariate) grid of size 10 following [49] for each of the 62 datasets and eventually compute i) to iii) through 10-fold cross-validation. |
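The experiment setup above can be sketched in code. This is a hedged illustration, not the authors' pipeline: the dataset, classifier, and hyperparameter grid below are placeholder assumptions, and scikit-learn stands in for whatever software the paper actually used. It shows how one would collect, per fold of a 10-fold cross-validation, the three quality criteria named in the paper (predictive accuracy, computation time on the training data, and computation time on the test data) after a grid-based tuning step.

```python
# Hedged sketch: per-dataset collection of the three quality criteria via
# 10-fold cross-validation. Dataset, classifier, and grid are illustrative
# placeholders, not the paper's actual choices.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.tree import DecisionTreeClassifier

# Stand-in for one of the 80 OpenML binary classification datasets.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Tuning step: the paper uses a multivariate grid of size 10; here a small
# univariate grid serves as a placeholder.
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 4, 6, 8, 10]},
    cv=5,
)
grid.fit(X, y)

# 10-fold CV yields, per fold: accuracy, fit time (train) and score time (test).
cv = cross_validate(grid.best_estimator_, X, y, cv=10, scoring="accuracy")
results = {
    "accuracy": cv["test_score"].mean(),
    "train_time": cv["fit_time"].mean(),
    "test_time": cv["score_time"].mean(),
}
print(results)
```

In the paper's setting, such a triple would be computed for each classifier on each dataset, and the resulting multidimensional performance profiles then compared via the GSD-front rather than by aggregating the three criteria into a single score.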