Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Statistical Comparisons of Classifiers by Generalized Stochastic Dominance

Authors: Christoph Jansen, Malte Nalenz, Georg Schollmeyer, Thomas Augustin

JMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We illustrate and investigate our framework in a simulation study and with a set of standard benchmark data sets." ... "5. A Simulation Study" ... "6. Experiments with UCI Data Sets"
Researcher Affiliation | Academia | All four authors (Christoph Jansen, Malte Nalenz, Georg Schollmeyer, Thomas Augustin) list the same affiliation: Department of Statistics, Ludwig-Maximilians-Universität, Ludwigstr. 33, 80539 Munich, Germany.
Pseudocode | No | Section 4.2 describes a "concrete procedure for evaluating the distribution of opt_ij" in five steps, but the steps are presented in paragraph form rather than in a clearly labeled pseudocode block or algorithm environment.
Open Source Code | No | The paper states "for an implementation of the framework, see Calvo and Santafé (2016)", which refers to an implementation by other authors, not to the source code for the methodology described in this paper. No explicit statement or link to the authors' own code is provided.
Open Datasets | Yes | "All data sets are taken from the UCI machine learning repository (Dua and Graff, 2017)."
Dataset Splits | Yes | "On each data set, 10-fold cross-validation is performed, and results are averaged for each criterion and classifier separately."
Hardware Specification | No | The paper does not provide any details about the hardware (e.g., CPU or GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper cites the R packages it uses: glmnet (Friedman et al., 2010), gbm (Greenwell et al., 2020), randomForest (Liaw and Wiener, 2002), and rpart (Therneau and Atkinson, 2019), as well as R itself (R Core Team, 2021). However, it does not report version numbers for the packages or for the R environment.
Experiment Setup | Yes | "The optimal λ is determined via cross-validation. The mixing parameter in Elastic Net is set to 0.5. GBM and gradient boosted decision stumps are fit using the gbm R package (Greenwell et al., 2020). Gradient boosting uses 300 trees with a learning rate of 0.02 and a maximum depth of 3. The stumps use 500 trees and a learning rate of 0.05. Random Forest is fit using the randomForest R package (Liaw and Wiener, 2002) with default settings. For CART we use the rpart R package (Therneau and Atkinson, 2019) with default settings."
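The evaluation protocol reported above (10-fold cross-validation per data set, with fold results averaged per classifier) can be sketched as follows. The paper's experiments are carried out in R; this Python/scikit-learn version is an illustration of the protocol only, and the built-in data set is a hypothetical stand-in for a UCI benchmark.

```python
# Sketch of per-data-set 10-fold cross-validation with averaged results.
# Illustration only: the paper uses R; the data set here is a stand-in.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)  # stand-in for one UCI data set
cv = KFold(n_splits=10, shuffle=True, random_state=0)

# One accuracy score per fold; the paper averages these per classifier.
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=cv)
print(f"mean accuracy over {len(scores)} folds: {scores.mean():.3f}")
```

In the paper this loop would run once per classifier and per UCI data set, feeding the averaged criterion values into the stochastic-dominance comparison.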
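The reported hyperparameters can be made concrete with approximate scikit-learn analogues. This is an assumption-laden sketch: the paper uses the R packages gbm, randomForest, rpart, and glmnet, whose implementations differ from scikit-learn's, so the mapping below (e.g. gbm to GradientBoostingClassifier, glmnet's elastic net to a penalized LogisticRegression) is illustrative, not the authors' exact setup.

```python
# Approximate scikit-learn analogues of the hyperparameters reported in
# the paper; the R implementations (gbm, randomForest, rpart, glmnet)
# are not identical to these, so treat this as a sketch.
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

classifiers = {
    # Gradient boosting: 300 trees, learning rate 0.02, maximum depth 3.
    "GBM": GradientBoostingClassifier(
        n_estimators=300, learning_rate=0.02, max_depth=3),
    # Boosted decision stumps: 500 depth-1 trees, learning rate 0.05.
    "Stumps": GradientBoostingClassifier(
        n_estimators=500, learning_rate=0.05, max_depth=1),
    # Random forest and CART with default settings, as in the paper.
    "RF": RandomForestClassifier(),
    "CART": DecisionTreeClassifier(),
    # Elastic net with mixing parameter 0.5 (glmnet's alpha); the paper
    # tunes lambda by cross-validation, which is omitted here.
    "ElasticNet": LogisticRegression(
        penalty="elasticnet", l1_ratio=0.5, solver="saga", max_iter=5000),
}
```

Naming the full configuration in one place like this is what a versioned dependency list (which the report flags as missing) would make exactly reproducible.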