Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Should We Really Use Post-Hoc Tests Based on Mean-Ranks?
Authors: Alessio Benavoli, Giorgio Corani, Francesca Mangili
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate the inconsistencies of the mean-ranks test by presenting three examples. All examples refer to the analysis of the accuracy of different classifiers on multiple data sets. ... Example 3: Real Classifiers on UCI Data Sets. Finally, we compare the accuracies of seven classifiers on 54 datasets. |
| Researcher Affiliation | Academia | Alessio Benavoli, Giorgio Corani, Francesca Mangili; Istituto Dalle Molle di Studi sull'Intelligenza Artificiale (IDSIA), Scuola Universitaria Professionale della Svizzera italiana (SUPSI), Università della Svizzera italiana (USI), Manno, Switzerland |
| Pseudocode | No | The paper describes mathematical formulas and statistical tests but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The MATLAB scripts of the above examples can be downloaded from ipg.idsia.ch/software/meanRanks/matlab.zip |
| Open Datasets | Yes | Example 3: Real Classifiers on UCI Data Sets. Finally, we compare the accuracies of seven classifiers on 54 datasets. The accuracies are reported in Table 2. |
| Dataset Splits | Yes | Each classifier has been assessed via 10 runs of 10-fold cross-validation. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models used for running experiments. |
| Software Dependencies | No | We performed all the experiments using WEKA. ... The MATLAB scripts of the above examples can be downloaded from ipg.idsia.ch/software/meanRanks/matlab.zip. No version numbers are specified for WEKA or MATLAB. |
| Experiment Setup | No | The paper details settings for the statistical comparisons (e.g., Bonferroni correction, significance levels, p-values, and 10 runs of 10-fold cross-validation for evaluation), but it does not specify hyperparameters (such as learning rate, batch size, or optimizer) or system-level training settings for the machine learning classifiers themselves. |
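
The mean-ranks post-hoc test examined in the paper, together with the Bonferroni correction listed under Experiment Setup, can be sketched as follows. This is a minimal illustration using the pairwise statistic as it is commonly formulated, z = (R_i - R_j) / sqrt(k(k+1)/(6N)) on mean ranks; it is not the authors' MATLAB scripts, and the function names and the toy accuracies are hypothetical. The demo at the bottom uses made-up numbers to show the kind of inconsistency the paper studies: the mean-rank gap between classifiers A and B depends on whether a third classifier C is included.

```python
import math
from itertools import combinations


def rank_row(scores):
    """Rank one dataset's accuracies: 1 = best; tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda c: -scores[c])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        # Extend `end` over the run of classifiers tied with the one at `pos`.
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1 .. end+1
        for j in range(pos, end + 1):
            ranks[order[j]] = shared
        pos = end + 1
    return ranks


def mean_ranks(acc):
    """acc[d][c] is the accuracy of classifier c on dataset d; returns each classifier's mean rank."""
    n_datasets, k = len(acc), len(acc[0])
    per_dataset = [rank_row(row) for row in acc]
    return [sum(r[c] for r in per_dataset) / n_datasets for c in range(k)]


def mean_ranks_posthoc(acc, alpha=0.05):
    """Pairwise mean-ranks test with a Bonferroni correction.

    For classifiers i and j: z = (R_i - R_j) / sqrt(k(k+1) / (6N)), where R are
    mean ranks, k is the number of classifiers and N the number of datasets.
    The two-sided p-value comes from the standard normal; a pair is flagged
    when p < alpha / m, with m = k(k-1)/2 pairwise comparisons.
    Returns {(i, j): (z, p, significant)}.
    """
    n_datasets, k = len(acc), len(acc[0])
    R = mean_ranks(acc)
    se = math.sqrt(k * (k + 1) / (6.0 * n_datasets))
    m = k * (k - 1) // 2
    results = {}
    for i, j in combinations(range(k), 2):
        z = (R[i] - R[j]) / se
        p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
        results[(i, j)] = (z, p, p < alpha / m)
    return results


if __name__ == "__main__":
    # Hypothetical accuracies (columns A, B, C) chosen to illustrate the issue:
    # A beats B on two datasets and B beats A on the other two.
    acc3 = [
        [0.90, 0.70, 0.80],
        [0.85, 0.65, 0.75],
        [0.60, 0.70, 0.50],
        [0.55, 0.65, 0.45],
    ]
    acc2 = [row[:2] for row in acc3]  # same data with classifier C removed
    print(mean_ranks(acc2))  # [1.5, 1.5]: A and B tie
    print(mean_ranks(acc3))  # [1.5, 2.0, 2.5]: adding C separates A from B
```

In the toy data, C falls between A and B exactly on the datasets A wins, so A gains a full two-rank margin there but only one rank where B wins. The pairwise record of A against B is unchanged, yet their mean ranks diverge once C is added.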
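
The evaluation protocol listed under Dataset Splits (10 runs of 10-fold cross-validation) can be sketched as a fold-index generator. This is a hypothetical helper, not WEKA's implementation: it builds unstratified folds, reshuffles with Python's `random.Random` on each run, and its seeding scheme (`seed + run`) is an assumption. Whether the original experiments stratified by class is not stated here.

```python
import random


def repeated_kfold_indices(n_samples, n_folds=10, n_runs=10, seed=0):
    """Yield (run, fold, train_idx, test_idx) for n_runs repetitions of n_folds-fold CV.

    Assumption: folds are unstratified. Each run reshuffles the sample indices
    with its own seeded RNG, then deals them into n_folds nearly equal folds,
    so every sample is in the test set exactly once per run.
    """
    for run in range(n_runs):
        rng = random.Random(seed + run)  # hypothetical per-run seeding scheme
        idx = list(range(n_samples))
        rng.shuffle(idx)
        # Deal shuffled indices round-robin so fold sizes differ by at most one.
        folds = [idx[f::n_folds] for f in range(n_folds)]
        for fold, test in enumerate(folds):
            test_set = set(test)
            train = [i for i in idx if i not in test_set]
            yield run, fold, train, sorted(test)
```

With the defaults, each classifier-dataset pair produces 10 × 10 = 100 train/test splits, and hence 100 per-fold accuracy estimates.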