A Bayesian Wilcoxon signed-rank test based on the Dirichlet process

Authors: Alessio Benavoli, Giorgio Corani, Francesca Mangili, Marco Zaffalon, Fabrizio Ruggeri

ICML 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show results dealing with the comparison of two classifiers using real and simulated data. By means of simulations on artificial and real world data, we use our test to decide if a certain classifier is significantly better than another.
Researcher Affiliation | Academia | IPG IDSIA, Manno, Switzerland and CNR IMATI, Milano, Italy
Pseudocode | No | The paper presents mathematical formulas and theorems but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | The IDP test developed in this work can currently be used online (or downloaded as R or Matlab code) at http://ipg.idsia.ch/software/IDP.php.
Open Datasets | Yes | We run the WEKA implementation (Witten et al., 2011) of such classifiers on 70 data sets from the UCI repository: 54 classification data sets and 16 regression data sets, which we use for classification after discretizing the target variable into 4 bins.
Dataset Splits | Yes | We evaluate via 10-fold cross-validation the accuracy of each classifier on each data set.
Hardware Specification | No | The paper describes an experimental setup involving numerical simulations and the use of the WEKA tool, but does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for these experiments.
Software Dependencies | No | The paper mentions 'WEKA implementation (Witten et al., 2011)' and states the IDP test can be downloaded as 'R or Matlab code,' but it does not specify concrete version numbers for WEKA, R, Matlab, or any other software dependencies.
Experiment Setup | Yes | Consider a Monte Carlo experiment in which paired values of accuracies Xi, Yi are generated for n = 30 data sets based on the Gaussian models: Xi − Yi ∼ N(μ, σ²) for i = 1,...,n, with μ (difference in accuracy) ranging from −0.07 to 0.07 and σ = 0.12. ... The one-sided Wilcoxon test has been implemented according to the conventional decision criterion: p-value less than α = 0.05.
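The Monte Carlo setup in the Experiment Setup row can be illustrated with a minimal Python sketch. It covers only the frequentist side (the one-sided Wilcoxon signed-rank test at α = 0.05), not the IDP test itself. Several things here are assumptions rather than details taken from the paper: the differences in accuracy are drawn directly from a Gaussian with mean `mu` and standard deviation 0.12, the range of `mu` is taken to be -0.07 to 0.07, the p-value uses the large-sample normal approximation without continuity correction, and the function names `wilcoxon_one_sided_p` and `rejection_rate` and the number of runs are hypothetical.

```python
import math
import random


def wilcoxon_one_sided_p(diffs):
    """One-sided p-value (H1: median difference > 0), normal approximation.

    Assumes continuous data, so zeros and ties get no special handling.
    """
    n = len(diffs)
    # Order the indices by absolute difference; position in this order is the rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    # T+ is the sum of the ranks attached to positive differences.
    t_plus = sum(rank for rank, i in enumerate(order, start=1) if diffs[i] > 0)
    # Mean and standard deviation of T+ under the null hypothesis.
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (t_plus - mean) / sd
    # Upper-tail probability of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2))


def rejection_rate(mu, sigma=0.12, n=30, alpha=0.05, runs=2000, seed=0):
    """Fraction of simulated experiments in which the test rejects H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(runs):
        # Assumed model: each paired difference Xi - Yi is Gaussian(mu, sigma).
        diffs = [rng.gauss(mu, sigma) for _ in range(n)]
        if wilcoxon_one_sided_p(diffs) < alpha:
            rejections += 1
    return rejections / runs


if __name__ == "__main__":
    for mu in (-0.07, 0.0, 0.07):
        print(f"mu = {mu:+.2f}: rejection rate = {rejection_rate(mu):.3f}")
```

At `mu = 0` the rejection rate estimates the Type I error and should sit near α; for positive `mu` it estimates the power of the test. A library routine such as SciPy's `scipy.stats.wilcoxon` could replace the hand-written p-value if exact small-sample behaviour matters.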