A Bayesian Wilcoxon signed-rank test based on the Dirichlet process
Authors: Alessio Benavoli, Giorgio Corani, Francesca Mangili, Marco Zaffalon, Fabrizio Ruggeri
ICML 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show results dealing with the comparison of two classifiers using real and simulated data. By means of simulations on artificial and real world data, we use our test to decide if a certain classifier is significantly better than another. |
| Researcher Affiliation | Academia | IPG IDSIA, Manno, Switzerland and CNR IMATI, Milano, Italy |
| Pseudocode | No | The paper presents mathematical formulas and theorems but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The IDP test developed in this work can currently be used online (or downloaded as R or Matlab code) at http://ipg.idsia.ch/software/IDP.php. |
| Open Datasets | Yes | We run the WEKA implementation (Witten et al., 2011) of such classifiers on 70 data sets from the UCI repository: 54 classification data sets and 16 regression data sets, which we use for classification having discretized into 4 bins the target variable. |
| Dataset Splits | Yes | We evaluate via 10 folds cross-validation the accuracy of each classifier on each data set. |
| Hardware Specification | No | The paper describes experimental setup involving numerical simulations and the use of the WEKA tool, but does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for these experiments. |
| Software Dependencies | No | The paper mentions 'WEKA implementation (Witten et al., 2011)' and states the IDP test can be downloaded as 'R or Matlab code,' but it does not specify concrete version numbers for WEKA, R, Matlab, or any other software dependencies. |
| Experiment Setup | Yes | Consider a Monte Carlo experiment in which paired values of accuracies Xi, Yi are generated for n = 30 multiple data sets based on the Gaussian models: Xi − Yi ∼ N(μ, σ²) for i = 1,...,n, with μ (difference in accuracy) ranging from −0.07 to 0.07 and σ = 0.12. ... The one-sided Wilcoxon test has been implemented according to the conventional decision criterion: p-value less than α = 0.05. |
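
The Monte Carlo setup quoted in the last row can be illustrated with a minimal sketch of the frequentist baseline only. It is not the authors' code and it does not implement the IDP test. It assumes the accuracy differences are drawn directly from a Gaussian with mean `mu` and standard deviation `sigma`, and it replaces the exact Wilcoxon signed-rank test with the large-sample normal approximation (no continuity or tie correction). The function names `wilcoxon_one_sided_pvalue` and `rejection_rate`, the trial count, and the seed are hypothetical choices made for illustration.

```python
import math
import random


def wilcoxon_one_sided_pvalue(diffs):
    """One-sided p-value for H1: median difference > 0.

    Uses the normal approximation to the signed-rank statistic W+.
    Ties among |d| are not given average ranks (assumption: continuous data).
    """
    nonzero = [d for d in diffs if d != 0.0]
    n = len(nonzero)
    # Indices ordered by absolute difference; position + 1 is the rank.
    order = sorted(range(n), key=lambda i: abs(nonzero[i]))
    # W+ is the sum of ranks belonging to positive differences.
    w_plus = sum(rank for rank, i in enumerate(order, start=1) if nonzero[i] > 0)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mean) / sd
    # Upper-tail probability of a standard normal: P(Z >= z).
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def rejection_rate(mu, sigma=0.12, n=30, alpha=0.05, trials=2000, seed=0):
    """Fraction of simulated experiments in which the test rejects H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Assumed model: each paired difference is Gaussian(mu, sigma).
        diffs = [rng.gauss(mu, sigma) for _ in range(n)]
        if wilcoxon_one_sided_pvalue(diffs) < alpha:
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    for mu in (-0.07, 0.0, 0.07):
        print(f"mu={mu:+.2f}  rejection rate={rejection_rate(mu):.3f}")
```

Under these assumptions, the rejection rate at `mu = 0` should sit near the nominal level α = 0.05 and rise as `mu` grows, which is the kind of power curve against which a Bayesian alternative would be compared.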