A Bayesian nonparametric procedure for comparing algorithms

Authors: Alessio Benavoli, Giorgio Corani, Francesca Mangili, Marco Zaffalon

ICML 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimentally compare our procedure (Bayesian Friedman test with joint multiple comparisons) with the well-established F-race. The simulation results are shown in Table 2 for different values of q and ρ.
Researcher Affiliation | Academia | IDSIA, Manno, Switzerland
Pseudocode | No | The information is insufficient. The paper describes its methods in narrative text and mathematical formulas but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The information is insufficient. The paper does not contain an explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | No | The information is insufficient. The paper generates synthetic data for its experiments ("We sample the results of the j-th candidate from a normal with mean μ_j and variance σ_j^2") rather than using a publicly available dataset with concrete access information.
Dataset Splits | No | The information is insufficient. The paper describes a simulation setup in which data are sampled and algorithms are assessed, but it does not specify training, validation, or test splits (percentages, sample counts, or references to predefined splits).
Hardware Specification | No | The information is insufficient. The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The information is insufficient. The paper does not provide specific version numbers for any software components, libraries, or solvers used in the experiments.
Experiment Setup | Yes | The settings are as follows. We perform both the frequentist and the Bayesian Friedman test with significance α = 0.05. For the Bayesian multiple comparison, we accept statements of joint comparison whose posterior probability is larger than 0.95. ... We consider q candidates in each race. We sample the results of the j-th candidate from a normal with mean μ_j and variance σ_j^2. Before each race, the means μ_1, …, μ_q are uniformly sampled from the interval [0, 1]; the variances are σ_1^2 = … = σ_q^2 = ρ^2. The best algorithm is thus the one with the highest mean. We fix the maximum overall number of allowed assessments to M = 300. For each assessment of an algorithm, we decrease M by one unit. ... We perform 200 repetitions for each setting.
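The simulation protocol summarized under Experiment Setup can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: the names RaceSetting, new_race, uniform_allocation and accuracy are hypothetical, and uniform_allocation is a naive placeholder that spends the whole budget round-robin. It is neither F-race nor the Bayesian procedure. What the sketch does follow from the setup is that the means are drawn uniformly from [0, 1], all candidates share variance ρ^2, every assessment decreases the budget M = 300 by one, and each setting is repeated 200 times.

```python
import random
from dataclasses import dataclass, field


@dataclass
class RaceSetting:
    """One simulated race (hypothetical helper): q candidates with Gaussian results."""

    means: list[float]  # mu_1, ..., mu_q
    rho: float  # common standard deviation, so every variance is rho^2
    budget: int = 300  # M: overall number of allowed assessments
    rng: random.Random = field(default_factory=random.Random)

    @property
    def best(self) -> int:
        # The best candidate is the one with the highest mean.
        return max(range(len(self.means)), key=self.means.__getitem__)

    def assess(self, j: int) -> float:
        """Draw one result of candidate j from N(mu_j, rho^2); each call uses one unit of M."""
        if self.budget <= 0:
            raise RuntimeError("assessment budget M is exhausted")
        self.budget -= 1
        return self.rng.gauss(self.means[j], self.rho)


def new_race(q: int, rho: float, rng: random.Random, budget: int = 300) -> RaceSetting:
    # Before each race, mu_1, ..., mu_q are sampled uniformly from [0, 1].
    means = [rng.uniform(0.0, 1.0) for _ in range(q)]
    return RaceSetting(means=means, rho=rho, budget=budget, rng=rng)


def uniform_allocation(race: RaceSetting) -> int:
    """Naive placeholder, NOT F-race or the Bayesian procedure: spend the whole
    budget round-robin, then return the candidate with the highest sample mean."""
    q = len(race.means)
    sums, counts = [0.0] * q, [0] * q
    j = 0
    while race.budget > 0:
        sums[j] += race.assess(j)
        counts[j] += 1
        j = (j + 1) % q
    return max(range(q), key=lambda i: sums[i] / counts[i] if counts[i] else float("-inf"))


def accuracy(q: int, rho: float, repetitions: int = 200, seed: int = 0) -> float:
    """Fraction of repetitions in which the selection rule returns the truly best candidate."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        race = new_race(q, rho, rng)  # fresh means before each race
        hits += uniform_allocation(race) == race.best
    return hits / repetitions


if __name__ == "__main__":
    for rho in (0.1, 0.5):
        print(f"q=10, rho={rho}: accuracy {accuracy(10, rho):.2f}")
```

A single seeded random.Random per experiment makes the 200 repetitions reproducible. A racing rule that stops assessing some candidates early would replace uniform_allocation, and that is where F-race or the Bayesian procedure would plug in.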
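F-race is built on the frequentist Friedman test, applied here with α = 0.05. A minimal sketch of that test follows. It is not the authors' code, and the function names are hypothetical. Several assumptions apply. Each row of the input is treated as a block (for example, the results of all q candidates on one instance) and each column as a candidate. The p-value uses the usual chi-square approximation with q − 1 degrees of freedom, computed from the series for the regularized incomplete gamma function, which is adequate for moderate statistic values. No tie correction is applied. The racing logic that uses the test to discard candidates is not shown, and neither is the Bayesian counterpart with its 0.95 posterior threshold.

```python
import math


def average_ranks(row: list[float]) -> list[float]:
    """Rank the values in one block (1 = smallest), giving tied values their average rank."""
    order = sorted(range(len(row)), key=row.__getitem__)
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j share ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def chi2_sf(x: float, df: int) -> float:
    """Upper tail P(X > x) for X ~ chi-square(df), via the power series of the
    regularized lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total and n < 10_000:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(-z + a * math.log(z) - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def friedman_test(blocks: list[list[float]], alpha: float = 0.05) -> tuple[float, float, bool]:
    """Friedman test on an n x k table (rows = blocks, columns = candidates).
    Returns (statistic, p-value, whether equal performance is rejected at level alpha)."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    # Q = 12 / (n k (k + 1)) * sum_j R_j^2 - 3 n (k + 1)
    stat = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    p = chi2_sf(stat, k - 1)
    return stat, p, p < alpha
```

Consider four blocks that all rank three candidates in the same order. They give the maximal statistic n(q − 1) = 8 and a p-value of exp(-4) ≈ 0.018, so the hypothesis of equal performance is rejected at α = 0.05. The Friedman statistic depends only on within-block ranks, so whether rank 1 denotes the best or the worst result does not change it.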