A Bayesian nonparametric procedure for comparing algorithms
Authors: Alessio Benavoli, Giorgio Corani, Francesca Mangili, Marco Zaffalon
ICML 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally compare our procedure (Bayesian Friedman test with joint multiple comparisons) with the well-established F-race. The simulation results are shown in Table 2 for different values of q and ρ. |
| Researcher Affiliation | Academia | IDSIA, Manno, Switzerland |
| Pseudocode | No | The information is insufficient. The paper describes methods and procedures in narrative text and mathematical formulas, but it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The information is insufficient. The paper does not contain an explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The information is insufficient. The paper describes synthetic data generation for experiments ("We sample the results of the j-th candidate from a normal with mean $\mu_i$ and variance $\sigma_i^2$") rather than using a publicly available dataset with concrete access information. |
| Dataset Splits | No | The information is insufficient. The paper describes a simulation setup where data is sampled and algorithms are assessed, but it does not provide specific training, validation, or test dataset splits in terms of percentages, sample counts, or references to predefined splits. |
| Hardware Specification | No | The information is insufficient. The paper does not provide any specific details regarding the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The information is insufficient. The paper does not provide specific version numbers for any software components, libraries, or solvers used in the experiments. |
| Experiment Setup | Yes | The setting are as follows. We perform both the frequentist and the Friedman with significance α=0.05. For the Bayesian multiple comparison, we accept statements of joint comparison whose posterior probability is larger than 0.95. ... We consider q candidates in each race. We sample the results of the j-th candidate from a normal with mean $\mu_i$ and variance $\sigma_i^2$. Before each race, the means $\mu_1, \dots, \mu_q$ are uniformly sampled from the interval [0, 1]; the variances $\sigma_1^2 = \dots = \sigma_q^2 = \rho^2$. The best algorithm is thus the one with the highest mean. We fix the overall number of maximum allowed assessments to M = 300. For each assessment of an algorithm we decrease M of one unit. ... We perform 200 repetitions for each setting. |
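
The data-generation part of the quoted experiment setup can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function names (`sample_means`, `assess_round`, `simulate_race_data`) are hypothetical, every surviving candidate is assumed to be assessed once per round, and the elimination step (the frequentist or Bayesian Friedman test) is deliberately left out, so no candidate is ever dropped here.

```python
import random


def sample_means(q, rng):
    """Draw the q candidate means uniformly from [0, 1], as in the quoted setup."""
    return [rng.uniform(0.0, 1.0) for _ in range(q)]


def assess_round(means, rho, alive, rng):
    """Assess each surviving candidate once: a normal draw with std. dev. rho."""
    return {j: rng.gauss(means[j], rho) for j in alive}


def simulate_race_data(q, rho, budget=300, seed=0):
    """Generate race data until the assessment budget M cannot cover a full round.

    Assumption: one round = one assessment per surviving candidate.
    The statistical test that would eliminate candidates is omitted.
    """
    rng = random.Random(seed)
    means = sample_means(q, rng)
    alive = list(range(q))          # indices of candidates still in the race
    rounds = []
    remaining = budget              # M in the quoted setup
    while alive and remaining >= len(alive):
        rounds.append(assess_round(means, rho, alive, rng))
        remaining -= len(alive)     # each assessment decreases M by one unit
        # A real race would run the elimination test here and shrink `alive`.
    return means, rounds, remaining


if __name__ == "__main__":
    means, rounds, remaining = simulate_race_data(q=7, rho=0.5, seed=1)
    best = max(range(len(means)), key=lambda j: means[j])
    print(f"rounds run: {len(rounds)}, budget left: {remaining}, best candidate: {best}")
```

Repeating `simulate_race_data` with 200 different seeds for each pair (q, ρ) would mirror the "200 repetitions for each setting" mentioned in the setup; plugging in an elimination rule is what would turn this sketch into an actual race.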