reproducibilityindex.ai

A Bayesian nonparametric procedure for comparing algorithms

Authors: Alessio Benavoli, Giorgio Corani, Francesca Mangili, Marco Zaffalon

ICML 2015

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We experimentally compare our procedure (Bayesian Friedman test with joint multiple comparisons) with the well-established F-race. The simulation results are shown in Table 2 for different values of q and ρ.
Researcher Affiliation	Academia	IDSIA, Manno, Switzerland
Pseudocode	No	The information is insufficient. The paper describes methods and procedures in narrative text and mathematical formulas, but it does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code	No	The information is insufficient. The paper does not contain an explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets	No	The information is insufficient. The paper describes synthetic data generation for experiments ("We sample the results of the j-th candidate from a normal with mean µ_i and variance σ_i²") rather than using a publicly available dataset with concrete access information.
Dataset Splits	No	The information is insufficient. The paper describes a simulation setup where data is sampled and algorithms are assessed, but it does not provide specific training, validation, or test dataset splits in terms of percentages, sample counts, or references to predefined splits.
Hardware Specification	No	The information is insufficient. The paper does not provide any specific details regarding the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies	No	The information is insufficient. The paper does not provide specific version numbers for any software components, libraries, or solvers used in the experiments.
Experiment Setup	Yes	The setting are as follows. We perform both the frequentist and the Friedman with significance α=0.05. For the Bayesian multiple comparison, we accept statements of joint comparison whose posterior probability is larger than 0.95. ... We consider q candidates in each race. We sample the results of the j-th candidate from a normal with mean µ_i and variance σ_i². Before each race, the means µ_1, ..., µ_q are uniformly sampled from the interval [0, 1]; the variances σ_1² = ... = σ_q² = ρ². The best algorithm is thus the one with the highest mean. We fix the overall number of maximum allowed assessments to M = 300. For each assessment of an algorithm we decrease M of one unit. ... We perform 200 repetitions for each setting.
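
The simulation setup quoted in the Experiment Setup row can be sketched roughly as follows. This is a minimal illustration of the data generation and budget accounting only, not the authors' code: the function names, the `eliminate` hook, the seeding scheme, and the round-by-round assessment order are all assumptions made for this example. The elimination step itself (F-race or the Bayesian Friedman test) is left as a pluggable callback and defaults to eliminating nothing.

```python
import random
from statistics import mean


def run_race(q, rho, budget=300, eliminate=None, seed=0):
    """Simulate one race among q candidates (hypothetical sketch).

    True means are drawn uniformly from [0, 1]; every assessment of a
    candidate is a normal draw with that mean and standard deviation rho
    (variance rho**2) and costs one unit of the budget M.
    """
    if budget < q:
        raise ValueError("budget must allow at least one round")
    rng = random.Random(seed)
    mus = [rng.uniform(0.0, 1.0) for _ in range(q)]
    alive = list(range(q))
    results = {i: [] for i in alive}
    remaining = budget
    # Assumption: each round assesses every surviving candidate once.
    while len(alive) > 1 and remaining >= len(alive):
        for i in alive:
            results[i].append(rng.gauss(mus[i], rho))
            remaining -= 1
        if eliminate is not None:
            # Assumed contract: returns a non-empty subset of `alive`.
            alive = eliminate(alive, results)
    winner = max(alive, key=lambda i: mean(results[i]))
    best = max(range(q), key=lambda i: mus[i])  # highest true mean
    return winner, best, budget - remaining


def accuracy(q, rho, repetitions=200):
    """Fraction of repetitions in which the race returns the true best."""
    hits = 0
    for rep in range(repetitions):
        winner, best, _ = run_race(q, rho, seed=rep)
        hits += winner == best
    return hits / repetitions
```

Calling `accuracy(q=5, rho=0.5)` runs 200 repetitions with M = 300, matching the repetition count and budget quoted above. A real comparison would pass a statistical test as `eliminate`, using the α = 0.05 and 0.95 posterior thresholds from the same row.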