Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

On the Evaluation of (Meta-)solver Approaches

Authors: Roberto Amadini, Maurizio Gabbrielli, Tong Liu, Jacopo Mauro

JAIR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Starting from some surprising results presented by Liu, Amadini, Mauro, and Gabbrielli (2021) showing dramatic ranking changes with different but reasonable metrics, we would like to draw more attention to the evaluation of meta-solver approaches by shedding some light on the strengths and weaknesses of different metrics. Unsurprisingly, some of the findings we report here also apply to the evaluation of individual solvers. [...] Liu et al. compared the performance of six meta-solver approaches across 15 decision-problem scenarios taken from ASlib (Bischl et al., 2016) and coming from heterogeneous domains such as Answer-Set Programming, Constraint Programming, Quantified Boolean Formula, and Boolean Satisfiability. Tab. 1 reports the performance of meta-solvers ASAP and RF, respectively the best approaches according to the closed gap score and the MZNC score. [...] Fig. 2 shows the runtime distributions of the instances solved by ASAP and RF, sorted by ascending runtime.
Researcher Affiliation | Collaboration | Roberto Amadini (EMAIL), Maurizio Gabbrielli (EMAIL), Department of Computer Science and Engineering, University of Bologna, Italy; Tong Liu (EMAIL), Meituan, Beijing, China; Jacopo Mauro (EMAIL), Department of Mathematics and Computer Science, University of Southern Denmark, Denmark.
Pseudocode | No | The paper discusses various evaluation metrics and their application to meta-solvers, analyzing existing results. It does not present new algorithms or procedures in pseudocode.
Open Source Code | No | The paper analyzes and discusses performance metrics for meta-solvers, drawing upon previously published works and data. It does not state that it provides its own source code for the methodology described.
Open Datasets | Yes | In this work, we will consider the scenarios of the Algorithm Selection library (ASlib) (Bischl et al., 2016), i.e., the reference library for AS scenarios. The ASlib contains several data sets from the literature, and we assume that they represent realistic scenarios.
Dataset Splits | Yes | When evaluating a meta-solver s on scenario (I, S, τ), it is common practice to partition I into a training set Itr, on which s learns how to leverage its individual solvers, and a test set Its where the performance of s on unforeseen problems is measured. In particular, to prevent overfitting, it is possible to use a k-fold cross validation by first splitting I into k disjoint folds, and then using, in turn, one fold as test set and the union of the other folds as the training set. In the 2015 AS challenge (Lindauer et al., 2019) the submissions were evaluated with a 10-fold cross validation, while in the OASC in 2017 the dataset of the scenarios was divided only into one test set and one training set.
Hardware Specification | No | The paper discusses general 'computational resources available' as a factor influencing evaluation metrics, but it does not specify any particular hardware (GPU/CPU models, memory, or specific computing environments) used to conduct its own analysis or present its findings.
Software Dependencies | No | The paper discusses various solvers and competitions in the context of meta-solver evaluation but does not list any specific software dependencies or their version numbers that were used for the analysis or experiments presented in this paper.
Experiment Setup | No | The paper analyzes performance metrics for meta-solvers and discusses existing results, but it does not describe a new experimental setup with specific hyperparameters, training configurations, or system-level settings for a model developed or trained by the authors.
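The Dataset Splits row above describes k-fold cross-validation: the instance set I is split into k disjoint folds, and each fold serves once as the test set while the union of the remaining folds forms the training set. A minimal sketch of that partitioning, assuming a simple list of instance identifiers (the names and helper below are illustrative, not the authors' code):

```python
# Hedged sketch of the k-fold partitioning described in the paper's
# Dataset Splits quote; instance names and the helper are hypothetical.

def k_fold_splits(instances, k=10):
    """Split `instances` into k disjoint folds and yield (train, test)
    pairs: each fold is the test set exactly once, and the training set
    is the union of the other k-1 folds."""
    folds = [instances[i::k] for i in range(k)]  # k disjoint folds
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

# Example: a 10-fold split of 20 hypothetical scenario instances.
instances = [f"inst-{n}" for n in range(20)]
for train, test in k_fold_splits(instances, k=10):
    assert set(train).isdisjoint(test)           # folds are disjoint
    assert sorted(train + test) == sorted(instances)  # nothing is lost
```

The 2015 AS challenge evaluation mentioned in the row corresponds to k=10; the single train/test split used in the 2017 OASC is the degenerate case of holding out one fixed test set instead of rotating folds.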