Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

On the Evaluation of (Meta-)solver Approaches

Authors: Roberto Amadini, Maurizio Gabbrielli, Tong Liu, Jacopo Mauro

JAIR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Starting from some surprising results presented by Liu, Amadini, Mauro, and Gabbrielli (2021) showing dramatic ranking changes with different but reasonable metrics, we would like to draw more attention to the evaluation of meta-solver approaches by shedding some light on the strengths and weaknesses of different metrics. Unsurprisingly, some of the findings we report here also apply to the evaluation of individual solvers. [...] Liu et al. compared the performance of six meta-solver approaches across 15 decision-problem scenarios taken from ASlib (Bischl et al., 2016) and coming from heterogeneous domains such as Answer-Set Programming, Constraint Programming, Quantified Boolean Formula, and Boolean Satisfiability. Tab. 1 reports the performance of meta-solvers ASAP and RF, respectively the best approaches according to the closed gap score and the MZNC score. [...] Fig. 2 shows the runtime distributions of the instances solved by ASAP and RF, sorted by ascending runtime.
Researcher Affiliation | Collaboration | Roberto Amadini (EMAIL), Maurizio Gabbrielli (EMAIL), Department of Computer Science and Engineering, University of Bologna, Italy; Tong Liu (EMAIL), Meituan, Beijing, China; Jacopo Mauro (EMAIL), Department of Mathematics and Computer Science, University of Southern Denmark, Denmark.
Pseudocode | No | The paper discusses various evaluation metrics and their application to meta-solvers, analyzing existing results. It does not present new algorithms or procedures in pseudocode.
Open Source Code | No | The paper analyzes and discusses performance metrics for meta-solvers, drawing upon previously published works and data. It does not state that it provides its own source code for the methodology described.
Open Datasets | Yes | In this work, we will consider the scenarios of the Algorithm Selection library (ASlib) (Bischl et al., 2016), i.e., the reference library for AS scenarios. The ASlib contains several data sets from the literature, and we assume that they represent realistic scenarios.
Dataset Splits | Yes | When evaluating a meta-solver s on scenario (I, S, τ), it is common practice to partition I into a training set Itr, on which s learns how to leverage its individual solvers, and a test set Its where the performance of s on unforeseen problems is measured. In particular, to prevent overfitting, it is possible to use a k-fold cross validation by first splitting I into k disjoint folds, and then using, in turn, one fold as test set and the union of the other folds as the training set. In the 2015 AS challenge (Lindauer et al., 2019) the submissions were evaluated with a 10-fold cross validation, while in the OASC in 2017 the dataset of the scenarios was divided only into one test set and one training set.
Hardware Specification | No | The paper discusses general 'computational resources available' as a factor influencing evaluation metrics, but it does not specify any particular hardware (GPU/CPU models, memory, or specific computing environments) used to conduct its own analysis or present its findings.
Software Dependencies | No | The paper discusses various solvers and competitions in the context of meta-solver evaluation but does not list any specific software dependencies or their version numbers that were used for the analysis or experiments presented in this paper.
Experiment Setup | No | The paper analyzes performance metrics for meta-solvers and discusses existing results, but it does not describe a new experimental setup with specific hyperparameters, training configurations, or system-level settings for a model developed or trained by the authors.
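The Dataset Splits row above describes k-fold cross-validation: the instance set I is split into k disjoint folds, and each fold serves once as the test set while the union of the remaining folds forms the training set. A minimal sketch of that partitioning, assuming a simple list of instance identifiers (the names and helper below are illustrative, not the authors' code):

```python
# Hedged sketch of the k-fold partitioning described in the paper's
# Dataset Splits quote; instance names and the helper are hypothetical.

def k_fold_splits(instances, k=10):
    """Split `instances` into k disjoint folds and yield (train, test)
    pairs: each fold is the test set exactly once, and the training set
    is the union of the other k-1 folds."""
    folds = [instances[i::k] for i in range(k)]  # k disjoint folds
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

# Example: a 10-fold split of 20 hypothetical scenario instances.
instances = [f"inst-{n}" for n in range(20)]
for train, test in k_fold_splits(instances, k=10):
    assert set(train).isdisjoint(test)           # folds are disjoint
    assert sorted(train + test) == sorted(instances)  # nothing is lost
```

The 2015 AS challenge evaluation mentioned in the row corresponds to k=10; the single train/test split used in the 2017 OASC is the degenerate case of holding out one fixed test set instead of rotating folds.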