Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Planning through Automatic Portfolio Configuration: The PbP Approach
Authors: A. Gerevini, A. Saetti, M. Vallati
JAIR 2014 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we experimentally analyze PbP considering planning speed and plan quality in depth. We provide a collection of results that help to understand PbP's behavior, and demonstrate the effectiveness of our approach to configuring a portfolio of planners with macro-actions. |
| Researcher Affiliation | Academia | Alfonso Emilio Gerevini EMAIL Alessandro Saetti EMAIL Dipartimento di Ingegneria dell'Informazione, Università degli Studi di Brescia, Via Branze 38, I-25123 Brescia, Italy; Mauro Vallati EMAIL School of Computing and Engineering, University of Huddersfield, Huddersfield, West Yorkshire, HD1 3DH, UK |
| Pseudocode | No | The paper describes algorithms and procedures in prose, such as the round-robin scheduling in Sections 3.1 and 3.2.5, but does not present them in structured pseudocode or algorithm blocks with distinct labels like "Algorithm 1" or "Pseudocode for X". |
| Open Source Code | Yes | The code of the last version of PbP is available from http://chronus.ing.unibs.it/pbp/. |
| Open Datasets | Yes | The experiment evaluating PbP.s/q with respect to the other IPC6-7 planners considers all IPC6-7 benchmark domains (Fern et al., 2011; Jiménez et al., 2011), while the other experiments focus on the most recent IPC7 domains. Regarding the training problems used in the experiments, for the IPC6 domains they are the same as those of IPC6; for the IPC7 domains, they are a set of 540 problems of various sizes (60 problems for each IPC7 domain, unless otherwise specified for the particular experiment under consideration) that have been generated using the problem generator made available by the organizers of IPC7 (for IPC7, no explicit set of training problems was provided). Regarding the test problems, we used the same problems as those used in IPC6-7: the IPC6 test problems were used for evaluating the performance of PbP.s/q with respect to the planners that entered IPC6; the IPC7 test problems, which are generally larger and much more difficult than the IPC6 problems, were used for evaluating PbP.s/q with respect to the IPC7 planners, as well as for all other experiments in our analysis. |
| Dataset Splits | Yes | Regarding the training problems used in the experiments, for the IPC6 domains they are the same as those of IPC6; for the IPC7 domains, they are a set of 540 problems of various sizes (60 problems for each IPC7 domain, unless otherwise specified for the particular experiment under consideration) that have been generated using the problem generator made available by the organizers of IPC7 (for IPC7, no explicit set of training problems was provided). The training problems are used for both learning macros and configuring the portfolio. Since the learning procedure of Wizard can run a planner over the training problems several times, in order to keep the training from being too time-consuming, half of the training problem set was designed to be formed by problems that took up to 30 seconds to solve by some planner; the other half is formed by problems that took up to about 450 seconds (half of the CPU time limit used in the testing phase) to solve. Regarding the test problems, we used the same problems as those used in IPC6-7... |
| Hardware Specification | Yes | For the comparison with the IPC6 planners, the results of PbP.s/q were obtained by running its last version on a machine similar to (same CPU frequency and amount of RAM) the one used to obtain the official IPC6 data (an Intel Core(tm) Quad-processor Q6600 with 3 Gbytes of RAM). For the comparison of PbP.s/q and the IPC7 planners, all systems were run using the same machine of IPC7 (a Quad-core Intel Xeon 2.93 GHz with 4 Gbytes of RAM) that the IPC organizers made available to us for this experiment. Unless otherwise specified, the other experiments were conducted using a Quad-core Intel Xeon(tm) 3.16 GHz with 4 Gbytes of RAM. |
| Software Dependencies | Yes | The current implementation of PbP incorporates eight well-known successful planners, Fast Downward (Helmert, 2006), LAMA (Richter & Westphal, 2010), LPG-td (Gerevini et al., 2006), Macro-FF (Botea et al., 2005, 2007b), Marvin (Coles & Smith, 2007), Metric-FF (Hoffmann, 2003), SGPlan5 (Chen et al., 2006), YAHSP (Vidal, 2004) and a recent version of LPG (ParLPG) using a dedicated configuration phase to automatically optimize the setting of a collection of parameters governing the behavior of several parts of the system (Vallati et al., 2013b). Basically, running ParLPG consists in running LPG using a domain-specific parameter configuration. Every other incorporated planner runs using its default parameter configuration. |
| Experiment Setup | Yes | Unless otherwise indicated, as in IPC6-7, the CPU-time limit of each run of PbP.s/q was 15 minutes, PbP.s/q used the default configuration process (the CPU-time limit for each simulated execution of a planner cluster was 15 minutes), and the planners of the configured portfolio were run by the round-robin scheduling described in Section 3.2. The performance data of each planner in PbP.s/q incorporating a randomized algorithm (i.e., LPG, ParLPG and LAMA) were obtained by a single run for each considered problem instance. In the rest of the paper, the sequence of increasing percentages p1, ..., pn used to define the planning time slots is called the problem coverage percentage vector (PCPV). The default PCPV in PbP is the sequence 25, 50, 75, 80, 85, 90, 95, 97, 99 (n = 9), which is the same used in the work of Roberts and Howe (2007). |
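The PCPV-based round-robin idea quoted above can be sketched in code. The snippet below is an illustrative reconstruction only, not PbP's actual implementation: the function names (`time_slots`, `round_robin`), the data layout, and the use of training-set solve times to derive per-planner slots are assumptions made for this sketch, based on the paper's description of coverage percentages defining planning time slots.

```python
# Hypothetical sketch of PCPV-derived time slots plus round-robin scheduling.
# Only the default PCPV values below are taken from the paper; everything
# else is an assumption for illustration.

PCPV = [25, 50, 75, 80, 85, 90, 95, 97, 99]  # default PCPV (n = 9)

def time_slots(solve_times, pcpv=PCPV):
    """For one planner, map each coverage percentage p to the smallest CPU
    time (seconds) within which p% of its solved training problems fall."""
    times = sorted(solve_times)
    n = len(times)
    slots = []
    for p in pcpv:
        k = max(1, round(n * p / 100))  # how many problems p% covers
        slots.append(times[k - 1])
    return slots

def round_robin(planners, budget):
    """Interleave planners: in round i, give each planner enough extra CPU
    time to reach its i-th slot, stopping when the total budget runs out.

    planners: {name: slot list from time_slots()}; budget: seconds.
    Returns a schedule as (planner, extra_seconds) pairs.
    """
    schedule = []
    used = {name: 0.0 for name in planners}
    spent = 0.0
    for i in range(len(PCPV)):
        for name, slots in planners.items():
            extra = slots[i] - used[name]  # top-up needed for this round
            if extra <= 0:
                continue  # this planner already reached slot i
            extra = min(extra, budget - spent)
            if extra <= 0:
                return schedule  # budget exhausted
            schedule.append((name, extra))
            used[name] += extra
            spent += extra
    return schedule
```

With a single planner whose training solve times are `[1, 2, 4, 8]`, the slots grow monotonically with the coverage percentages, and the round-robin loop allocates only the incremental time needed per round, so the total never exceeds the final slot or the overall budget.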