Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Planning through Automatic Portfolio Configuration: The PbP Approach
Authors: A. Gerevini, A. Saetti, M. Vallati
JAIR 2014 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we experimentally analyze PbP considering planning speed and plan quality in depth. We provide a collection of results that help to understand PbP's behavior, and demonstrate the effectiveness of our approach to configuring a portfolio of planners with macro-actions. |
| Researcher Affiliation | Academia | Alfonso Emilio Gerevini EMAIL Alessandro Saetti EMAIL Dipartimento di Ingegneria dell'Informazione, Università degli Studi di Brescia, Via Branze 38, I-25123 Brescia, Italy; Mauro Vallati EMAIL School of Computing and Engineering, University of Huddersfield, Huddersfield, West Yorkshire, HD1 3DH, UK |
| Pseudocode | No | The paper describes algorithms and procedures in prose, such as the round-robin scheduling in Sections 3.1 and 3.2.5, but does not present them in structured pseudocode or algorithm blocks with distinct labels like "Algorithm 1" or "Pseudocode for X". |
| Open Source Code | Yes | The code of the last version of PbP is available from http://chronus.ing.unibs.it/pbp/. |
| Open Datasets | Yes | The experiment evaluating PbP.s/q with respect to the other IPC6-7 planners considers all IPC6-7 benchmark domains (Fern et al., 2011; Jiménez et al., 2011), while the other experiments focus on the most recent IPC7 domains. Regarding the training problems used in the experiments, for the IPC6 domains they are the same as those of IPC6; for the IPC7 domains, they are a set of 540 problems of various sizes (60 problems for each IPC7 domain, unless otherwise specified for the particular experiment under consideration) that have been generated using the problem generator made available by the organizers of IPC7 (for IPC7, no explicit set of training problems was provided). Regarding the test problems, we used the same problems as those used in IPC6-7: the IPC6 test problems were used for evaluating the performance of PbP.s/q with respect to the planners that entered IPC6; the IPC7 test problems, which are generally larger and much more difficult than the IPC6 problems, were used for evaluating PbP.s/q with respect to the IPC7 planners, as well as for all other experiments in our analysis. |
| Dataset Splits | Yes | Regarding the training problems used in the experiments, for the IPC6 domains they are the same as those of IPC6; for the IPC7 domains, they are a set of 540 problems of various sizes (60 problems for each IPC7 domain, unless otherwise specified for the particular experiment under consideration) that have been generated using the problem generator made available by the organizers of IPC7 (for IPC7, no explicit set of training problems was provided). The training problems are used for both learning macros and configuring the portfolio. Since the learning procedure of Wizard can run a planner over the training problems several times, in order to keep the training from being too time-consuming, half of the training problem set was designed to be formed by problems that took up to 30 seconds to solve by some planner; the other half is formed by problems that took up to about 450 seconds (half of the CPU time limit used in the testing phase) to solve. Regarding the test problems, we used the same problems as those used in IPC6-7... |
| Hardware Specification | Yes | For the comparison with the IPC6 planners, the results of PbP.s/q were obtained by running its last version on a machine similar to (same CPU frequency and amount of RAM) the one used to obtain the official IPC6 data (an Intel Core(tm) Quad-processor Q6600 with 3 Gbytes of RAM). For the comparison of PbP.s/q and the IPC7 planners, all systems were run using the same machine of IPC7 (a Quad-core Intel Xeon 2.93 GHz with 4 Gbytes of RAM) that the IPC organizers made available to us for this experiment. Unless otherwise specified, the other experiments were conducted using a Quad-core Intel Xeon(tm) 3.16 GHz with 4 Gbytes of RAM. |
| Software Dependencies | Yes | The current implementation of PbP incorporates eight well-known successful planners, Fast Downward (Helmert, 2006), LAMA (Richter & Westphal, 2010), LPG-td (Gerevini et al., 2006), Macro-FF (Botea et al., 2005, 2007b), Marvin (Coles & Smith, 2007), Metric-FF (Hoffmann, 2003), SGPlan5 (Chen et al., 2006), YAHSP (Vidal, 2004) and a recent version of LPG (ParLPG) using a dedicated configuration phase to automatically optimize the setting of a collection of parameters governing the behavior of several parts of the system (Vallati et al., 2013b). Basically, running ParLPG consists in running LPG using a domain-specific parameter configuration. Every other incorporated planner runs using its default parameter configuration. |
| Experiment Setup | Yes | Unless otherwise indicated, as in IPC6-7, the CPU-time limit of each run of PbP.s/q was 15 minutes, PbP.s/q used the default configuration process (the CPU-time limit for each simulated execution of a planner cluster was 15 minutes), and the planners of the configured portfolio were run by the round-robin scheduling described in Section 3.2. The performance data of each planner in PbP.s/q incorporating a randomized algorithm (i.e., LPG, ParLPG and LAMA) were obtained by a single run for each considered problem instance. In the rest of the paper, the sequence of increasing percentages p1, ..., pn used to define the planning time slots is called the problem coverage percentage vector (PCPV). The default PCPV in PbP is the sequence 25, 50, 75, 80, 85, 90, 95, 97, 99 (n = 9), which is the same used in the work of Roberts and Howe (2007). |
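The PCPV-based round-robin idea quoted above can be sketched in code. The snippet below is an illustrative reconstruction only, not PbP's actual implementation: the function names (`time_slots`, `round_robin`), the data layout, and the use of training-set solve times to derive per-planner slots are assumptions made for this sketch, based on the paper's description of coverage percentages defining planning time slots.

```python
# Hypothetical sketch of PCPV-derived time slots plus round-robin scheduling.
# Only the default PCPV values below are taken from the paper; everything
# else is an assumption for illustration.

PCPV = [25, 50, 75, 80, 85, 90, 95, 97, 99]  # default PCPV (n = 9)

def time_slots(solve_times, pcpv=PCPV):
    """For one planner, map each coverage percentage p to the smallest CPU
    time (seconds) within which p% of its solved training problems fall."""
    times = sorted(solve_times)
    n = len(times)
    slots = []
    for p in pcpv:
        k = max(1, round(n * p / 100))  # how many problems p% covers
        slots.append(times[k - 1])
    return slots

def round_robin(planners, budget):
    """Interleave planners: in round i, give each planner enough extra CPU
    time to reach its i-th slot, stopping when the total budget runs out.

    planners: {name: slot list from time_slots()}; budget: seconds.
    Returns a schedule as (planner, extra_seconds) pairs.
    """
    schedule = []
    used = {name: 0.0 for name in planners}
    spent = 0.0
    for i in range(len(PCPV)):
        for name, slots in planners.items():
            extra = slots[i] - used[name]  # top-up needed for this round
            if extra <= 0:
                continue  # this planner already reached slot i
            extra = min(extra, budget - spent)
            if extra <= 0:
                return schedule  # budget exhausted
            schedule.append((name, extra))
            used[name] += extra
            spent += extra
    return schedule
```

With a single planner whose training solve times are `[1, 2, 4, 8]`, the slots grow monotonically with the coverage percentages, and the round-robin loop allocates only the incremental time needed per round, so the total never exceeds the final slot or the overall budget.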