reproducibilityindex.ai

Generalized Planning with Positive and Negative Examples

Authors: Javier Segovia-Aguas, Sergio Jiménez, Anders Jonsson

Pages: 9949-9956

AAAI 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments: This section reports the empirical performance of our approach for the synthesis and evaluation of programs for generalized planning (footnote 1). All experiments are run on an Intel Core i5 2.90GHz x 4 with a memory limit of 4GB and 600 seconds of planning timeout. In order to compare with previous approaches, we use Fast Downward (Helmert 2006) in the LAMA-2011 setting (Richter, Westphal, and Helmert 2011) to synthesize and evaluate programs using the presented compilations.
Researcher Affiliation	Academia	Javier Segovia-Aguas (1), Sergio Jiménez (2), Anders Jonsson (3). (1) IRI Institut de Robòtica i Informàtica Industrial, CSIC-UPC; (2) VRAIN Valencian Research Institute for Artificial Intelligence, Universitat Politècnica de València; (3) Universitat Pompeu Fabra
Pseudocode	No	The paper describes methods and actions formally but does not include structured pseudocode or algorithm blocks.
Open Source Code	Yes	Footnote 1: The source code, benchmarks and scripts are in the Automated Programming Framework (Segovia-Aguas 2017) such that any experimental data in the paper can be reproduced.
Open Datasets	No	The paper describes generalized planning tasks like 'Green Block', 'Fibonacci', 'Gripper', 'List', 'Triangular Sum', and 'Robo Painter' and mentions using 'almost 100 random configurations with at most 5 instances that could be either positive or negative'. However, it does not provide concrete access information (link, DOI, or formal citation for the datasets themselves) to these tasks or instances, nor does it explicitly state they are publicly available datasets in the typical sense.
Dataset Splits	No	The paper states: 'For the synthesis of programs that solve the previous generalized planning tasks, we compare two versions of our compilation, PN-Lite and PN, with the results from some problems whose solutions where solved and reported as One Procedure in Segovia-Aguas, Jiménez, and Jonsson (2016). We use PN to denote the version with positive and negative examples that detect the three possible failures of a planning program, whereas PN-Lite is a simpler sound version that detects incomplete programs and inapplicable actions but not infinite loops. In this experiment we have run almost 100 random configurations with at most 5 instances that could be either positive or negative (where at least one is forced to be positive, see the previous section).' and 'Negative examples are useful for defining quantitative metrics that evaluate the coverage of generalized plans with respect to a test set of unseen examples.' While it mentions synthesis from instances and evaluation on a test set, it does not provide specific details on how the data is split (e.g., exact percentages, sample counts for training/validation/test, or cross-validation setup).
Hardware Specification	Yes	All experiments are run on an Intel Core i5 2.90GHz x 4 with a memory limit of 4GB and 600 seconds of planning timeout.
Software Dependencies	No	The paper mentions using 'Fast Downward (Helmert 2006) in the LAMA-2011 setting (Richter, Westphal, and Helmert 2011)', but does not provide specific version numbers for these software tools.
Experiment Setup	Yes	All experiments are run on an Intel Core i5 2.90GHz x 4 with a memory limit of 4GB and 600 seconds of planning timeout.
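
The reported setup (a 4GB memory limit and a 600-second planning timeout) can be approximated when re-running experiments. The paper does not say how these limits were enforced, so the following is only a minimal sketch under assumptions: it uses Python's standard `resource` and `subprocess` modules on a Unix system, and the helper names `run_with_limits` and `_limit_memory` are hypothetical. The command passed in is a placeholder; the actual planner invocation is not given in the text above and is not guessed here.

```python
import resource
import subprocess
import sys

# Limits taken from the reported setup; how the authors enforced them is not stated.
MEMORY_LIMIT_BYTES = 4 * 1024 ** 3  # 4GB memory limit
TIMEOUT_SECONDS = 600               # 600 seconds of planning timeout


def _limit_memory(limit_bytes):
    """Return a callable that caps the child's address space (Unix only)."""
    def apply():
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        # Never request a soft limit above the existing hard limit.
        if hard == resource.RLIM_INFINITY:
            soft = limit_bytes
        else:
            soft = min(limit_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))
    return apply


def run_with_limits(command, memory_bytes=MEMORY_LIMIT_BYTES,
                    timeout=TIMEOUT_SECONDS):
    """Run `command` (a list of strings) under a memory cap and a wall-clock timeout.

    Hypothetical helper: returns a dict with a status of "ok", "failed",
    or "timeout", plus the return code and captured standard output.
    """
    try:
        completed = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=timeout,
            preexec_fn=_limit_memory(memory_bytes),
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising on timeout.
        return {"status": "timeout", "returncode": None, "stdout": ""}
    status = "ok" if completed.returncode == 0 else "failed"
    return {"status": status,
            "returncode": completed.returncode,
            "stdout": completed.stdout}


if __name__ == "__main__":
    # Placeholder command standing in for a planner call.
    result = run_with_limits([sys.executable, "-c", "print('plan found')"])
    print(result["status"], result["stdout"].strip())
```

A run that exceeds the memory cap typically shows up as a non-zero return code ("failed"), while a run that exceeds the time limit is reported as "timeout", so the two failure modes can be tallied separately.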