Generalized Planning in PDDL Domains with Pretrained Large Language Models

Authors: Tom Silver, Soham Dan, Kavitha Srinivas, Joshua B. Tenenbaum, Leslie Kaelbling, Michael Katz

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate this approach in seven PDDL domains and compare it to four ablations and four baselines. Overall, we find that GPT-4 is a surprisingly powerful generalized planner."
Researcher Affiliation | Collaboration | Tom Silver (1), Soham Dan (2), Kavitha Srinivas (2), Joshua B. Tenenbaum (1), Leslie Kaelbling (1), Michael Katz (2). (1) MIT Computer Science and Artificial Intelligence Laboratory; (2) IBM Research.
Pseudocode | No | The paper provides a Python function signature, 'def get_plan(objects, init, goal):', but does not include any pseudocode or a clearly labeled algorithm block.
Open Source Code | Yes | "To facilitate reproducibility, we have released all chat logs and code."
Open Datasets | Yes | "The first six domains (and tasks) are taken directly from the previous work by Yang et al. (2022)."
Dataset Splits | No | The paper mentions 'training tasks' for synthesizing and debugging programs, and 'evaluation tasks' for measuring performance, but does not specify a distinct validation split for hyperparameter tuning or model selection.
Hardware Specification | Yes | "We used a Macbook Pro laptop with an M1 chip and 64 GB RAM."
Software Dependencies | No | The paper mentions using GPT-4 and GPT-3.5 via the ChatGPT browser interface, Python for the synthesized programs, and the VAL plan-validation tool, but does not provide version numbers for these dependencies (e.g., Python version, VAL version).
Experiment Setup | Yes | "To compensate for the limited context window size of transformer-based LLMs like GPT-4, we abbreviate the encoding of the training tasks in two ways. First, we always use only two training tasks, even when more are given. Second, within each training task, we limit the number of objects and initial state ground atoms shown. For each object type, if the number of objects of that type exceeds 10, we truncate the object set and add ellipses. Similarly, for each predicate, if the number of ground atoms with that predicate exceeds 10, we truncate and add ellipses."
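The Pseudocode row quotes the paper's `get_plan(objects, init, goal)` interface: a single domain-general Python function that maps any task in a PDDL domain to a plan. As a purely illustrative sketch of what such a synthesized program might look like, here is a toy delivery-style generalized plan; the domain, predicate names, and planning logic below are our own assumptions, not code from the paper:

```python
def get_plan(objects, init, goal):
    """Return a plan (list of ground actions) for any task in a toy
    delivery domain.

    objects: dict mapping type name -> list of object names
    init, goal: sets of ground atoms, each a tuple (predicate, *args)
    """
    plan = []
    # Record where each package currently is.
    at = {args[0]: args[1] for (pred, *args) in init if pred == "at"}
    # For every package with a goal location, pick it up where it
    # starts and drop it at its destination (skip if already there).
    for (pred, *args) in sorted(goal):
        if pred == "at":
            pkg, dest = args
            if at.get(pkg) != dest:
                plan.append(("pick-up", pkg, at[pkg]))
                plan.append(("drop", pkg, dest))
    return plan
```

The point of the interface is that the same function body must work for any number of objects and any initial/goal configuration, which is what distinguishes a generalized plan from a task-specific one.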
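The abbreviation scheme described in the Experiment Setup row (at most two training tasks; at most 10 objects per type and 10 ground atoms per predicate, with ellipses marking truncation) can be sketched as follows. This is our own minimal illustration under assumed data structures; the function names and the rendered string format are not from the paper's code:

```python
from collections import defaultdict

MAX_PER_GROUP = 10  # cap from the paper: truncate beyond 10 per type/predicate


def abbreviate_task(objects, init):
    """Render abbreviated object and initial-state listings for one task.

    objects: dict mapping type name -> list of object names
    init: set of ground atoms, each a tuple (predicate, *args)
    Returns (object_lines, atom_lines), lists of strings with "..."
    appended wherever a group was truncated.
    """
    object_lines = []
    for typ, objs in objects.items():
        shown = objs[:MAX_PER_GROUP]
        suffix = ", ..." if len(objs) > MAX_PER_GROUP else ""
        object_lines.append(f"{typ}: {', '.join(shown)}{suffix}")

    # Group ground atoms by predicate, then truncate each group.
    atoms_by_pred = defaultdict(list)
    for atom in sorted(init):
        atoms_by_pred[atom[0]].append(atom)
    atom_lines = []
    for pred, atoms in atoms_by_pred.items():
        shown = [f"({' '.join(a)})" for a in atoms[:MAX_PER_GROUP]]
        suffix = ", ..." if len(atoms) > MAX_PER_GROUP else ""
        atom_lines.append(f"{pred}: {', '.join(shown)}{suffix}")
    return object_lines, atom_lines


def abbreviate_training_tasks(tasks):
    """Keep only the first two training tasks, per the paper."""
    return [abbreviate_task(objs, init) for objs, init in tasks[:2]]
```

The design goal is simply to keep the prompt within the LLM's context window while preserving enough of each task for the model to infer the domain's regularities.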