Generalized Planning in PDDL Domains with Pretrained Large Language Models

Authors: Tom Silver, Soham Dan, Kavitha Srinivas, Joshua B. Tenenbaum, Leslie Kaelbling, Michael Katz

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate this approach in seven PDDL domains and compare it to four ablations and four baselines. Overall, we find that GPT-4 is a surprisingly powerful generalized planner."
Researcher Affiliation | Collaboration | Tom Silver (1), Soham Dan (2), Kavitha Srinivas (2), Joshua B. Tenenbaum (1), Leslie Kaelbling (1), Michael Katz (2). (1) MIT Computer Science and Artificial Intelligence Laboratory; (2) IBM Research.
Pseudocode | No | The paper provides a Python function signature, 'def get_plan(objects, init, goal):', but does not include any pseudocode or a clearly labeled algorithm block.
Open Source Code | Yes | "To facilitate reproducibility, we have released all chat logs and code."
Open Datasets | Yes | "The first six domains (and tasks) are taken directly from the previous work by Yang et al. (2022)."
Dataset Splits | No | The paper mentions 'training tasks' for synthesizing and debugging programs, and 'evaluation tasks' for measuring performance, but does not specify a distinct validation split for hyperparameter tuning or model selection.
Hardware Specification | Yes | "We used a Macbook Pro laptop with an M1 chip and 64 GB RAM."
Software Dependencies | No | The paper mentions using GPT-4 and GPT-3.5 via the ChatGPT browser interface, Python for the synthesized programs, and the VAL plan-validation tool, but does not provide version numbers for these dependencies (e.g., Python version, VAL version).
Experiment Setup | Yes | "To compensate for the limited context window size of transformer-based LLMs like GPT-4, we abbreviate the encoding of the training tasks in two ways. First, we always use only two training tasks, even when more are given. Second, within each training task, we limit the number of objects and initial state ground atoms shown. For each object type, if the number of objects of that type exceeds 10, we truncate the object set and add ellipses. Similarly, for each predicate, if the number of ground atoms with that predicate exceeds 10, we truncate and add ellipses."
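The Pseudocode row quotes the paper's `get_plan(objects, init, goal)` interface: a single domain-general Python function that maps any task in a PDDL domain to a plan. As a purely illustrative sketch of what such a synthesized program might look like, here is a toy delivery-style generalized plan; the domain, predicate names, and planning logic below are our own assumptions, not code from the paper:

```python
def get_plan(objects, init, goal):
    """Return a plan (list of ground actions) for any task in a toy
    delivery domain.

    objects: dict mapping type name -> list of object names
    init, goal: sets of ground atoms, each a tuple (predicate, *args)
    """
    plan = []
    # Record where each package currently is.
    at = {args[0]: args[1] for (pred, *args) in init if pred == "at"}
    # For every package with a goal location, pick it up where it
    # starts and drop it at its destination (skip if already there).
    for (pred, *args) in sorted(goal):
        if pred == "at":
            pkg, dest = args
            if at.get(pkg) != dest:
                plan.append(("pick-up", pkg, at[pkg]))
                plan.append(("drop", pkg, dest))
    return plan
```

The point of the interface is that the same function body must work for any number of objects and any initial/goal configuration, which is what distinguishes a generalized plan from a task-specific one.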
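The abbreviation scheme described in the Experiment Setup row (at most two training tasks; at most 10 objects per type and 10 ground atoms per predicate, with ellipses marking truncation) can be sketched as follows. This is our own minimal illustration under assumed data structures; the function names and the rendered string format are not from the paper's code:

```python
from collections import defaultdict

MAX_PER_GROUP = 10  # cap from the paper: truncate beyond 10 per type/predicate


def abbreviate_task(objects, init):
    """Render abbreviated object and initial-state listings for one task.

    objects: dict mapping type name -> list of object names
    init: set of ground atoms, each a tuple (predicate, *args)
    Returns (object_lines, atom_lines), lists of strings with "..."
    appended wherever a group was truncated.
    """
    object_lines = []
    for typ, objs in objects.items():
        shown = objs[:MAX_PER_GROUP]
        suffix = ", ..." if len(objs) > MAX_PER_GROUP else ""
        object_lines.append(f"{typ}: {', '.join(shown)}{suffix}")

    # Group ground atoms by predicate, then truncate each group.
    atoms_by_pred = defaultdict(list)
    for atom in sorted(init):
        atoms_by_pred[atom[0]].append(atom)
    atom_lines = []
    for pred, atoms in atoms_by_pred.items():
        shown = [f"({' '.join(a)})" for a in atoms[:MAX_PER_GROUP]]
        suffix = ", ..." if len(atoms) > MAX_PER_GROUP else ""
        atom_lines.append(f"{pred}: {', '.join(shown)}{suffix}")
    return object_lines, atom_lines


def abbreviate_training_tasks(tasks):
    """Keep only the first two training tasks, per the paper."""
    return [abbreviate_task(objs, init) for objs, init in tasks[:2]]
```

The design goal is simply to keep the prompt within the LLM's context window while preserving enough of each task for the model to infer the domain's regularities.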