Generalized Planning in PDDL Domains with Pretrained Large Language Models
Authors: Tom Silver, Soham Dan, Kavitha Srinivas, Joshua B. Tenenbaum, Leslie Kaelbling, Michael Katz
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate this approach in seven PDDL domains and compare it to four ablations and four baselines. Overall, we find that GPT-4 is a surprisingly powerful generalized planner. |
| Researcher Affiliation | Collaboration | Tom Silver¹, Soham Dan², Kavitha Srinivas², Joshua B. Tenenbaum¹, Leslie Kaelbling¹, Michael Katz² (¹MIT Computer Science and Artificial Intelligence Laboratory; ²IBM Research) |
| Pseudocode | No | The paper provides the Python function signature `def get_plan(objects, init, goal):` but does not include pseudocode or a clearly labeled algorithm block. (A hedged sketch of such a program appears below the table.) |
| Open Source Code | Yes | To facilitate reproducibility, we have released all chat logs and code. |
| Open Datasets | Yes | The first six domains (and tasks) are taken directly from previous work by Yang et al. (2022). |
| Dataset Splits | No | The paper distinguishes 'training tasks' used for synthesizing and debugging programs from 'evaluation tasks' used for measuring performance, but it does not specify a separate validation split for hyperparameter tuning or model selection. |
| Hardware Specification | Yes | We used a MacBook Pro laptop with an M1 chip and 64 GB RAM. |
| Software Dependencies | No | The paper mentions using GPT-4 and GPT-3.5 via the ChatGPT browser interface, Python for the synthesized programs, and the VAL plan-validation tool, but it does not give version numbers for these dependencies (e.g., Python version, VAL version). |
| Experiment Setup | Yes | To compensate for the limited context window size of transformer-based LLMs like GPT-4, we abbreviate the encoding of the training tasks in two ways. First, we always use only two training tasks, even when more are given. Second, within each training task, we limit the number of objects and initial state ground atoms shown. For each object type, if the number of objects of that type exceeds 10, we truncate the object set and add ellipses. Similarly, for each predicate, if the number of ground atoms with that predicate exceeds 10, we truncate and add ellipses. (A sketch of this truncation logic appears below the table.) |
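For concreteness, here is a minimal sketch of the kind of Python generalized plan the paper asks GPT-4 to synthesize, built around the reported `get_plan(objects, init, goal)` signature. The Gripper-style task, the atom encoding (tuples such as `("at", "ball1", "rooma")`), and the action-string format are illustrative assumptions; the released chat logs define the formats actually used.

```python
def get_plan(objects, init, goal):
    """Move every ball named in the goal to its target room, one at a
    time, with a single gripper called "left". `objects` is unused here
    but kept to match the paper's reported signature."""
    robot_room = next(a[1] for a in init if a[0] == "at-robby")
    ball_rooms = {a[1]: a[2] for a in init if a[0] == "at"}
    plan = []
    for atom in sorted(goal):
        if atom[0] != "at":
            continue  # only location goals are handled in this sketch
        _, ball, target = atom
        src = ball_rooms[ball]
        if src == target:
            continue  # goal already satisfied in the initial state
        if robot_room != src:
            plan.append(f"(move {robot_room} {src})")
        plan.append(f"(pick {ball} {src} left)")
        plan.append(f"(move {src} {target})")
        plan.append(f"(drop {ball} {target} left)")
        robot_room = target
        ball_rooms[ball] = target
    return plan

# Example task: two balls to carry from rooma to roomb.
print(get_plan(
    objects={("ball1", "ball"), ("ball2", "ball"),
             ("rooma", "room"), ("roomb", "room")},
    init={("at-robby", "rooma"), ("at", "ball1", "rooma"),
          ("at", "ball2", "rooma")},
    goal={("at", "ball1", "roomb"), ("at", "ball2", "roomb")},
))
```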
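Likewise, a minimal sketch of the truncation step quoted in the Experiment Setup row, assuming objects arrive as `(name, type)` pairs and initial-state atoms as tuples headed by a predicate name. The function name `abbreviate_task` and the exact rendering are assumptions for illustration; only the per-group cap of 10 with ellipses comes from the paper.

```python
from collections import defaultdict

MAX_SHOWN = 10  # the paper's cap: truncate any group larger than 10


def abbreviate_task(objects, init):
    """Render objects grouped by type and initial-state atoms grouped by
    predicate, truncating each group to MAX_SHOWN entries plus "..."."""
    by_type = defaultdict(list)
    for name, typ in objects:
        by_type[typ].append(name)
    by_pred = defaultdict(list)
    for atom in init:
        by_pred[atom[0]].append("(" + " ".join(atom) + ")")

    def cap(items):
        # Add an ellipsis marker only when the cap is exceeded.
        return items[:MAX_SHOWN] + ["..."] if len(items) > MAX_SHOWN else items

    lines = [f"{typ}: " + ", ".join(cap(sorted(names)))
             for typ, names in sorted(by_type.items())]
    lines += [", ".join(cap(sorted(atoms)))
              for _, atoms in sorted(by_pred.items())]
    return "\n".join(lines)
```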