SayCanPay: Heuristic Planning with Large Language Models Using Learnable Domain Knowledge
Authors: Rishi Hazra, Pedro Zuidberg Dos Martires, Luc De Raedt
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive evaluations show that our model surpasses other LLM planning approaches. ... Experimental Setup ... Data Splits and Evaluation. |
| Researcher Affiliation | Academia | Centre for Applied Autonomous Sensor Systems (AASS), Örebro University, Sweden; KU Leuven, Belgium |
| Pseudocode | No | The paper describes the methodology in prose and figures but does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code link: https://rishihazra.github.io/SayCanPay/ |
| Open Datasets | Yes | Ravens (Zeng et al. 2021) is a PyBullet simulated task set... BabyAI (Chevalier-Boisvert et al. 2019) is a 2D-grid world environment... VirtualHome (Puig et al. 2018) is an interactive platform... We utilize the VirtualHome-Env dataset (Liao et al. 2019) |
| Dataset Splits | Yes | We created three data splits for each environment using expert trajectories. ... We use 800 expert train trajectories for each Ravens task and 400 for BabyAI. ... An additional 100 expert trajectories were collected for each test split (20 for VirtualHome test-generalize). ... a train-validation split of 80-20. |
| Hardware Specification | Yes | The Can and Pay models were trained on 7 NVIDIA-DGX V-100 GPUs using the Distributed Data Parallel framework across 20 epochs. |
| Software Dependencies | No | The paper mentions models such as Vicuna and Flan-T5, and the AdamW optimizer, but does not provide specific version numbers for general software dependencies such as Python, PyTorch, or HuggingFace Transformers. |
| Experiment Setup | Yes | The Can and Pay models were trained on 7 NVIDIA-DGX V-100 GPUs using the Distributed Data Parallel framework across 20 epochs. Training parameters included a 1e-4 learning rate, the AdamW optimizer with 1e-5 weight decay, a batch size of 50, and a train-validation split of 80-20. For inference, hyperparameters are listed in Table 6, including max new tokens, beam groups, diversity penalty, candidates (m), and beam-size (k). (An illustrative sketch of this training setup follows the table.) |
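For readers attempting a re-implementation, the following is a minimal sketch of the reported training configuration (80-20 train-validation split, batch size 50, AdamW with a 1e-4 learning rate and 1e-5 weight decay, 20 epochs). It assumes a PyTorch stack, which the paper does not version; the dataset and scorer model below are placeholders rather than the authors' Can/Pay models, and the multi-GPU DistributedDataParallel wrapping is omitted for brevity.

```python
# Illustrative sketch of the reported training configuration (not the authors' code).
# The dataset and model here are placeholders standing in for the expert-trajectory
# data and the Can/Pay estimators described in the paper.
import torch
from torch import nn
from torch.utils.data import DataLoader, random_split, TensorDataset

# Placeholder dataset (random features/labels) in place of expert trajectories.
full_dataset = TensorDataset(torch.randn(1000, 16), torch.randint(0, 2, (1000,)))

# 80-20 train-validation split, as reported.
train_size = int(0.8 * len(full_dataset))
val_size = len(full_dataset) - train_size
train_set, val_set = random_split(full_dataset, [train_size, val_size])

train_loader = DataLoader(train_set, batch_size=50, shuffle=True)  # batch size 50
val_loader = DataLoader(val_set, batch_size=50)

# Placeholder scorer; the paper's Can/Pay models are transformer-based.
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))

# AdamW with the reported 1e-4 learning rate and (assumed) 1e-5 weight decay.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-5)
criterion = nn.CrossEntropyLoss()

for epoch in range(20):  # 20 epochs, as reported
    model.train()
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
```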