SayCanPay: Heuristic Planning with Large Language Models Using Learnable Domain Knowledge

Authors: Rishi Hazra, Pedro Zuidberg Dos Martires, Luc De Raedt

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive evaluations show that our model surpasses other LLM planning approaches. ... Experimental Setup ... Data Splits and Evaluation.
Researcher Affiliation | Academia | Centre for Applied Autonomous Sensor Systems (AASS), Örebro University, Sweden; KU Leuven, Belgium
Pseudocode | No | The paper describes the methodology in prose and figures but does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code link: https://rishihazra.github.io/SayCanPay/
Open Datasets | Yes | Ravens (Zeng et al. 2021) is a PyBullet simulated task set... BabyAI (Chevalier-Boisvert et al. 2019) is a 2D-grid world environment... VirtualHome (Puig et al. 2018) is an interactive platform... We utilize the VirtualHome-Env dataset (Liao et al. 2019)
Dataset Splits | Yes | We created three data splits for each environment using expert trajectories. ... We use 800 expert train trajectories for each Ravens task and 400 for BabyAI. ... An additional 100 expert trajectories were collected for each test split (20 for VirtualHome test-generalize). ... a train-validation split of 80-20.
Hardware Specification | Yes | The Can and Pay models were trained on 7 NVIDIA-DGX V-100 GPUs using the Distributed Data Parallel framework across 20 epochs.
Software Dependencies | No | The paper mentions models like Vicuna and Flan-T5, and optimizer AdamW, but does not provide specific version numbers for general software dependencies such as Python, PyTorch, or HuggingFace Transformers.
Experiment Setup | Yes (sketched below) | The Can and Pay models were trained on 7 NVIDIA-DGX V-100 GPUs using the Distributed Data Parallel framework across 20 epochs. Training parameters included a 1e-4 learning rate, AdamW optimizer with 1e-5 weight decay, a batch size of 50, and a train-validation split of 80-20. For inference, hyperparameters are listed in Table 6, including max new tokens, beam groups, diversity penalty, candidates (m), and beam-size (k).
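
The training setup reported above maps onto a standard PyTorch loop. The following is a minimal sketch, assuming a PyTorch workflow: the dataset tensors, the scorer architecture, and the loss are placeholders (the actual Can and Pay models are LLM-based scorers trained on expert trajectories), while the optimizer, learning rate, weight decay, batch size, epoch count, and 80-20 split follow the values quoted in the table. The DistributedDataParallel setup across 7 V100 GPUs reported in the paper is omitted for brevity.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Hypothetical stand-in for the expert-trajectory dataset; shapes and
# labels are illustrative only, not the paper's actual features.
full_dataset = TensorDataset(torch.randn(800, 128), torch.randint(0, 2, (800,)))

# 80-20 train-validation split, as reported in the paper.
n_train = int(0.8 * len(full_dataset))
train_set, val_set = random_split(full_dataset, [n_train, len(full_dataset) - n_train])

train_loader = DataLoader(train_set, batch_size=50, shuffle=True)  # batch size 50
val_loader = DataLoader(val_set, batch_size=50)

# Placeholder scorer; the real Can/Pay models are built on LLM backbones.
model = torch.nn.Sequential(
    torch.nn.Linear(128, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2)
)

# Reported hyperparameters: AdamW, learning rate 1e-4, weight decay 1e-5.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-5)
loss_fn = torch.nn.CrossEntropyLoss()

for epoch in range(20):  # 20 epochs, as reported
    model.train()
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
    # Validation pass and DistributedDataParallel wrapping omitted here.
```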
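
The inference hyperparameters (max new tokens, beam groups, diversity penalty, number of candidates m, beam size k) correspond to diverse beam search as exposed by the HuggingFace Transformers generate API. The snippet below is a hedged illustration rather than the paper's decoding code: the Flan-T5 checkpoint, the prompt, and all numeric values are stand-ins, not the entries of Table 6.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Flan-T5 is one of the models mentioned in the paper; the checkpoint size
# and the hyperparameter values below are assumptions for illustration.
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

prompt = "Goal: put the red block in the green bowl.\nNext action:"
inputs = tokenizer(prompt, return_tensors="pt")

# Diverse (grouped) beam search: num_beams plays the role of beam-size (k),
# num_return_sequences the number of candidates (m), and num_beam_groups /
# diversity_penalty encourage distinct candidate actions.
outputs = model.generate(
    **inputs,
    max_new_tokens=20,
    num_beams=6,
    num_beam_groups=3,
    diversity_penalty=0.5,
    num_return_sequences=6,
)
candidates = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(candidates)
```

In SayCanPay, candidate actions proposed this way by the Say model are then weighted by the learned Can and Pay scorers during the heuristic search.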