SayCanPay: Heuristic Planning with Large Language Models Using Learnable Domain Knowledge
Authors: Rishi Hazra, Pedro Zuidberg Dos Martires, Luc De Raedt
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive evaluations show that our model surpasses other LLM planning approaches. ... Experimental Setup ... Data Splits and Evaluation. |
| Researcher Affiliation | Academia | Centre for Applied Autonomous Sensor Systems (AASS), Örebro University, Sweden; KU Leuven, Belgium |
| Pseudocode | No | The paper describes the methodology in prose and figures but does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code link: https://rishihazra.github.io/SayCanPay/ |
| Open Datasets | Yes | Ravens (Zeng et al. 2021) is a PyBullet simulated task set... BabyAI (Chevalier-Boisvert et al. 2019) is a 2D-grid world environment... VirtualHome (Puig et al. 2018) is an interactive platform... We utilize the VirtualHome-Env dataset (Liao et al. 2019) |
| Dataset Splits | Yes | We created three data splits for each environment using expert trajectories. ... We use 800 expert train trajectories for each Ravens task and 400 for BabyAI. ... An additional 100 expert trajectories were collected for each test split (20 for VirtualHome test-generalize). ... a train-validation split of 80-20. |
| Hardware Specification | Yes | The Can and Pay models were trained on 7 NVIDIA-DGX V-100 GPUs using the Distributed Data Parallel framework across 20 epochs. |
| Software Dependencies | No | The paper mentions models such as Vicuna and Flan-T5, and the AdamW optimizer, but does not provide specific version numbers for general software dependencies such as Python, PyTorch, or HuggingFace Transformers. |
| Experiment Setup | Yes | The Can and Pay models were trained on 7 NVIDIA-DGX V-100 GPUs using the Distributed Data Parallel framework across 20 epochs. Training parameters included a 1e-4 learning rate, the AdamW optimizer with 1e-5 weight decay, a batch size of 50, and a train-validation split of 80-20. For inference, hyperparameters are listed in Table 6, including max new tokens, beam groups, diversity penalty, candidates (m), and beam-size (k). (An illustrative sketch of this training setup follows the table.) |
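For readers attempting a re-implementation, the following is a minimal sketch of the reported training configuration (80-20 train-validation split, batch size 50, AdamW with a 1e-4 learning rate and 1e-5 weight decay, 20 epochs). It assumes a PyTorch stack, which the paper does not version; the dataset and scorer model below are placeholders rather than the authors' Can/Pay models, and the multi-GPU DistributedDataParallel wrapping is omitted for brevity.

```python
# Illustrative sketch of the reported training configuration (not the authors' code).
# The dataset and model here are placeholders standing in for the expert-trajectory
# data and the Can/Pay estimators described in the paper.
import torch
from torch import nn
from torch.utils.data import DataLoader, random_split, TensorDataset

# Placeholder dataset (random features/labels) in place of expert trajectories.
full_dataset = TensorDataset(torch.randn(1000, 16), torch.randint(0, 2, (1000,)))

# 80-20 train-validation split, as reported.
train_size = int(0.8 * len(full_dataset))
val_size = len(full_dataset) - train_size
train_set, val_set = random_split(full_dataset, [train_size, val_size])

train_loader = DataLoader(train_set, batch_size=50, shuffle=True)  # batch size 50
val_loader = DataLoader(val_set, batch_size=50)

# Placeholder scorer; the paper's Can/Pay models are transformer-based.
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))

# AdamW with the reported 1e-4 learning rate and (assumed) 1e-5 weight decay.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-5)
criterion = nn.CrossEntropyLoss()

for epoch in range(20):  # 20 epochs, as reported
    model.train()
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
```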