PlaSma: Procedural Knowledge Models for Language-based Planning and Re-Planning

Authors: Faeze Brahman, Chandra Bhagavatula, Valentina Pyatkin, Jena D. Hwang, Xiang Lorraine Li, Hirona Jacqueline Arai, Soumya Sanyal, Keisuke Sakaguchi, Xiang Ren, Yejin Choi

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that our approach is effective at endowing smaller LMs with planning abilities. For the standard planning task, smaller student models (of varying sizes) achieve 17.57% relative improvements, on average, over their teacher. The best student model is comparable even to GPT-3, a model 16 times the student's size.
Researcher Affiliation | Collaboration | 1) Allen Institute for Artificial Intelligence, 2) University of Washington, 3) University of Southern California, 4) Tohoku University, 5) University of Pittsburgh
Pseudocode | Yes | Figure 2: Verifier-guided Step-wise Beam Search. For brevity, we only showcase with N = 5 and K = 2 for the first step and N = 4 and K = 2 for the second step. The scores are for illustration. (A minimal sketch of this decoding procedure is given after the table.)
Open Source Code | Yes | Our data and code is publicly available at: https://github.com/allenai/PlaSma
Open Datasets | Yes | Our data and code is publicly available at: https://github.com/allenai/PlaSma...We use a subset of the existing proScript (Sakaguchi et al., 2021) and DeScript (Wanzare et al., 2016) datasets as our seed source to form in-context examples...
Dataset Splits | No | We conduct a small grid search on validation loss for batch size bs = {16, 32, 64} and learning rate lr = {1e-4, 1e-5, 1e-6, 5e-6}. We train for 10 epochs with early stopping on validation accuracy using batch size of 32 and learning rate of 1e-5. (A training-setup sketch is given after the table.)
Hardware Specification | No | Main experiments can be done on 2 GPUs with 48GB of memory.
Software Dependencies | No | Student models are trained using Huggingface Transformers (Wolf et al., 2020).
Experiment Setup | Yes | During inference, we use a beam K = 5 for regular beam search, and N = 10 (next-step candidates), beam K = 5, p = 0.9, and α = 0.5 for our verifier-guided step-wise decoding (see 2.3)...We conduct a small grid search on validation loss for batch size bs = {16, 32, 64} and learning rate lr = {1e-4, 1e-5, 1e-6, 5e-6}.
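
The verifier-guided step-wise beam search referenced in the Pseudocode and Experiment Setup rows (Figure 2; inference settings N = 10, K = 5, p = 0.9, α = 0.5) can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: the sampling, LM-scoring, and verifier-scoring functions are hypothetical stubs, and the exact way the LM likelihood and verifier score are combined (here a simple α-weighted sum of log-scores) is an assumption.

```python
import math
import random
from dataclasses import dataclass

N = 10       # next-step candidates sampled per beam (nucleus sampling, p = 0.9)
K = 5        # partial plans kept after each step
ALPHA = 0.5  # assumed weight between LM likelihood and verifier score

@dataclass
class Beam:
    steps: list          # plan steps decoded so far
    score: float         # accumulated combined log-score
    finished: bool = False

def sample_next_steps(goal, beam, n=N):
    """Hypothetical stand-in for nucleus sampling (top-p = 0.9) from the student LM."""
    cands = [f"step {len(beam.steps) + 1}, variant {i}" for i in range(n - 1)]
    cands.append("<eos>")  # allow plans to terminate in this toy stub
    return cands

def lm_log_prob(goal, beam, step):
    """Hypothetical stand-in for the student LM's log-likelihood of `step`."""
    return math.log(random.uniform(0.1, 1.0))

def verifier_score(goal, beam, step):
    """Hypothetical stand-in for the verifier's probability that `step` is a good continuation."""
    return random.uniform(0.0, 1.0)

def step_wise_beam_search(goal, max_steps=8):
    beams = [Beam(steps=[], score=0.0)]
    for _ in range(max_steps):
        candidates = []
        for beam in beams:
            if beam.finished:
                # Finished plans are carried over unchanged (no length normalization in this sketch).
                candidates.append(beam)
                continue
            for step in sample_next_steps(goal, beam):
                # Interpolate LM likelihood with the verifier's judgement (assumed formula).
                combined = ALPHA * lm_log_prob(goal, beam, step) + \
                           (1 - ALPHA) * math.log(verifier_score(goal, beam, step) + 1e-9)
                candidates.append(Beam(steps=beam.steps + [step],
                                       score=beam.score + combined,
                                       finished=(step == "<eos>")))
        # Keep the K highest-scoring partial plans.
        beams = sorted(candidates, key=lambda b: b.score, reverse=True)[:K]
        if all(b.finished for b in beams):
            break
    return beams[0].steps

if __name__ == "__main__":
    print(step_wise_beam_search("bake a cake"))
```

At each step, every surviving partial plan proposes N candidate next steps; the pooled candidates are re-ranked by the combined score and pruned back to the K best partial plans, so the verifier steers decoding without retraining the student LM.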
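
The training details quoted in the Dataset Splits, Software Dependencies, and Experiment Setup rows (batch size 32, learning rate 1e-5, 10 epochs, early stopping on validation accuracy, Huggingface Transformers) could be expressed with the Trainer API roughly as below. This is a hedged reconstruction, not the released training script: the seq2seq model class, dataset arguments, compute_metrics function, and early-stopping patience are assumptions, and the excerpt does not say whether these settings apply to the student LM or the verifier.

```python
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    EarlyStoppingCallback,
    Trainer,
    TrainingArguments,
)

def train_student(model_name, train_dataset, eval_dataset, compute_metrics):
    """Fine-tune a model with the hyperparameters reported in the paper.

    `train_dataset` / `eval_dataset` are assumed to be already tokenized, and
    `compute_metrics` is assumed to return {"accuracy": ...} so that early
    stopping can track validation accuracy.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)  # seq2seq student is an assumption

    args = TrainingArguments(
        output_dir="plasma-student",
        per_device_train_batch_size=32,    # chosen from the {16, 32, 64} grid
        learning_rate=1e-5,                # chosen from the {1e-4, 1e-5, 1e-6, 5e-6} grid
        num_train_epochs=10,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
        metric_for_best_model="accuracy",  # early stopping on validation accuracy
    )

    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        tokenizer=tokenizer,
        compute_metrics=compute_metrics,
        callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],  # patience is an assumption
    )
    trainer.train()
    return trainer
```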