PlaSma: Procedural Knowledge Models for Language-based Planning and Re-Planning

Authors: Faeze Brahman, Chandra Bhagavatula, Valentina Pyatkin, Jena D. Hwang, Xiang Lorraine Li, Hirona Jacqueline Arai, Soumya Sanyal, Keisuke Sakaguchi, Xiang Ren, Yejin Choi

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that our approach is effective at endowing smaller LMs with planning abilities. For the standard planning task, smaller student models (of varying sizes) achieve 17.57% relative improvements, on average, over their teacher. The best student model is comparable even to GPT-3, a model 16 times the student's size.
Researcher Affiliation | Collaboration | 1) Allen Institute for Artificial Intelligence, 2) University of Washington, 3) University of Southern California, 4) Tohoku University, 5) University of Pittsburgh
Pseudocode | Yes | Figure 2: Verifier-guided Step-wise Beam Search. For brevity, we only showcase with N = 5 and K = 2 for the first step and N = 4 and K = 2 for the second step. The scores are for illustration. (A minimal sketch of this decoding procedure is given after the table.)
Open Source Code | Yes | Our data and code is publicly available at: https://github.com/allenai/PlaSma
Open Datasets | Yes | Our data and code is publicly available at: https://github.com/allenai/PlaSma...We use a subset of the existing proScript (Sakaguchi et al., 2021) and DeScript (Wanzare et al., 2016) datasets as our seed source to form in-context examples...
Dataset Splits | No | We conduct a small grid search on validation loss for batch size bs = {16, 32, 64} and learning rate lr = {1e-4, 1e-5, 1e-6, 5e-6}. We train for 10 epochs with early stopping on validation accuracy using batch size of 32 and learning rate of 1e-5. (A training-setup sketch is given after the table.)
Hardware Specification | No | Main experiments can be done on 2 GPUs with 48GB of memory.
Software Dependencies | No | Student models are trained using Huggingface Transformers (Wolf et al., 2020).
Experiment Setup | Yes | During inference, we use a beam K = 5 for regular beam search, and N = 10 (next-step candidates), beam K = 5, p = 0.9, and α = 0.5 for our verifier-guided step-wise decoding (see 2.3)...We conduct a small grid search on validation loss for batch size bs = {16, 32, 64} and learning rate lr = {1e-4, 1e-5, 1e-6, 5e-6}.
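
The verifier-guided step-wise beam search referenced in the Pseudocode and Experiment Setup rows (Figure 2; inference settings N = 10, K = 5, p = 0.9, α = 0.5) can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: the sampling, LM-scoring, and verifier-scoring functions are hypothetical stubs, and the exact way the LM likelihood and verifier score are combined (here a simple α-weighted sum of log-scores) is an assumption.

```python
import math
import random
from dataclasses import dataclass

N = 10       # next-step candidates sampled per beam (nucleus sampling, p = 0.9)
K = 5        # partial plans kept after each step
ALPHA = 0.5  # assumed weight between LM likelihood and verifier score

@dataclass
class Beam:
    steps: list          # plan steps decoded so far
    score: float         # accumulated combined log-score
    finished: bool = False

def sample_next_steps(goal, beam, n=N):
    """Hypothetical stand-in for nucleus sampling (top-p = 0.9) from the student LM."""
    cands = [f"step {len(beam.steps) + 1}, variant {i}" for i in range(n - 1)]
    cands.append("<eos>")  # allow plans to terminate in this toy stub
    return cands

def lm_log_prob(goal, beam, step):
    """Hypothetical stand-in for the student LM's log-likelihood of `step`."""
    return math.log(random.uniform(0.1, 1.0))

def verifier_score(goal, beam, step):
    """Hypothetical stand-in for the verifier's probability that `step` is a good continuation."""
    return random.uniform(0.0, 1.0)

def step_wise_beam_search(goal, max_steps=8):
    beams = [Beam(steps=[], score=0.0)]
    for _ in range(max_steps):
        candidates = []
        for beam in beams:
            if beam.finished:
                # Finished plans are carried over unchanged (no length normalization in this sketch).
                candidates.append(beam)
                continue
            for step in sample_next_steps(goal, beam):
                # Interpolate LM likelihood with the verifier's judgement (assumed formula).
                combined = ALPHA * lm_log_prob(goal, beam, step) + \
                           (1 - ALPHA) * math.log(verifier_score(goal, beam, step) + 1e-9)
                candidates.append(Beam(steps=beam.steps + [step],
                                       score=beam.score + combined,
                                       finished=(step == "<eos>")))
        # Keep the K highest-scoring partial plans.
        beams = sorted(candidates, key=lambda b: b.score, reverse=True)[:K]
        if all(b.finished for b in beams):
            break
    return beams[0].steps

if __name__ == "__main__":
    print(step_wise_beam_search("bake a cake"))
```

At each step, every surviving partial plan proposes N candidate next steps; the pooled candidates are re-ranked by the combined score and pruned back to the K best partial plans, so the verifier steers decoding without retraining the student LM.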
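
The training details quoted in the Dataset Splits, Software Dependencies, and Experiment Setup rows (batch size 32, learning rate 1e-5, 10 epochs, early stopping on validation accuracy, Huggingface Transformers) could be expressed with the Trainer API roughly as below. This is a hedged reconstruction, not the released training script: the seq2seq model class, dataset arguments, compute_metrics function, and early-stopping patience are assumptions, and the excerpt does not say whether these settings apply to the student LM or the verifier.

```python
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    EarlyStoppingCallback,
    Trainer,
    TrainingArguments,
)

def train_student(model_name, train_dataset, eval_dataset, compute_metrics):
    """Fine-tune a model with the hyperparameters reported in the paper.

    `train_dataset` / `eval_dataset` are assumed to be already tokenized, and
    `compute_metrics` is assumed to return {"accuracy": ...} so that early
    stopping can track validation accuracy.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)  # seq2seq student is an assumption

    args = TrainingArguments(
        output_dir="plasma-student",
        per_device_train_batch_size=32,    # chosen from the {16, 32, 64} grid
        learning_rate=1e-5,                # chosen from the {1e-4, 1e-5, 1e-6, 5e-6} grid
        num_train_epochs=10,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
        metric_for_best_model="accuracy",  # early stopping on validation accuracy
    )

    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        tokenizer=tokenizer,
        compute_metrics=compute_metrics,
        callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],  # patience is an assumption
    )
    trainer.train()
    return trainer
```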