PlaSma: Procedural Knowledge Models for Language-based Planning and Re-Planning
Authors: Faeze Brahman, Chandra Bhagavatula, Valentina Pyatkin, Jena D. Hwang, Xiang Lorraine Li, Hirona Jacqueline Arai, Soumya Sanyal, Keisuke Sakaguchi, Xiang Ren, Yejin Choi
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our approach is effective at endowing smaller LMs with planning abilities. For the standard planning task, smaller student models (of varying sizes) achieve 17.57% relative improvements, on average, over their teacher. The best student model is comparable even to GPT-3, a model 16 times the student's size. |
| Researcher Affiliation | Collaboration | 1 Allen Institute for Artificial Intelligence, 2 University of Washington, 3 University of Southern California, 4 Tohoku University, 5 University of Pittsburgh |
| Pseudocode | Yes | Figure 2: Verifier-guided Step-wise Beam Search. For brevity, we only showcase with N = 5 and K = 2 for the first step and N = 4 and K = 2 for the second step. The scores are for illustration. (See the decoding sketch after this table.) |
| Open Source Code | Yes | Our data and code is publicly available at: https://github.com/allenai/PlaSma |
| Open Datasets | Yes | Our data and code is publicly available at: https://github.com/allenai/PlaSma...We use a subset of the existing proScript (Sakaguchi et al., 2021) and DeScript (Wanzare et al., 2016) datasets as our seed source to form in-context examples... |
| Dataset Splits | No | We conduct a small grid search on validation loss for batch size bs = {16, 32, 64} and learning rate lr = {1e-4, 1e-5, 1e-6, 5e-6}. We train for 10 epochs with early stopping on validation accuracy using batch size of 32 and learning rate of 1e-5. |
| Hardware Specification | No | Main experiments can be done on 2 GPUs with 48GB of memory. |
| Software Dependencies | No | Student models are trained using Huggingface Transformers (Wolf et al., 2020). |
| Experiment Setup | Yes | During inference, we use a beam K = 5 for regular beam search, and N = 10 (next-step candidates), beam K = 5, p = 0.9, and α = 0.5 for our verifier-guided step-wise decoding (see §2.3)...We conduct a small grid search on validation loss for batch size bs = {16, 32, 64} and learning rate lr = {1e-4, 1e-5, 1e-6, 5e-6}. (See the grid-search sketch after this table.) |
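
To make the decoding procedure quoted in the Pseudocode and Experiment Setup rows concrete, here is a minimal Python sketch of verifier-guided step-wise beam search. The callables `generate_next_steps`, `lm_logprob`, and `verifier_score`, and the α-interpolated step score, are assumptions for illustration; the paper's exact scoring formula and stopping criterion may differ.

```python
import math

def step_wise_beam_search(goal, generate_next_steps, lm_logprob,
                          verifier_score, N=10, K=5, alpha=0.5, max_steps=8):
    """Verifier-guided step-wise beam search (illustrative sketch).

    At each step, every partial plan on the beam proposes N candidate
    next steps (the paper samples them with nucleus sampling, p = 0.9).
    Each candidate is scored by interpolating the LM log-probability
    with the verifier's validity probability using weight alpha, and
    only the top-K partial plans are kept. N = 10, K = 5, alpha = 0.5
    match the paper's reported inference setting; the interpolation
    used here is an assumption.
    """
    beams = [([], 0.0)]  # (plan steps so far, cumulative score)
    for _ in range(max_steps):
        candidates = []
        for steps, score in beams:
            for cand in generate_next_steps(goal, steps, n=N):
                step_score = (alpha * lm_logprob(goal, steps, cand)
                              + (1 - alpha) * math.log(verifier_score(goal, steps, cand)))
                candidates.append((steps + [cand], score + step_score))
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:K]
    return beams[0][0]  # steps of the highest-scoring plan
```

The appeal of scoring step by step rather than rescoring whole plans is that invalid next steps are pruned as soon as the verifier flags them, so the beam budget is spent on plausible continuations.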
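The hyperparameter search quoted in the Dataset Splits and Experiment Setup rows is a small grid over batch size and learning rate, selected on validation loss. A sketch, with `train_and_eval` as a hypothetical callable standing in for the fine-tuning loop:

```python
from itertools import product

# Grid reported in the paper's experiment setup.
BATCH_SIZES = [16, 32, 64]
LEARNING_RATES = [1e-4, 1e-5, 1e-6, 5e-6]

def grid_search(train_and_eval):
    """Return the (bs, lr) pair with the lowest validation loss.

    `train_and_eval(bs, lr)` is an assumed callable that fine-tunes the
    student model (the paper trains for 10 epochs with early stopping
    on validation accuracy) and returns its validation loss.
    """
    return min(product(BATCH_SIZES, LEARNING_RATES),
               key=lambda cfg: train_and_eval(*cfg))
    # The paper reports settling on bs = 32 and lr = 1e-5.
```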