Neuro-Symbolic Procedural Planning with Commonsense Prompting
Authors: Yujie Lu, Weixi Feng, Wanrong Zhu, Wenda Xu, Xin Eric Wang, Miguel Eckstein, William Yang Wang
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Both automatic and human evaluations on WikiHow and RobotHow show the superiority of PLAN on procedural planning without further training or manual exemplars. |
| Researcher Affiliation | Academia | 1University of California, Santa Barbara, CA, USA {yujielu,weixifeng,wanrongzhu,wendaxu}@ucsb.edu {migueleckstein,wangwilliamyang}@ucsb.edu 2University of California, Santa Cruz, CA, USA xwang366@ucsc.edu |
| Pseudocode | Yes | Algorithm 1 Neuro-Symbolic Procedural Planning using Commonsense-Infused Prompting |
| Open Source Code | Yes | Source code and datasets are publicly available at https://sites.google.com/view/iclr-clap ... We provide our code implementation at https://anonymous.4open.science/r/PLANNER-7B24 to reproduce our experiments. |
| Open Datasets | Yes | Datasets We conduct zero-shot experiments on two datasets with procedural information, WikiHow (collected following (Koupaee & Wang, 2018)) and RobotHow (Puig et al., 2018) without training. ... Source code and datasets are publicly available at https://sites.google.com/view/iclr-clap |
| Dataset Splits | No | We conduct zero-shot experiments on two datasets with procedural information...without training. ... We perform a hyperparameter search for all evaluated methods for the following hyperparameters. ... The configurations used in the experiments are: θ=0.7, 20 step horizon, 3 hops, 3 ratio of concepts to task length, cosine similarity threshold 0.4, θe=0.6 and k=10. The paper performs a hyperparameter search but does not specify a separate validation split used for this purpose; only the chosen configurations are reported. |
| Hardware Specification | Yes | We use one single NVIDIA A100 GPU Server for all the experiments. |
| Software Dependencies | No | The paper mentions software like 'BART-large version', '1.5 billion parameter GPT-2 (aka gpt2-xl)', 'GPT3 (davinci)', 'sentence-transformers (RoBERTa-large)', and 'Hugging Face'. However, it does not provide specific version numbers for the general software environment (e.g., Python, PyTorch, CUDA) or the mentioned libraries/models. |
| Experiment Setup | Yes | The configurations used in the experiments are: θ=0.7, 20 step horizon, 3 hops, 3 ratio of concepts to task length, cosine similarity threshold 0.4, θe=0.6 and k=10. We perform a hyperparameter search for all evaluated methods for the following hyperparameters. The confidence threshold θ, which terminates generation when the confidence falls below it, is searched in {0, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8}. The steps horizon, which constrains the maximal number of procedural planning steps, is searched in {10, 20, 40}. The number of hops for retrieving the subgraph from the external knowledge base is searched in {1, 2, 3}. The ratio of maximal concepts to the length of the task name is searched in {1, 2, 3}. The cosine similarity threshold for keeping the task-specific concept is searched in {0.4, 0.6, 0.8}. The edge weight threshold θe is searched in {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8}. The top-k task-specific nodes value is searched in {1, 5, 10, 15, 20, 25, 50, 100}. |
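The reported search space and chosen configuration can be summarized as data for anyone attempting a reproduction. The sketch below is a minimal illustration, not the authors' code; the parameter names (e.g. `confidence_threshold`, `top_k`) are hypothetical labels for the symbols θ, θe, and k described in the paper.

```python
# Hypothetical restatement of the hyperparameter grid reported in the paper.
# Keys are illustrative names for the paper's symbols, not the authors' API.
search_space = {
    "confidence_threshold": [0, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8],  # θ
    "step_horizon": [10, 20, 40],
    "num_hops": [1, 2, 3],
    "concept_ratio": [1, 2, 3],
    "similarity_threshold": [0.4, 0.6, 0.8],
    "edge_weight_threshold": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],  # θe
    "top_k": [1, 5, 10, 15, 20, 25, 50, 100],  # k
}

# Configuration the paper reports as chosen.
chosen = {
    "confidence_threshold": 0.7,
    "step_horizon": 20,
    "num_hops": 3,
    "concept_ratio": 3,
    "similarity_threshold": 0.4,
    "edge_weight_threshold": 0.6,
    "top_k": 10,
}

# Sanity check: every chosen value lies inside the reported grid.
assert all(chosen[key] in values for key, values in search_space.items())

# Size of the grid if it were searched exhaustively (the paper does not
# state whether the search was exhaustive or per-parameter).
total = 1
for values in search_space.values():
    total *= len(values)
print(total)  # 8 * 3 * 3 * 3 * 3 * 8 * 8 = 41472
```

Restating the grid as data makes it easy to verify that the final configuration is consistent with the reported ranges, which is exactly the kind of check the "Dataset Splits" row flags as missing for validation data.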