Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents
Authors: Wenlong Huang, Pieter Abbeel, Deepak Pathak, Igor Mordatch
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our evaluation in the recent VirtualHome environment shows that the resulting method substantially improves executability over the LLM baseline. The conducted human evaluation reveals a trade-off between executability and correctness but shows a promising sign towards extracting actionable knowledge from language models. |
| Researcher Affiliation | Collaboration | University of California, Berkeley; Carnegie Mellon University; Google. Correspondence to: Wenlong Huang <wenlong.huang@berkeley.edu>. |
| Pseudocode | Yes | Pseudocode is in Appendix A.4 (Algorithm 1: Generating Action Plans from Pre-Trained Language Models with Proposed Procedure); see the sketch after this table. |
| Open Source Code | Yes | Website: https://huangwl18.github.io/language-planner/ |
| Open Datasets | Yes | For our investigation, we use the recently proposed VirtualHome environment (Puig et al., 2018). It can simulate a large variety of realistic human activities in a household environment and supports the ability to perform them via a rich set of 47522 unique embodied actions defined with a verb-object syntax. [...] We use the Activity Programs knowledge base collected by Puig et al. (2018) for evaluation. |
| Dataset Splits | No | The paper mentions a "demonstration set" used for prompting and "held-out tasks for evaluation", but does not explicitly define a separate 'validation' split with sizes or percentages for hyperparameter tuning or model selection. |
| Hardware Specification | No | The paper does not explicitly provide details about the specific hardware used for running its experiments, such as GPU models, CPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions using the 'OpenAI API', 'Hugging Face Transformers (Wolf et al., 2019)', and 'Sentence Transformers (Reimers & Gurevych, 2019)' but does not provide specific version numbers for these software dependencies (a version-recording sketch appears after this table). |
| Experiment Setup | Yes | For all evaluated methods, we perform hyperparameter search over various sampling parameters, and for methods using a fixed prompt example, we report metrics averaged across three randomly chosen examples. [...] Details of the hyperparameter search are in Appendix A.2 (an illustrative sampling-parameter grid appears after this table). |
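
The pseudocode row above points to Algorithm 1 in Appendix A.4. Below is a minimal Python sketch of the core loop the paper describes: generate a free-form step with a language model, translate it to the closest admissible action via Sentence-BERT cosine similarity, and feed the translated action back into the prompt. The Sentence-BERT model name, the `generate_step` callable, and the toy action list are illustrative assumptions, not the authors' code; the full Algorithm 1 additionally ranks multiple sampled candidates using a weighted combination of similarity and mean token log-probability.

```python
# Sketch of the generate-then-translate planning loop (assumed names, not the authors' code).
from sentence_transformers import SentenceTransformer, util

# Hypothetical list of admissible VirtualHome-style actions in "verb object" syntax.
ADMISSIBLE_ACTIONS = ["walk to kitchen", "open fridge", "grab milk", "close fridge"]

embedder = SentenceTransformer("stsb-roberta-large")  # any Sentence-BERT model works here
action_embeddings = embedder.encode(ADMISSIBLE_ACTIONS, convert_to_tensor=True)

def translate(step: str) -> str:
    """Map a free-form LLM step to the most semantically similar admissible action."""
    step_embedding = embedder.encode(step, convert_to_tensor=True)
    scores = util.cos_sim(step_embedding, action_embeddings)[0]
    return ADMISSIBLE_ACTIONS[int(scores.argmax())]

def plan(task: str, generate_step, max_steps: int = 10) -> list:
    """Iteratively query a language model for the next step, translate it to an
    admissible action, and append the translation back into the prompt."""
    prompt = f"Task: {task}\nStep 1:"
    executable_plan = []
    for i in range(max_steps):
        raw_step = generate_step(prompt)       # e.g. a call to an LLM completion API
        if not raw_step.strip():               # an empty generation terminates the plan
            break
        action = translate(raw_step)
        executable_plan.append(action)
        prompt += f" {action}\nStep {i + 2}:"  # condition later steps on translated actions
    return executable_plan
```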
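
Since the paper names its software dependencies but not their versions, a re-run can at least record the versions that happen to be installed. This is a generic sketch using only the package names mentioned in the row above; it is not part of the authors' release.

```python
# Record installed versions of the dependencies named in the paper, since the
# paper itself does not pin version numbers.
from importlib.metadata import version, PackageNotFoundError

PACKAGES = ["openai", "transformers", "sentence-transformers"]

for name in PACKAGES:
    try:
        print(f"{name}=={version(name)}")
    except PackageNotFoundError:
        print(f"{name}: not installed")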
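
The experiment-setup row mentions a hyperparameter search over sampling parameters, detailed in Appendix A.2. The sketch below shows what such a grid might look like using temperature and top-p as example knobs; the specific values and the scoring function are assumptions, not the ranges the authors searched.

```python
# Hypothetical grid over common sampling parameters; the authors' actual search
# space is given in their Appendix A.2 and is not reproduced here.
from itertools import product

TEMPERATURES = [0.1, 0.3, 0.6, 0.9]  # assumed example values
TOP_PS = [0.8, 0.9, 1.0]             # assumed example values

def search(evaluate_plan_quality):
    """Run the planning pipeline for each sampling configuration and keep the
    configuration with the best score under a user-supplied metric."""
    best_score, best_config = float("-inf"), None
    for temperature, top_p in product(TEMPERATURES, TOP_PS):
        score = evaluate_plan_quality(temperature=temperature, top_p=top_p)
        if score > best_score:
            best_score, best_config = score, (temperature, top_p)
    return best_config, best_score
```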