PlanVerb: Domain-Independent Verbalization and Summary of Task Plans
Authors: Gerard Canal, Senka Krivić, Paul Luff, Andrew Coles
Pages: 9698-9706
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our user survey evaluation shows that users can read our automatically generated plan descriptions and that the explanations help them answer questions about the plan. We have evaluated the proposed plan verbalization method and spaces. First, we provide some examples of automatically verbalized actions. Then, we analyse the impact of the verbalization space parameters. Finally, we comment on the results of an online survey regarding the verbalization. For our evaluation we used ROSPlan with the POPF planner (Coles et al. 2010) for PDDL domains, and the PROST planner (Keller and Eyerich 2012) for RDDL domains. |
| Researcher Affiliation | Academia | Gerard Canal¹, Senka Krivić¹, Paul Luff², Andrew Coles¹. ¹Department of Informatics, King's College London; ²King's Business School, King's College London. gerard.canal@kcl.ac.uk, senka.krivic@kcl.ac.uk, paul.luff@kcl.ac.uk, andrew.coles@kcl.ac.uk |
| Pseudocode | Yes | Algorithm 1 shows the pseudocode of the PlanVerb algorithm. |
| Open Source Code | Yes | The code, domains, and the complete set of verbalized plans with all the combinations of verbalization space parameters can be found in https://github.com/gerardcanal/task_plan_verbalization |
| Open Datasets | Yes | Those include the office robot domain (4 instances), the IPC'02 Rovers domain (Long and Fox 2003) (19 instances), the IPC'08 Crew Planning domain (Barreiro, Jones, and Schaffer 2009) (30 instances) for PDDL. For RDDL, we have used the IPPC'14 triangle tireworld (Little and Thiebaux 2007), the print-fetching domain from (Canal et al. 2019), and 3 interactive robotics domains (Canal, Torras, and Alenyà 2022), involving assistive feeding and dressing tasks. |
| Dataset Splits | No | The paper describes using a set of 'test domains' and conducting an 'online user survey' with participants, but it does not specify training, validation, or test dataset splits for machine learning model development or evaluation, such as percentages or sample counts. |
| Hardware Specification | No | The paper does not provide specific hardware details (like GPU/CPU models, processor types, or memory amounts) used for running its experiments or the planning processes. |
| Software Dependencies | No | The paper mentions software like ROSPlan, POPF, PROST, mlconjug3, and spaCy, but it does not provide specific version numbers for these components required for reproducibility. |
| Experiment Setup | Yes | We have validated the effect of the verbalization space parameters with a set of test domains... We have computed a plan for all the domains and instances and verbalized it with all the combinations of parameters. The narration for v1 was generated with parameters (a, l, s, e) = (A3, All plan, Detailed Narrative, E1). The other one, v2 is a summarised version of the same plan including explanations, generated with parameters (a, l, s, e) = (A3, All plan, Summary, E4). |
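
The Experiment Setup row mentions verbalizing each plan "with all the combinations of parameters". A minimal sketch of such an enumeration follows. It assumes the verbalization space is a plain Cartesian product of the four axes (a, l, s, e), and it uses only the parameter values quoted in the row above; the full value sets are defined in the paper and are not reproduced here. The names `SPACE` and `parameter_combinations` are hypothetical and are not taken from the authors' repository.

```python
from itertools import product

# Assumption: only the values quoted in the table above are listed here.
# The complete value set for each axis is defined in the paper.
SPACE = {
    "a": ["A3"],
    "l": ["All plan"],
    "s": ["Detailed Narrative", "Summary"],
    "e": ["E1", "E4"],
}


def parameter_combinations(space):
    """Return every (a, l, s, e) tuple in the Cartesian product of the axes."""
    axes = ["a", "l", "s", "e"]  # fixed order, matching the (a, l, s, e) notation
    return list(product(*(space[axis] for axis in axes)))


combos = parameter_combinations(SPACE)

# The two survey narrations quoted above are two points in this grid.
V1 = ("A3", "All plan", "Detailed Narrative", "E1")
V2 = ("A3", "All plan", "Summary", "E4")

if __name__ == "__main__":
    for combo in combos:
        print(combo)
```

Each tuple would then be passed to the verbalizer once per plan. How the released code actually iterates over the space may differ.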