Tree-Planner: Efficient Close-loop Task Planning with Large Language Models
Authors: Mengkang Hu, Yao Mu, Xinmiao Chelsey Yu, Mingyu Ding, Shiguang Wu, Wenqi Shao, Qiguang Chen, Bin Wang, Yu Qiao, Ping Luo
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that TREE-PLANNER achieves state-of-the-art performance while maintaining high efficiency. |
| Researcher Affiliation | Collaboration | Corresponding authors: Mingyu Ding and Ping Luo ({dingmyu, pluo.lhi}@gmail.com). The University of Hong Kong. Harbin Institute of Technology. Noah's Ark Laboratory. Shanghai AI Laboratory. |
| Pseudocode | Yes | Algorithm 1: Action Tree Construction. Input: c, r. Output: r. (See the action-tree sketch below the table.) |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for their methodology is open-source or publicly available. |
| Open Datasets | Yes | Environment. We conduct the experiments in the VirtualHome (VH) Environment (Puig et al., 2018), a simulation platform for household tasks. Dataset. We constructed a dataset consisting of 4 VH scenes and 35 unique VH tasks. Each task includes a task name, goal conditions, and a gold plan. We started by annotating goal conditions for each task from the ActivityPrograms knowledge base by Puig et al. (2018) via executing the programs. |
| Dataset Splits | Yes | We take 4 representative tasks from the dataset as in-context learning exemplars and the rest as the validation set. |
| Hardware Specification | No | The paper mentions using the 'OpenAI GPT-3.5 (text-davinci-003) API' but does not specify any particular hardware used to run its experiments. |
| Software Dependencies | No | The paper mentions using the 'OpenAI GPT-3.5 (text-davinci-003) API' and 'BERT similarity' (linking to sbert.net) but does not provide specific version numbers for these or other software dependencies. (See the similarity sketch below the table.) |
| Experiment Setup | Yes | To sample diverse plans, we applied a temperature of 0.8 and a top-p value of 0.95. During grounded deciding, we set the temperature to 0.7, top-p to 1.0, and the sampling parameter n to 20. Additionally, we utilize a majority vote to obtain the final option in order to alleviate format errors in the output of LLMs. The maximum number of error corrections is set to 10 for all evaluated approaches. ... In the case of Grounded Deciding, the optimal hyperparameter combination was found to be a temperature of 0.7 and top-p of 1.0. As for ITERATIVE-PLANNER, the optimal hyperparameter combination was a temperature of 0 and top-p of 1.0. (See the sampling-and-voting sketch below the table.) |
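
The Pseudocode row quotes only the header of Algorithm 1, so a minimal sketch may help. The Python below is not the authors' code: `Node`, `build_action_tree`, and the example plans are illustrative assumptions. It merges sampled plans into an action tree by collapsing shared action prefixes, which is what the paper's action tree construction does before grounded deciding traverses the tree.

```python
# Hedged sketch of action tree construction (not the authors' implementation):
# sampled plans (lists of action strings) are merged so that shared prefixes
# collapse into a single branch of the tree.
from dataclasses import dataclass, field

@dataclass
class Node:
    action: str                                    # action label; "ROOT" for the root
    children: dict = field(default_factory=dict)   # maps action string -> child Node

def build_action_tree(plans):
    """Merge sampled plans into one action tree by collapsing common prefixes."""
    root = Node("ROOT")
    for plan in plans:
        node = root
        for action in plan:
            # Reuse the existing child if this action was already sampled here;
            # otherwise open a new branch.
            node = node.children.setdefault(action, Node(action))
    return root

# Illustrative example: three sampled plans that share their first action.
plans = [
    ["walk to living room", "find TV", "switch on TV"],
    ["walk to living room", "find TV", "sit on sofa"],
    ["walk to living room", "find remote", "switch on TV"],
]
tree = build_action_tree(plans)
assert list(tree.children) == ["walk to living room"]  # shared prefix was merged
```

Grounded deciding then only has to choose among a node's children at each step, which is why sampling plans once and aggregating them into a tree is cheaper than re-prompting for a full plan after every correction.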
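The Software Dependencies row notes that 'BERT similarity' (sbert.net) is used without a version. A minimal sketch, assuming the `sentence-transformers` package and the `all-MiniLM-L6-v2` checkpoint (an assumption; the paper names neither), of matching a free-form generated action to the most similar executable action:

```python
# Hedged sketch of BERT-similarity matching via sentence-transformers (sbert.net).
# The checkpoint is an assumption; the paper specifies no model or version.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def closest_executable(generated_action, executable_actions):
    """Return the executable action whose embedding is closest to the LLM output."""
    query = model.encode(generated_action, convert_to_tensor=True)
    candidates = model.encode(executable_actions, convert_to_tensor=True)
    scores = util.cos_sim(query, candidates)[0]  # one cosine similarity per candidate
    return executable_actions[int(scores.argmax())]

print(closest_executable("turn the television on",
                         ["switch on TV", "open fridge", "sit on sofa"]))
# expected: "switch on TV"
```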
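For the decoding settings in the Experiment Setup row, here is a minimal sketch of grounded deciding with majority voting, assuming the legacy `openai` (<1.0) Python client that matches the text-davinci-003 API the paper mentions; the prompt and `max_tokens` are placeholders, not the authors' values. Plan sampling would use temperature 0.8 and top-p 0.95 instead of the values below.

```python
# Hedged sketch of grounded deciding with a majority vote (not the authors' code).
# Uses the legacy openai<1.0 Completion API implied by text-davinci-003.
from collections import Counter
import openai

def decide_with_majority_vote(prompt):
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,          # placeholder: the grounded-deciding prompt
        temperature=0.7,        # grounded-deciding temperature from the paper
        top_p=1.0,              # grounded-deciding top-p from the paper
        n=20,                   # sample 20 decisions ...
        max_tokens=8,           # placeholder budget for a short option label
    )
    options = [choice.text.strip() for choice in response.choices]
    # ... then take a majority vote to alleviate format errors in single samples.
    return Counter(options).most_common(1)[0][0]
```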