THOUGHT PROPAGATION: AN ANALOGICAL APPROACH TO COMPLEX REASONING WITH LARGE LANGUAGE MODELS
Authors: Junchi Yu, Ran He, Zhitao Ying
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments across three challenging tasks demonstrate TP enjoys a substantial improvement over the baselines by an average of 12% absolute increase in finding the optimal solutions in Shortest-path Reasoning, 13% improvement of human preference in Creative Writing, and 15% enhancement in the task completion rate of LLM-Agent Planning. |
| Researcher Affiliation | Academia | Junchi Yu & Ran He: MAIS & CRIPAC, Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Beijing, China (yujunchi2019@ia.ac.cn, rhe@nlpr.ia.ac.cn). Rex Ying: Department of Computer Science, Yale University, New Haven, USA (rex.ying@yale.edu). |
| Pseudocode | Yes | Init Path = [0]; While not reach Node 8 and not exceed max steps: Current_node = Path[-1]; Next_node_set = LLM_Neighbor_search(Current_node); Best_next_node = LLM_Evaluate(Next_node_set); Path.append(Best_next_node); print(Path) (see the runnable sketch after this table) |
| Open Source Code | Yes | Code is available on https://github.com/Samyu0304/thought-propagation. |
| Open Datasets | Yes | We use ALFWorld (Shridhar et al., 2021) game suite to instantiate the LLM-Agent Planning task with 134 environments. |
| Dataset Splits | No | The paper mentions '0-shot, 1-shot, and 5-shot prompting settings' and '100 test instances' or '134 unseen environments for evaluation', but does not provide specific train/validation/test dataset splits or cross-validation details for reproducibility. |
| Hardware Specification | No | The paper mentions using LLM backends such as PaLM 2, GPT-3.5, and GPT-4, but does not provide specific hardware details like GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions 'Python' for graph generation and various LLM backends (GPT-3.5, GPT-4, PaLM 2), but does not provide specific version numbers for software libraries, frameworks, or dependencies used in the experiments. |
| Experiment Setup | No | The paper describes prompting settings (0-shot, 1-shot, 5-shot) and LLM models used, but does not provide specific hyperparameter values (e.g., learning rate, batch size) or detailed system-level training configurations. |
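The pseudocode reported above is a flattened version of the Shortest-path Reasoning loop. Below is a minimal, self-contained Python sketch of that loop under stated assumptions: the graph, `llm_neighbor_search`, and `llm_evaluate` are hypothetical stand-ins, whereas in the actual TP pipeline these two steps would be prompts sent to an LLM backend (GPT-3.5, GPT-4, or PaLM 2).

```python
# Sketch of the paper's Shortest-path Reasoning loop.
# The graph and the two "LLM" functions are placeholders (assumptions),
# not the authors' implementation.

from typing import List, Set

# Hypothetical adjacency list standing in for the graph the LLM reasons over.
GRAPH = {
    0: {1, 2},
    1: {0, 3},
    2: {0, 4},
    3: {1, 8},
    4: {2, 8},
    8: {3, 4},
}


def llm_neighbor_search(current_node: int) -> Set[int]:
    """Placeholder for the LLM call that lists neighbors of the current node."""
    return GRAPH.get(current_node, set())


def llm_evaluate(next_node_set: Set[int], visited: Set[int], goal: int = 8) -> int:
    """Placeholder for the LLM call that picks the most promising next node.

    A trivial heuristic stands in for the LLM's judgment: take the goal if it
    is directly reachable, otherwise prefer an unvisited neighbor.
    """
    if goal in next_node_set:
        return goal
    unvisited = next_node_set - visited
    return min(unvisited) if unvisited else min(next_node_set)


def shortest_path_reasoning(start: int = 0, goal: int = 8, max_steps: int = 10) -> List[int]:
    """Follows the pseudocode: extend the path until the goal or the step limit is reached."""
    path = [start]
    while path[-1] != goal and len(path) <= max_steps:
        current_node = path[-1]
        next_node_set = llm_neighbor_search(current_node)
        best_next_node = llm_evaluate(next_node_set, set(path), goal)
        path.append(best_next_node)
    return path


if __name__ == "__main__":
    print(shortest_path_reasoning())  # e.g. [0, 1, 3, 8] on the toy graph above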