THOUGHT PROPAGATION: AN ANALOGICAL APPROACH TO COMPLEX REASONING WITH LARGE LANGUAGE MODELS

Authors: Junchi Yu, Ran He, Zhitao Ying

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments across three challenging tasks demonstrate TP enjoys a substantial improvement over the baselines by an average of 12% absolute increase in finding the optimal solutions in Shortest-path Reasoning, 13% improvement of human preference in Creative Writing, and 15% enhancement in the task completion rate of LLM-Agent Planning.
Researcher Affiliation | Academia | Junchi Yu & Ran He, MAIS & CRIPAC, Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Beijing, China (yujunchi2019@ia.ac.cn, rhe@nlpr.ia.ac.cn); Rex Ying, Department of Computer Science, Yale University, New Haven, USA (rex.ying@yale.edu).
Pseudocode | Yes |
Init Path = [0]
While not reach Node 8 and not exceed max steps:
    Current_node = Path[-1]
    Next_node_set = LLM_Neighbor_search(Current_node)
    Best_next_node = LLM_Evaluate(Next_node_set)
    Path.append(Best_next_node)
print(Path)
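For readers who want to run the loop above, here is a minimal Python sketch under stated assumptions: llm_neighbor_search and llm_evaluate are hypothetical stand-ins for the paper's LLM calls (not the authors' implementation), and a fixed adjacency list replaces the graph the LLM would otherwise explore.

# Minimal runnable sketch of the shortest-path loop quoted above.
# llm_neighbor_search / llm_evaluate are hypothetical placeholders for
# the paper's LLM calls; GRAPH is an illustrative toy graph.

GRAPH = {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [8], 5: [8]}
GOAL, MAX_STEPS = 8, 10

def llm_neighbor_search(node):
    # Stand-in for LLM_Neighbor_search: list candidate next nodes.
    return GRAPH.get(node, [])

def llm_evaluate(candidates):
    # Stand-in for LLM_Evaluate: a real run would ask the LLM to score
    # candidates; here we greedily pick the node closest to the goal index.
    return min(candidates, key=lambda n: abs(GOAL - n))

path = [0]
while path[-1] != GOAL and len(path) < MAX_STEPS:
    candidates = llm_neighbor_search(path[-1])
    if not candidates:  # dead end: stop rather than loop forever
        break
    path.append(llm_evaluate(candidates))
print(path)

On this toy graph the loop prints [0, 2, 4, 8].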
Open Source Code | Yes | Code is available at https://github.com/Samyu0304/thought-propagation.
Open Datasets | Yes | We use the ALFWorld (Shridhar et al., 2021) game suite to instantiate the LLM-Agent Planning task with 134 environments.
Dataset Splits | No | The paper mentions '0-shot, 1-shot, and 5-shot prompting settings', '100 test instances', and '134 unseen environments for evaluation', but does not provide explicit train/validation/test dataset splits or cross-validation details for reproducibility.
Hardware Specification | No | The paper mentions using LLM backends such as PaLM 2, GPT-3.5, and GPT-4, but does not provide specific hardware details such as the GPU models, CPU types, or memory used for running the experiments.
Software Dependencies | No | The paper mentions 'Python' for graph generation and various LLM backends (GPT-3.5, GPT-4, PaLM 2), but does not provide specific version numbers for the software libraries, frameworks, or dependencies used in the experiments.
Experiment Setup | No | The paper describes the prompting settings (0-shot, 1-shot, 5-shot) and the LLM models used, but does not provide specific hyperparameter values (e.g., learning rate, batch size) or detailed system-level training configurations.
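To make the 0-shot, 1-shot, and 5-shot settings referenced above concrete, below is a minimal sketch of how a k-shot prompt is typically assembled; the exemplars and the build_prompt helper are illustrative assumptions, not taken from the paper.

# Hypothetical k-shot prompt assembly; exemplars are illustrative only.
EXEMPLARS = [
    ("Find the shortest path from 0 to 4 given edges 0-1, 1-4, 0-4.", "0 -> 4"),
    ("Find the shortest path from 2 to 5 given edges 2-3, 3-5, 2-5.", "2 -> 5"),
]

def build_prompt(task, k):
    # Prepend k worked examples; k=0 yields a zero-shot prompt.
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in EXEMPLARS[:k])
    return (shots + "\n\n" if shots else "") + f"Q: {task}\nA:"

print(build_prompt("Find the shortest path from 0 to 8.", k=1))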