Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

THOUGHT PROPAGATION: AN ANALOGICAL APPROACH TO COMPLEX REASONING WITH LARGE LANGUAGE MODELS

Authors: Junchi Yu, Ran He, Zhitao Ying

ICLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments across three challenging tasks demonstrate TP enjoys a substantial improvement over the baselines by an average of 12% absolute increase in finding the optimal solutions in Shortest-path Reasoning, 13% improvement of human preference in Creative Writing, and 15% enhancement in the task completion rate of LLM-Agent Planning.
Researcher Affiliation Academia Junchi Yu & Ran He, MAIS & CRIPAC, Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Beijing, China, EMAIL, EMAIL. Rex Ying, Department of Computer Science, Yale University, New Haven, USA, EMAIL.
Pseudocode Yes
  Init Path = [0]
  While not reach Node 8 and not exceed max steps:
      Current_node = Path[-1]
      Next_node_set = LLM_Neighbor_search(Current_node)
      Best_next_node = LLM_Evaluate(Next_node_set)
      Path.append(Best_next_node)
  print(Path)
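The loop in the pseudocode row can be sketched as runnable Python. The LLM calls (`LLM_Neighbor_search`, `LLM_Evaluate`) are stubbed here with a hypothetical example graph and a greedy pick-the-goal heuristic; these stubs are assumptions for illustration, not the paper's actual prompts.

```python
# Minimal runnable sketch of the shortest-path reasoning loop.
# The graph, goal node, and step budget below are illustrative
# assumptions; in the paper these decisions are made by LLM calls.

GRAPH = {0: [1, 2], 1: [3], 2: [3, 4], 3: [8], 4: [8]}  # hypothetical graph
GOAL, MAX_STEPS = 8, 10

def llm_neighbor_search(node):
    """Stub for the LLM call that lists neighbors of the current node."""
    return GRAPH.get(node, [])

def llm_evaluate(candidates):
    """Stub for the LLM call that picks the most promising next node.
    Greedy: take the goal if it is a candidate, else the first option."""
    return GOAL if GOAL in candidates else candidates[0]

def find_path():
    path = [0]
    while path[-1] != GOAL and len(path) <= MAX_STEPS:
        candidates = llm_neighbor_search(path[-1])
        if not candidates:  # dead end: stop early
            break
        path.append(llm_evaluate(candidates))
    return path

print(find_path())  # -> [0, 1, 3, 8] on the stub graph
```

On the stub graph the greedy evaluator reaches the goal in three hops; with a real LLM backend the two stubbed functions would instead issue neighbor-search and evaluation prompts.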
Open Source Code Yes Code is available on https://github.com/Samyu0304/thought-propagation.
Open Datasets Yes We use ALFWorld (Shridhar et al., 2021) game suite to instantiate the LLM-Agent Planning task with 134 environments.
Dataset Splits No The paper mentions '0-shot, 1-shot, and 5-shot prompting settings' and '100 test instances' or '134 unseen environments for evaluation', but does not provide specific train/validation/test dataset splits or cross-validation details for reproducibility.
Hardware Specification No The paper mentions using LLM backends such as PaLM 2, GPT-3.5, and GPT-4, but does not provide specific hardware details like GPU models, CPU types, or memory used for running the experiments.
Software Dependencies No The paper mentions 'Python' for graph generation and various LLM backends (GPT-3.5, GPT-4, PaLM 2), but does not provide specific version numbers for software libraries, frameworks, or dependencies used in the experiments.
Experiment Setup No The paper describes prompting settings (0-shot, 1-shot, 5-shot) and LLM models used, but does not provide specific hyperparameter values (e.g., learning rate, batch size) or detailed system-level training configurations.