reproducibilityindex.ai

AlphaZero-Like Tree-Search can Guide Large Language Model Decoding and Training

Authors: Ziyu Wan, Xidong Feng, Muning Wen, Stephen Marcus McAleer, Ying Wen, Weinan Zhang, Jun Wang

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical results across reasoning, planning, alignment, and decision-making tasks show that TS-LLM outperforms existing approaches and can handle trees with a depth of 64.
Researcher Affiliation	Academia	(1) Shanghai Jiao Tong University, (2) University College London, (3) Carnegie Mellon University.
Pseudocode	Yes	Algorithm 1 Simplified MCTS Simulation; Algorithm 2 MCTS-Rollout; Algorithm 3 MCTS-α
Open Source Code	Yes	Our code is open-sourced at https://github.com/waterhorse1/LLM_Tree_Search.
Open Datasets	Yes	GSM8k (Cobbe et al., 2021), Game24 (Yao et al., 2023), PrOntoQA (Saparov & He, 2022), RLHF alignment task using synthetic RLHF data (Dahoas), and chess endgames (Abdulhai et al., 2023).
Dataset Splits	Yes	We split the dataset to 30000/3000 as training and test set respectively.
Hardware Specification	Yes	The experiments were conducted on the same machine with 8 NVIDIA A800 GPUs, the CPU information is Intel(R) Xeon(R) Platinum 8336C CPU @ 2.30GHz.
Software Dependencies	No	The paper mentions models like LLaMA2-7B and GPT-2-small, but does not specify software dependencies with version numbers (e.g., PyTorch version, CUDA version).
Experiment Setup	Yes	The training is conducted on 8 NVIDIA A800 GPUs, using a cosine scheduler decaying from lr=2e-5 to 0.0 with a warmup ratio of 0.03, batch size 128 for 3 epochs. We set temperature=1.0, top_p=1.0, top_k=100 when using LLM to generate tree actions.
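
The learning-rate schedule quoted in the Experiment Setup row can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' training code: the function name `lr_at_step` is hypothetical, the warmup is assumed to be linear (the quoted text gives only the ratio), and the total step count in the usage lines is a placeholder rather than a value from the paper.

```python
import math

def lr_at_step(step, total_steps, peak_lr=2e-5, warmup_ratio=0.03):
    """Learning rate at a given optimizer step.

    Assumption: linear warmup from 0 to peak_lr over the first
    warmup_ratio of training, then cosine decay from peak_lr to 0.0.
    """
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine phase: progress runs from 0 (end of warmup) to 1 (last step).
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Usage with a placeholder horizon of 1000 steps (not a figure from the paper).
for s in (0, 15, 30, 515, 1000):
    print(s, lr_at_step(s, total_steps=1000))
```

With these assumptions the rate rises to 2e-5 at the end of warmup and reaches 0.0 on the final step, which matches the "decaying from lr=2e-5 to 0.0" description; the exact warmup shape used in the original run may differ.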