AlphaZero-Like Tree-Search can Guide Large Language Model Decoding and Training

Authors: Ziyu Wan, Xidong Feng, Muning Wen, Stephen Marcus McAleer, Ying Wen, Weinan Zhang, Jun Wang

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results across reasoning, planning, alignment, and decision-making tasks show that TS-LLM outperforms existing approaches and can handle trees with a depth of 64.
Researcher Affiliation | Academia | Shanghai Jiao Tong University; University College London; Carnegie Mellon University.
Pseudocode | Yes | Algorithm 1: Simplified MCTS Simulation; Algorithm 2: MCTS-Rollout; Algorithm 3: MCTS-α (see the MCTS sketch after this table).
Open Source Code | Yes | Our code is open-sourced at https://github.com/waterhorse1/LLM_Tree_Search.
Open Datasets | Yes | GSM8k (Cobbe et al., 2021), Game24 (Yao et al., 2023), PrOntoQA (Saparov & He, 2022), an RLHF alignment task using synthetic RLHF data (Dahoas), and chess endgames (Abdulhai et al., 2023).
Dataset Splits | Yes | We split the dataset into 30,000/3,000 examples as the training and test sets, respectively.
Hardware Specification | Yes | The experiments were conducted on the same machine with 8 NVIDIA A800 GPUs; the CPU is an Intel(R) Xeon(R) Platinum 8336C @ 2.30GHz.
Software Dependencies | No | The paper mentions models such as LLaMA2-7B and GPT-2-small, but does not specify software dependencies with version numbers (e.g., PyTorch or CUDA versions).
Experiment Setup | Yes | Training is conducted on 8 NVIDIA A800 GPUs with a cosine scheduler decaying from lr=2e-5 to 0.0, a warmup ratio of 0.03, and batch size 128 for 3 epochs. When using the LLM to generate tree actions, we set temperature=1.0, top_p=1.0, top_k=100 (see the config sketch after this table).
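
For reference, the sketch below outlines a generic MCTS simulation loop of the kind summarized in the paper's Algorithm 1 (select, expand, evaluate, backpropagate). It is a minimal illustration under stated assumptions, not the authors' implementation (see the linked repository for that): the `expand_fn` and `value_fn` callables are hypothetical placeholders standing in for the LLM policy that proposes tree actions and the learned value function, and the toy usage at the bottom is purely illustrative.

```python
import math
import random


class Node:
    """A search-tree node tracking visit counts and a running value estimate."""

    def __init__(self, state, parent=None, prior=1.0):
        self.state = state        # e.g. the partial token/sentence sequence
        self.parent = parent
        self.prior = prior        # prior probability of the action that led here
        self.children = {}        # maps action -> child Node
        self.visit_count = 0
        self.value_sum = 0.0

    def value(self):
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def puct_score(parent, child, c_puct=1.0):
    """PUCT rule: exploit the child's mean value, explore via prior and visit counts."""
    explore = c_puct * child.prior * math.sqrt(parent.visit_count) / (1 + child.visit_count)
    return child.value() + explore


def simulate(root, expand_fn, value_fn, is_terminal_fn, c_puct=1.0):
    """Run one simulation: select a leaf, expand it, evaluate it, backpropagate."""
    node = root
    # 1. Selection: walk down already-expanded nodes by maximising the PUCT score.
    while node.children and not is_terminal_fn(node.state):
        parent = node
        _, node = max(parent.children.items(),
                      key=lambda kv: puct_score(parent, kv[1], c_puct))
    # 2. Expansion: attach children proposed by the (placeholder) policy.
    if not is_terminal_fn(node.state):
        for action, prior, next_state in expand_fn(node.state):
            node.children[action] = Node(next_state, parent=node, prior=prior)
    # 3. Evaluation: score the leaf with the (placeholder) value function.
    value = value_fn(node.state)
    # 4. Backpropagation: update statistics along the path back to the root.
    while node is not None:
        node.visit_count += 1
        node.value_sum += value
        node = node.parent


if __name__ == "__main__":
    # Toy usage: three dummy actions per node, a random value head, depth-4 horizon.
    expand = lambda s: [(a, 1.0 / 3, s + [a]) for a in range(3)]
    value = lambda s: random.random()
    terminal = lambda s: len(s) >= 4
    root = Node(state=[])
    for _ in range(200):
        simulate(root, expand, value, terminal)
    print({a: (c.visit_count, round(c.value(), 2)) for a, c in root.children.items()})
```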
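
The config sketch below maps the reported training and decoding hyperparameters onto Hugging Face `TrainingArguments` and `GenerationConfig` objects, as one plausible way to reproduce the setup; it is not claimed to be the repository's actual configuration. The per-device batch size (16 × 8 GPUs = 128), `bf16`, `output_dir`, and `max_new_tokens` values are assumptions added for illustration.

```python
from transformers import TrainingArguments, GenerationConfig

# Reported fine-tuning hyperparameters, expressed as Hugging Face TrainingArguments.
# Assumption: the global batch size of 128 is split as 16 per device across 8 GPUs.
training_args = TrainingArguments(
    output_dir="ts_llm_checkpoints",     # hypothetical output directory
    learning_rate=2e-5,                  # cosine decay from 2e-5 ...
    lr_scheduler_type="cosine",          # ... down to 0.0
    warmup_ratio=0.03,
    num_train_epochs=3,
    per_device_train_batch_size=16,      # 16 per GPU x 8 GPUs = global batch size 128
    bf16=True,                           # assumption: mixed precision on A800 GPUs
)

# Decoding settings reported for proposing tree actions with the LLM.
generation_config = GenerationConfig(
    do_sample=True,
    temperature=1.0,
    top_p=1.0,
    top_k=100,
    max_new_tokens=64,                   # illustrative cap; not stated in the table above
)
```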