Spending Thinking Time Wisely: Accelerating MCTS with Virtual Expansions

Authors: Weirui Ye, Pieter Abbeel, Yang Gao

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that our method can achieve comparable performances to the original search algorithm while requiring less than 50% search time on average. We believe that this approach is a viable alternative for tasks under limited time and resources.
Researcher Affiliation | Collaboration | Weirui Ye, Pieter Abbeel, Yang Gao (Tsinghua University, UC Berkeley, Shanghai Qi Zhi Institute)
Pseudocode | Yes | Algorithm 1: Iteration of vanilla MCTS; Algorithm 2: Iteration of MCTS with Virtual Expansion; Algorithm 3: Virtual MCTS
Open Source Code | Yes | The code is available at https://github.com/YeWR/V-MCTS.git.
Open Datasets | Yes | The environment of Go is built based on an open-source codebase, GymGo [19]. We evaluate the performance of the agent against GNU Go v3.8 at level 10 [5] for 200 games. ... As for the Atari games, we choose 5 games with 100k environment steps.
Dataset Splits | No | The paper describes evaluation procedures against GNU Go and the use of evaluation seeds for Atari games, but it does not specify traditional dataset splits (e.g., percentages or counts for a separate validation set).
Hardware Specification | Yes | Recently, Ye et al. [34] proposed EfficientZero, a variant of MuZero [27] with three extra components to improve the sample efficiency, which only requires 8 GPUs in training, and thus it is more affordable.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies (e.g., Python, deep learning frameworks such as PyTorch or TensorFlow, or other libraries).
Experiment Setup | Yes | As for Go 9×9, we choose Tromp-Taylor rules. The environment of Go is built based on an open-source codebase, GymGo [19]. We evaluate the performance of the agent against GNU Go v3.8 at level 10 [5] for 200 games. ... We set the komi to 6.5... As for the Atari games, we choose 5 games with 100k environment steps. In each setting, we use 3 training seeds and 100 evaluation seeds for each trained model. ... The default values of r, are set to 0.2, 0.1.
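As a reading aid for the Pseudocode row above, the core idea of a virtual expansion can be sketched in a few lines. This is a hypothetical minimal sketch, not the authors' implementation (see Algorithms 1–3 in the paper for the actual procedure): it assumes that a virtual expansion allots the unspent simulation budget across root actions in proportion to the visit distribution observed so far, instead of running the remaining real simulations.

```python
import numpy as np

def virtual_expansion(visit_counts, total_budget):
    """Hypothetical sketch of a virtual expansion step.

    Given per-action visit counts accumulated after some number of real
    MCTS simulations, distribute the remaining simulation budget across
    actions in proportion to the current empirical visit distribution,
    rather than spending it on further real search.
    """
    counts = np.asarray(visit_counts, dtype=float)
    spent = counts.sum()                      # real simulations used so far
    remaining = total_budget - spent          # budget saved by stopping early
    policy = counts / spent                   # empirical visit distribution
    return counts + remaining * policy        # virtually completed counts

# Example: 20 of 50 simulations spent over 3 actions.
print(virtual_expansion([10, 6, 4], 50))  # → [25. 15. 10.]
```

The virtually completed counts can then be used wherever the full-budget visit counts would be, e.g. to select the final action, which is why the search can terminate early once the distribution has stabilized.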