Spending Thinking Time Wisely: Accelerating MCTS with Virtual Expansions
Authors: Weirui Ye, Pieter Abbeel, Yang Gao
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our method can achieve comparable performances to the original search algorithm while requiring less than 50% search time on average. We believe that this approach is a viable alternative for tasks under limited time and resources. |
| Researcher Affiliation | Collaboration | Weirui Ye, Pieter Abbeel, Yang Gao (Tsinghua University, UC Berkeley, Shanghai Qi Zhi Institute) |
| Pseudocode | Yes | Algorithm 1 Iteration of vanilla MCTS, Algorithm 2 Iteration of MCTS with Virtual Expansion, Algorithm 3 Virtual MCTS |
| Open Source Code | Yes | The code is available at https://github.com/YeWR/V-MCTS.git. |
| Open Datasets | Yes | The environment of Go is built based on an open-source codebase, GymGo [19]. We evaluate the performance of the agent against GNU Go v3.8 at level 10 [5] for 200 games. ... As for the Atari games, we choose 5 games with 100k environment steps. |
| Dataset Splits | No | The paper describes evaluation procedures against GNU Go and using evaluation seeds for Atari games, but it does not specify traditional dataset validation splits (e.g., percentages or counts for a separate validation set). |
| Hardware Specification | Yes | Recently, Ye et al. [34] proposed EfficientZero, a variant of MuZero [27] with three extra components to improve the sample efficiency, which only requires 8 GPUs in training, and thus it is more affordable. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies (e.g., Python, deep learning frameworks like PyTorch or TensorFlow, or other libraries). |
| Experiment Setup | Yes | Hyper-parameters: As for the Go 9×9, we choose Tromp-Taylor rules. The environment of Go is built based on an open-source codebase, GymGo [19]. We evaluate the performance of the agent against GNU Go v3.8 at level 10 [5] for 200 games. ... We set the komi to 6.5... As for the Atari games, we choose 5 games with 100k environment steps. In each setting, we use 3 training seeds and 100 evaluation seeds for each trained model. ... The default values of r, are set to 0.2, 0.1. |
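The pseudocode row above names the paper's Algorithms 1-3 (vanilla MCTS, MCTS with virtual expansion, and Virtual MCTS). A minimal sketch of the virtual-expansion idea follows, assuming a PUCT-style selection rule as in the MuZero family; the function names, the toy statistics, and the exploration constant `c` are illustrative assumptions, not the paper's implementation.

```python
import math

def puct_scores(visits, values, priors, c=1.25):
    """PUCT-style score per child (assumed selection rule, MuZero-family style)."""
    total = sum(visits)
    return [
        q + c * p * math.sqrt(total) / (1 + n)
        for q, p, n in zip(values, priors, visits)
    ]

def virtual_expand(visits, values, priors, budget):
    """Sketch of virtual expansion: instead of running the remaining `budget`
    real simulations, repeatedly pick the PUCT-maximising child and bump its
    visit count only. No model rollouts are performed, so the value estimates
    stay fixed; only the visit distribution is extrapolated."""
    visits = list(visits)  # do not mutate the caller's statistics
    for _ in range(budget):
        scores = puct_scores(visits, values, priors)
        best = max(range(len(scores)), key=scores.__getitem__)
        visits[best] += 1
    return visits

# Usage: after 20 real simulations, spend the remaining 30 virtually,
# then pick the action by (virtual) visit count as usual.
visits = [10, 6, 4]
values = [0.50, 0.45, 0.20]
priors = [0.40, 0.40, 0.20]
final = virtual_expand(visits, values, priors, budget=30)
action = max(range(len(final)), key=final.__getitem__)
```

The time saving reported in the table (under 50% of the original search time on average) comes from replacing model evaluations, the expensive step, with these cheap statistics-only updates once the search looks stable.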