Watch the Unobserved: A Simple Approach to Parallelizing Monte Carlo Tree Search
Authors: Anji Liu, Jianshu Chen, Mingze Yu, Yu Zhai, Xuewen Zhou, Ji Liu
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on a proprietary benchmark and the Atari Game benchmark demonstrate the linear speedup and the superior performance of WU-UCT comparing to existing techniques. |
| Researcher Affiliation | Industry | Seattle AI Lab, Kwai Inc., Bellevue, WA 98004, USA {liuanji03,yumingze,zhaiyu,zhouxuewen,jiliu}@kuaishou.com Tencent AI Lab, Bellevue, WA 98004, USA jianshuchen@tencent.com |
| Pseudocode | Yes | The pseudo-code of WU-UCT is provided in Algorithm 1. Specifically, it provides the workflow of the master process. |
| Open Source Code | Yes | Code is available at https://github.com/liuanji/WU-UCT. |
| Open Datasets | Yes | We further evaluate WU-UCT on Atari Games (Bellemare et al., 2013), a classical benchmark for reinforcement learning (RL) and planning algorithms (Guo et al., 2014). |
| Dataset Splits | No | Specifically, training and validation are done on 300 levels that have been released in a test version of the game. (No specific split percentages or counts are provided for the validation set, nor is a standard split cited for the "Joy City" game.) |
| Hardware Specification | Yes | Experiments are deployed on 4 Intel Xeon E5-2650 v4 CPUs and 8 NVIDIA GeForce RTX 2080 Ti GPUs. |
| Software Dependencies | No | The paper mentions using specific algorithms and models like PPO and A3C, but it does not list the software libraries or version numbers required to run them. |
| Experiment Setup | Yes | For all tree search based algorithms (i.e., WU-UCT, Tree P, Leaf P, and Root P), the maximum depth of the search tree is set to 100. The search width is limited by 20 and the maximum number of simulations is 128. The discount factor γ is set to 0.99... |
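
For orientation, the selection rule that Algorithm 1's master process maintains is standard UCT with one change: the number of dispatched-but-unfinished simulations (the paper's "unobserved samples") is added to the visit counts of every node on a traversed path, and removed again once the corresponding result comes back. The following is a minimal single-process sketch of that bookkeeping, assuming simple node fields and an exploration constant `c`; it is an illustration, not the authors' released implementation.

```python
import math


class Node:
    """Minimal tree node for illustrating WU-UCT-style statistics.

    The fields here are assumptions made for this sketch, not the
    authors' data structures.
    """

    def __init__(self, parent=None):
        self.parent = parent
        self.children = {}   # action -> child Node
        self.q = 0.0         # mean simulation return observed at this node
        self.n = 0           # completed simulations that passed through this node
        self.o = 0           # ongoing ("unobserved") simulations on this node


def wu_uct_score(child, parent, c=1.0):
    """UCT score with unobserved counts added to both visit terms."""
    visits = child.n + child.o
    total = parent.n + parent.o
    if visits == 0:
        return float("inf")  # always try unvisited children first
    return child.q + c * math.sqrt(2.0 * math.log(max(total, 1)) / visits)


def select(root, c=1.0):
    """Descend from the root, repeatedly picking the highest-scoring child."""
    node = root
    while node.children:
        node = max(node.children.values(),
                   key=lambda ch: wu_uct_score(ch, node, c))
    return node


def incomplete_update(node):
    """When a simulation is dispatched to a worker, bump O along the path."""
    while node is not None:
        node.o += 1
        node = node.parent


def complete_update(node, ret):
    """When a worker returns a rollout return, move the sample from O to N.

    Discounting of the return along the path is omitted to keep the
    sketch short.
    """
    while node is not None:
        node.o -= 1
        node.n += 1
        node.q += (ret - node.q) / node.n  # running mean of returns
        node = node.parent
```

Counting in-flight simulations this way makes a branch that workers are already exploring look better visited, so concurrent traversals spread out over the tree instead of piling onto the same path.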
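
The hyperparameters quoted in the Experiment Setup row can be summarized as a plain configuration mapping. The key names below are illustrative and do not necessarily match the argument names used in the released repository.

```python
# Search hyperparameters quoted in the Experiment Setup row; key names
# are illustrative, not taken from the WU-UCT repository.
SEARCH_CONFIG = {
    "max_tree_depth": 100,    # maximum depth of the search tree
    "max_tree_width": 20,     # maximum number of children expanded per node
    "max_simulations": 128,   # simulation budget per planning step
    "gamma": 0.99,            # discount factor
}
```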