Watch the Unobserved: A Simple Approach to Parallelizing Monte Carlo Tree Search

Authors: Anji Liu, Jianshu Chen, Mingze Yu, Yu Zhai, Xuewen Zhou, Ji Liu

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on a proprietary benchmark and the Atari Game benchmark demonstrate the linear speedup and the superior performance of WU-UCT compared to existing techniques.
Researcher Affiliation | Industry | Seattle AI Lab, Kwai Inc., Bellevue, WA 98004, USA ({liuanji03,yumingze,zhaiyu,zhouxuewen,jiliu}@kuaishou.com); Tencent AI Lab, Bellevue, WA 98004, USA (jianshuchen@tencent.com)
Pseudocode | Yes | The pseudo-code of WU-UCT is provided in Algorithm 1, which gives the workflow of the master process. (A minimal sketch of the underlying selection rule appears after this table.)
Open Source Code | Yes | Code is available at https://github.com/liuanji/WU-UCT.
Open Datasets | Yes | We further evaluate WU-UCT on Atari Games (Bellemare et al., 2013), a classical benchmark for reinforcement learning (RL) and planning algorithms (Guo et al., 2014).
Dataset Splits | No | Specifically, training and validation are done on 300 levels that have been released in a test version of the game. (No specific split percentages or counts are provided for the validation set, nor does the paper refer to a standard split by citation for the "Joy City" game.)
Hardware Specification | Yes | Experiments are deployed on 4 Intel Xeon E5-2650 v4 CPUs and 8 NVIDIA GeForce RTX 2080 Ti GPUs.
Software Dependencies | No | The paper mentions using specific algorithms and models like PPO and A3C, but it does not list the software packages or version numbers needed to reproduce the experiments.
Experiment Setup | Yes | For all tree-search-based algorithms (i.e., WU-UCT, TreeP, LeafP, and RootP), the maximum depth of the search tree is set to 100. The search width is limited to 20 and the maximum number of simulations is 128. The discount factor γ is set to 0.99... (These values are gathered into a configuration sketch after this table.)
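
The core change WU-UCT makes to standard UCT is to track, at each node, the number of initiated-but-incomplete simulations (the "unobserved samples" O) and to add them to the visit counts in the UCB exploration term, so parallel workers are steered away from subtrees that are already being simulated. Below is a minimal Python sketch of that idea, assuming illustrative names (Node, value, visits, unobserved, beta); it is not the authors' implementation, which is available at the GitHub link above.

```python
import math

class Node:
    """Illustrative tree node for the WU-UCT selection sketch."""
    def __init__(self):
        self.children = []    # child Nodes
        self.value = 0.0      # running mean return estimate V_s
        self.visits = 0       # completed simulations N_s
        self.unobserved = 0   # initiated-but-incomplete simulations O_s

def wu_uct_select(node, beta=1.0):
    """Pick the child maximizing the WU-UCT score.

    Standard UCT scores a child by V + beta * sqrt(2 ln N_s / N_s').
    WU-UCT adds the unobserved counts O to both visit terms, so a child
    with many in-flight rollouts looks better-explored and is selected
    less often by other workers.
    """
    total = node.visits + node.unobserved
    def score(child):
        n = child.visits + child.unobserved
        if n == 0:
            return float("inf")  # always expand unvisited children first
        return child.value + beta * math.sqrt(2.0 * math.log(total) / n)
    return max(node.children, key=score)

def on_simulation_started(path):
    # Incomplete update: bump O along the selected path as soon as a
    # rollout is dispatched, before its result is available.
    for node in path:
        node.unobserved += 1

def on_simulation_finished(path, ret):
    # Complete update: move the sample from O to N and fold the
    # observed return into the running value estimate.
    for node in path:
        node.unobserved -= 1
        node.visits += 1
        node.value += (ret - node.value) / node.visits
```

In this sketch, on_simulation_started corresponds to the incomplete update performed when the master process dispatches a rollout to a worker, and on_simulation_finished to the complete update performed when the result returns; the two together keep the exploration term honest under parallelism.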
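For reference, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration object. The key names below are invented for illustration and do not come from the released code; only the values are taken from the paper.

```python
# Hypothetical configuration mirroring the setup quoted above;
# key names are illustrative, values are from the paper.
WU_UCT_CONFIG = {
    "max_tree_depth": 100,   # maximum depth of the search tree
    "max_width": 20,         # search width limit
    "max_simulations": 128,  # maximum number of simulations
    "gamma": 0.99,           # discount factor for returns
}
```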