Watch the Unobserved: A Simple Approach to Parallelizing Monte Carlo Tree Search

Authors: Anji Liu, Jianshu Chen, Mingze Yu, Yu Zhai, Xuewen Zhou, Ji Liu

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on a proprietary benchmark and the Atari Game benchmark demonstrate the linear speedup and the superior performance of WU-UCT compared to existing techniques.
Researcher Affiliation | Industry | Seattle AI Lab, Kwai Inc., Bellevue, WA 98004, USA ({liuanji03,yumingze,zhaiyu,zhouxuewen,jiliu}@kuaishou.com); Tencent AI Lab, Bellevue, WA 98004, USA (jianshuchen@tencent.com)
Pseudocode | Yes | The pseudo-code of WU-UCT is provided in Algorithm 1, which gives the workflow of the master process. (A minimal sketch of the underlying selection rule appears after this table.)
Open Source Code | Yes | Code is available at https://github.com/liuanji/WU-UCT.
Open Datasets | Yes | We further evaluate WU-UCT on Atari Games (Bellemare et al., 2013), a classical benchmark for reinforcement learning (RL) and planning algorithms (Guo et al., 2014).
Dataset Splits | No | Specifically, training and validation are done on 300 levels that have been released in a test version of the game. (No specific split percentages or counts are provided for the validation set, nor does the paper refer to a standard split by citation for the "Joy City" game.)
Hardware Specification | Yes | Experiments are deployed on 4 Intel Xeon E5-2650 v4 CPUs and 8 NVIDIA GeForce RTX 2080 Ti GPUs.
Software Dependencies | No | The paper mentions using specific algorithms and models like PPO and A3C, but it does not list the software packages or version numbers needed to reproduce the experiments.
Experiment Setup | Yes | For all tree-search-based algorithms (i.e., WU-UCT, TreeP, LeafP, and RootP), the maximum depth of the search tree is set to 100. The search width is limited to 20 and the maximum number of simulations is 128. The discount factor γ is set to 0.99... (These values are gathered into a configuration sketch after this table.)
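
The core change WU-UCT makes to standard UCT is to track, at each node, the number of initiated-but-incomplete simulations (the "unobserved samples" O) and to add them to the visit counts in the UCB exploration term, so parallel workers are steered away from subtrees that are already being simulated. Below is a minimal Python sketch of that idea, assuming illustrative names (Node, value, visits, unobserved, beta); it is not the authors' implementation, which is available at the GitHub link above.

```python
import math

class Node:
    """Illustrative tree node for the WU-UCT selection sketch."""
    def __init__(self):
        self.children = []    # child Nodes
        self.value = 0.0      # running mean return estimate V_s
        self.visits = 0       # completed simulations N_s
        self.unobserved = 0   # initiated-but-incomplete simulations O_s

def wu_uct_select(node, beta=1.0):
    """Pick the child maximizing the WU-UCT score.

    Standard UCT scores a child by V + beta * sqrt(2 ln N_s / N_s').
    WU-UCT adds the unobserved counts O to both visit terms, so a child
    with many in-flight rollouts looks better-explored and is selected
    less often by other workers.
    """
    total = node.visits + node.unobserved
    def score(child):
        n = child.visits + child.unobserved
        if n == 0:
            return float("inf")  # always expand unvisited children first
        return child.value + beta * math.sqrt(2.0 * math.log(total) / n)
    return max(node.children, key=score)

def on_simulation_started(path):
    # Incomplete update: bump O along the selected path as soon as a
    # rollout is dispatched, before its result is available.
    for node in path:
        node.unobserved += 1

def on_simulation_finished(path, ret):
    # Complete update: move the sample from O to N and fold the
    # observed return into the running value estimate.
    for node in path:
        node.unobserved -= 1
        node.visits += 1
        node.value += (ret - node.value) / node.visits
```

In this sketch, on_simulation_started corresponds to the incomplete update performed when the master process dispatches a rollout to a worker, and on_simulation_finished to the complete update performed when the result returns; the two together keep the exploration term honest under parallelism.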
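For reference, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration object. The key names below are invented for illustration and do not come from the released code; only the values are taken from the paper.

```python
# Hypothetical configuration mirroring the setup quoted above;
# key names are illustrative, values are from the paper.
WU_UCT_CONFIG = {
    "max_tree_depth": 100,   # maximum depth of the search tree
    "max_width": 20,         # search width limit
    "max_simulations": 128,  # maximum number of simulations
    "gamma": 0.99,           # discount factor for returns
}
```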