Multiple Policy Value Monte Carlo Tree Search

Authors: Li-Cheng Lan, Wei Li, Ting-Han Wei, I-Chen Wu

IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show through experiments on the game NoGo that a combined f_S and f_L MPV-MCTS outperforms single-PV-NN policy value MCTS, called PV-MCTS.
Researcher Affiliation | Academia | 1 Department of Computer Science, National Chiao Tung University, Taiwan; 2 Pervasive Artificial Intelligence Research (PAIR) Labs, Taiwan; {sb710031, fm.bigballon, tinghan.wei}@gmail.com, icwu@csie.nctu.edu.tw
Pseudocode | Yes | Algorithm 1: MPV-MCTS Algorithm (a rough illustrative sketch of the two-tree search idea follows this table).
Open Source Code | No | The paper mentions 'HaHaNoGo' as an open-source program they used as a baseline, with a link to its GitHub repository. However, it does not provide a link or an explicit statement about open-sourcing the 'MPV-MCTS' implementation itself.
Open Datasets | No | The paper states, 'We trained both f_{64,5} and f_{128,10} from a dataset of 200,000 games (about 10^7 positions) generated by HaHaNoGo with 50,000 simulations for each move via self-play.' While HaHaNoGo is an open-source program, there is no explicit link, DOI, or formal citation provided for the generated dataset itself. (A generic self-play data-generation sketch follows this table.)
Dataset Splits | No | The paper describes training processes and self-play game generation but does not provide specific details on train/validation/test dataset splits, such as percentages, sample counts, or citations to predefined splits.
Hardware Specification | Yes | In this paper, all experiments are performed on eight Intel Xeon(R) Gold 6154 CPUs and 64 Nvidia Tesla V100 GPUs.
Software Dependencies | No | The paper mentions various algorithms and models such as MCTS, DNNs, PV-NNs, and AlphaGo Zero, but it does not list specific software dependencies with version numbers (e.g., programming languages, libraries, or frameworks such as PyTorch, TensorFlow, or CUDA).
Experiment Setup | Yes | We trained both f_{64,5} and f_{128,10} using the following settings: simulation count: 800, PUCT constant: 1.5, learning rate: 0.05, batch size: 1024. (These settings are collected in the configuration sketch after this table.)
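
For readers skimming this report, the following is a minimal, hypothetical sketch of the idea behind MPV-MCTS as described in the paper: two PUCT trees, one guided by a small network f_S with a large simulation budget and one guided by a large network f_L with a small budget, searched in an interleaved fashion. It is not a reproduction of the paper's Algorithm 1; the game interface, the dummy networks, the budget-interleaving rule, and the final move-combination rule are all assumptions made for illustration.

```python
import math
import random

class Node:
    """One search-tree node holding PUCT statistics for a single state."""
    def __init__(self, state, prior=1.0):
        self.state = state
        self.prior = prior        # P(s, a) from the policy head
        self.visits = 0           # N(s, a)
        self.value_sum = 0.0      # W(s, a)
        self.children = {}        # action -> Node

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0


def puct_child(node, c_puct=1.5):
    """Select the child maximizing Q + c_puct * P * sqrt(N_parent) / (1 + N_child)."""
    sqrt_n = math.sqrt(max(1, node.visits))
    return max(node.children.values(),
               key=lambda ch: ch.q() + c_puct * ch.prior * sqrt_n / (1 + ch.visits))


def simulate(root, net, next_state, c_puct=1.5):
    """One selection / expansion / evaluation / backup pass with a single PV-NN."""
    path, node = [root], root
    while node.children:                          # selection
        node = puct_child(node, c_puct)
        path.append(node)
    priors, value = net(node.state)               # evaluation by the PV-NN
    for action, p in priors.items():              # expansion
        node.children[action] = Node(next_state(node.state, action), prior=p)
    for n in path:                                # backup (no sign flip: toy setting)
        n.visits += 1
        n.value_sum += value


def mpv_mcts(root_state, f_small, f_large, next_state, n_small=800, n_large=100):
    """Interleave two trees so their progress stays proportional to their budgets:
    the small net gets n_small simulations, the large net n_large."""
    root_s, root_l = Node(root_state), Node(root_state)
    done_s = done_l = 0
    while done_s < n_small or done_l < n_large:
        small_turn = done_s * n_large <= done_l * n_small
        if (small_turn and done_s < n_small) or done_l >= n_large:
            simulate(root_s, f_small, next_state)
            done_s += 1
        else:
            simulate(root_l, f_large, next_state)
            done_l += 1
    # Combine both trees' value estimates when picking a move (an assumed rule,
    # not necessarily the paper's).
    def combined(action):
        ch_l = root_l.children.get(action)
        return root_s.children[action].q() + (ch_l.q() if ch_l else 0.0)
    return max(root_s.children, key=combined)


if __name__ == "__main__":
    # Toy usage: states are tuples of past actions, networks are uniform dummies.
    actions = (0, 1, 2)
    dummy_net = lambda s: ({a: 1 / len(actions) for a in actions}, random.random())
    grow = lambda s, a: s + (a,)
    print(mpv_mcts((), f_small=dummy_net, f_large=dummy_net,
                   next_state=grow, n_small=80, n_large=10))
```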
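
The dataset described in the Open Datasets row (200,000 self-play games, roughly 10^7 positions, 50,000 simulations per move) follows the usual self-play recipe: play games with a search engine and record every position together with the chosen move and the final result. The sketch below is generic and hypothetical; the ToyGame class and the random move chooser stand in for HaHaNoGo and its MCTS, whose interfaces the paper does not specify.

```python
import random

class ToyGame:
    """Tiny stand-in for a real game engine such as HaHaNoGo (hypothetical).
    Players alternately add 1 or 2 to a running total; reaching 5 ends the game."""
    def __init__(self):
        self.total = 0
        self.player = +1

    def is_over(self):
        return self.total >= 5

    def position(self):
        return (self.total, self.player)

    def legal_moves(self):
        return [1, 2]

    def play(self, move):
        self.total += move
        self.player = -self.player

    def result(self):
        return -self.player     # the player who made the last move wins


def self_play_dataset(num_games, simulations_per_move, choose_move, new_game):
    """Collect (position, move, outcome) training examples from self-play games."""
    examples = []
    for _ in range(num_games):
        game = new_game()
        history = []                                         # positions seen this game
        while not game.is_over():
            move = choose_move(game, simulations_per_move)   # e.g. an MCTS search
            history.append((game.position(), move))
            game.play(move)
        outcome = game.result()                              # +1/-1 from the first player's view
        examples.extend((pos, mv, outcome) for pos, mv in history)
    return examples


if __name__ == "__main__":
    # Placeholder move chooser; a real pipeline would run MCTS with the stated budget here.
    random_mover = lambda game, sims: random.choice(game.legal_moves())
    data = self_play_dataset(num_games=10, simulations_per_move=50_000,
                             choose_move=random_mover, new_game=ToyGame)
    print(len(data), "training examples")
```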
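
Finally, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration record. This is a hedged sketch with field names of my own choosing; it assumes, following the paper's f_{n,b} notation, that the subscripts of f_{64,5} and f_{128,10} denote the number of filters and residual blocks.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters reported in the paper's experiment setup."""
    filters: int                 # convolutional filters per layer (assumed meaning of the first subscript)
    blocks: int                  # residual blocks (assumed meaning of the second subscript)
    simulations: int = 800       # MCTS simulation count per move
    c_puct: float = 1.5          # PUCT exploration constant
    learning_rate: float = 0.05  # learning rate
    batch_size: int = 1024       # training batch size

# The small network f_{64,5} and the large network f_{128,10} share every
# reported setting except their size.
F_SMALL = TrainingConfig(filters=64, blocks=5)
F_LARGE = TrainingConfig(filters=128, blocks=10)
```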