Multiple Policy Value Monte Carlo Tree Search

Authors: Li-Cheng Lan, Wei Li, Ting-Han Wei, I-Chen Wu

IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show through experiments on the game NoGo that a combined f_S and f_L MPV-MCTS outperforms single-PV-NN policy value MCTS, called PV-MCTS.
Researcher Affiliation | Academia | 1 Department of Computer Science, National Chiao Tung University, Taiwan; 2 Pervasive Artificial Intelligence Research (PAIR) Labs, Taiwan; {sb710031, fm.bigballon, tinghan.wei}@gmail.com, icwu@csie.nctu.edu.tw
Pseudocode | Yes | Algorithm 1: MPV-MCTS Algorithm (a rough illustrative sketch of the two-tree search idea follows this table).
Open Source Code | No | The paper mentions 'HaHaNoGo' as an open-source program they used as a baseline, with a link to its GitHub repository. However, it does not provide a link or an explicit statement about open-sourcing the 'MPV-MCTS' implementation itself.
Open Datasets | No | The paper states, 'We trained both f_{64,5} and f_{128,10} from a dataset of 200,000 games (about 10^7 positions) generated by HaHaNoGo with 50,000 simulations for each move via self-play.' While HaHaNoGo is an open-source program, there is no explicit link, DOI, or formal citation provided for the generated dataset itself. (A generic self-play data-generation sketch follows this table.)
Dataset Splits | No | The paper describes training processes and self-play game generation but does not provide specific details on train/validation/test dataset splits, such as percentages, sample counts, or citations to predefined splits.
Hardware Specification | Yes | In this paper, all experiments are performed on eight Intel Xeon(R) Gold 6154 CPUs and 64 Nvidia Tesla V100 GPUs.
Software Dependencies | No | The paper mentions various algorithms and models such as MCTS, DNNs, PV-NNs, and AlphaGo Zero, but it does not list specific software dependencies with version numbers (e.g., programming languages, libraries, or frameworks such as PyTorch, TensorFlow, or CUDA).
Experiment Setup | Yes | We trained both f_{64,5} and f_{128,10} using the following settings: simulation count: 800, PUCT constant: 1.5, learning rate: 0.05, batch size: 1024. (These settings are collected in the configuration sketch after this table.)
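
For readers skimming this report, the following is a minimal, hypothetical sketch of the idea behind MPV-MCTS as described in the paper: two PUCT trees, one guided by a small network f_S with a large simulation budget and one guided by a large network f_L with a small budget, searched in an interleaved fashion. It is not a reproduction of the paper's Algorithm 1; the game interface, the dummy networks, the budget-interleaving rule, and the final move-combination rule are all assumptions made for illustration.

```python
import math
import random

class Node:
    """One search-tree node holding PUCT statistics for a single state."""
    def __init__(self, state, prior=1.0):
        self.state = state
        self.prior = prior        # P(s, a) from the policy head
        self.visits = 0           # N(s, a)
        self.value_sum = 0.0      # W(s, a)
        self.children = {}        # action -> Node

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0


def puct_child(node, c_puct=1.5):
    """Select the child maximizing Q + c_puct * P * sqrt(N_parent) / (1 + N_child)."""
    sqrt_n = math.sqrt(max(1, node.visits))
    return max(node.children.values(),
               key=lambda ch: ch.q() + c_puct * ch.prior * sqrt_n / (1 + ch.visits))


def simulate(root, net, next_state, c_puct=1.5):
    """One selection / expansion / evaluation / backup pass with a single PV-NN."""
    path, node = [root], root
    while node.children:                          # selection
        node = puct_child(node, c_puct)
        path.append(node)
    priors, value = net(node.state)               # evaluation by the PV-NN
    for action, p in priors.items():              # expansion
        node.children[action] = Node(next_state(node.state, action), prior=p)
    for n in path:                                # backup (no sign flip: toy setting)
        n.visits += 1
        n.value_sum += value


def mpv_mcts(root_state, f_small, f_large, next_state, n_small=800, n_large=100):
    """Interleave two trees so their progress stays proportional to their budgets:
    the small net gets n_small simulations, the large net n_large."""
    root_s, root_l = Node(root_state), Node(root_state)
    done_s = done_l = 0
    while done_s < n_small or done_l < n_large:
        small_turn = done_s * n_large <= done_l * n_small
        if (small_turn and done_s < n_small) or done_l >= n_large:
            simulate(root_s, f_small, next_state)
            done_s += 1
        else:
            simulate(root_l, f_large, next_state)
            done_l += 1
    # Combine both trees' value estimates when picking a move (an assumed rule,
    # not necessarily the paper's).
    def combined(action):
        ch_l = root_l.children.get(action)
        return root_s.children[action].q() + (ch_l.q() if ch_l else 0.0)
    return max(root_s.children, key=combined)


if __name__ == "__main__":
    # Toy usage: states are tuples of past actions, networks are uniform dummies.
    actions = (0, 1, 2)
    dummy_net = lambda s: ({a: 1 / len(actions) for a in actions}, random.random())
    grow = lambda s, a: s + (a,)
    print(mpv_mcts((), f_small=dummy_net, f_large=dummy_net,
                   next_state=grow, n_small=80, n_large=10))
```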
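
The dataset described in the Open Datasets row (200,000 self-play games, roughly 10^7 positions, 50,000 simulations per move) follows the usual self-play recipe: play games with a search engine and record every position together with the chosen move and the final result. The sketch below is generic and hypothetical; the ToyGame class and the random move chooser stand in for HaHaNoGo and its MCTS, whose interfaces the paper does not specify.

```python
import random

class ToyGame:
    """Tiny stand-in for a real game engine such as HaHaNoGo (hypothetical).
    Players alternately add 1 or 2 to a running total; reaching 5 ends the game."""
    def __init__(self):
        self.total = 0
        self.player = +1

    def is_over(self):
        return self.total >= 5

    def position(self):
        return (self.total, self.player)

    def legal_moves(self):
        return [1, 2]

    def play(self, move):
        self.total += move
        self.player = -self.player

    def result(self):
        return -self.player     # the player who made the last move wins


def self_play_dataset(num_games, simulations_per_move, choose_move, new_game):
    """Collect (position, move, outcome) training examples from self-play games."""
    examples = []
    for _ in range(num_games):
        game = new_game()
        history = []                                         # positions seen this game
        while not game.is_over():
            move = choose_move(game, simulations_per_move)   # e.g. an MCTS search
            history.append((game.position(), move))
            game.play(move)
        outcome = game.result()                              # +1/-1 from the first player's view
        examples.extend((pos, mv, outcome) for pos, mv in history)
    return examples


if __name__ == "__main__":
    # Placeholder move chooser; a real pipeline would run MCTS with the stated budget here.
    random_mover = lambda game, sims: random.choice(game.legal_moves())
    data = self_play_dataset(num_games=10, simulations_per_move=50_000,
                             choose_move=random_mover, new_game=ToyGame)
    print(len(data), "training examples")
```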
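
Finally, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration record. This is a hedged sketch with field names of my own choosing; it assumes, following the paper's f_{n,b} notation, that the subscripts of f_{64,5} and f_{128,10} denote the number of filters and residual blocks.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters reported in the paper's experiment setup."""
    filters: int                 # convolutional filters per layer (assumed meaning of the first subscript)
    blocks: int                  # residual blocks (assumed meaning of the second subscript)
    simulations: int = 800       # MCTS simulation count per move
    c_puct: float = 1.5          # PUCT exploration constant
    learning_rate: float = 0.05  # learning rate
    batch_size: int = 1024       # training batch size

# The small network f_{64,5} and the large network f_{128,10} share every
# reported setting except their size.
F_SMALL = TrainingConfig(filters=64, blocks=5)
F_LARGE = TrainingConfig(filters=128, blocks=10)
```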