Multiple Policy Value Monte Carlo Tree Search
Authors: Li-Cheng Lan, Wei Li, Ting-Han Wei, I-Chen Wu
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show through experiments on the game NoGo that a combined f_S and f_L MPV-MCTS outperforms a single PV-NN with policy value MCTS, called PV-MCTS. |
| Researcher Affiliation | Academia | Department of Computer Science, National Chiao Tung University, Taiwan; Pervasive Artificial Intelligence Research (PAIR) Labs, Taiwan. {sb710031, fm.bigballon, tinghan.wei}@gmail.com, icwu@csie.nctu.edu.tw |
| Pseudocode | Yes | Algorithm 1: MPV-MCTS Algorithm (a hedged Python sketch of the search loop appears after this table). |
| Open Source Code | No | The paper mentions 'HaHaNoGo' as an open-source program they used as a baseline, with a link to its GitHub repository. However, it does not provide a link or explicit statement about the open-sourcing of the 'MPV-MCTS' implementation itself. |
| Open Datasets | No | The paper states, 'We trained both f_{64,5} and f_{128,10} from a dataset of 200,000 games (about 10^7 positions) generated by HaHaNoGo with 50,000 simulations for each move via self-play.' While HaHaNoGo is an open-source program, there is no explicit link, DOI, or formal citation provided for the generated dataset itself. |
| Dataset Splits | No | The paper describes training processes and self-play game generation but does not provide specific details on train/validation/test dataset splits, such as percentages, sample counts, or citations to predefined splits. |
| Hardware Specification | Yes | In this paper, all experiments are performed on eight Intel Xeon(R) Gold 6154 CPUs and 64 Nvidia Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions various algorithms and models such as MCTS, DNNs, PV-NNs, and AlphaGo Zero, but it does not list specific software dependencies with version numbers (e.g., programming languages, libraries, or frameworks such as PyTorch, TensorFlow, or CUDA). |
| Experiment Setup | Yes | We trained both f_{64,5} and f_{128,10} using the following settings: simulation count: 800, PUCT constant: 1.5, learning rate: 0.05, batch size: 1024. (The PUCT selection rule these settings parameterize is sketched below the table.) |
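
The Experiment Setup row quotes a PUCT constant of 1.5 without showing where it enters the search. The following is a minimal Python sketch of the standard PUCT child-selection rule from AlphaGo Zero, on which MPV-MCTS builds; the `Node` structure and `puct_child` helper are illustrative names, not the authors' code.

```python
import math

class Node:
    """One node of a policy-value search tree (illustrative, not the authors' code)."""
    def __init__(self, prior=1.0):
        self.prior = prior       # P(s, a): policy-head probability for this move
        self.visits = 0          # N(s, a): simulation count
        self.value_sum = 0.0     # sum of backed-up value estimates
        self.children = {}       # action -> Node

    def q(self):
        """Mean action value Q(s, a); zero for unvisited nodes."""
        return self.value_sum / self.visits if self.visits else 0.0

def puct_child(node, c_puct=1.5):
    """PUCT rule: argmax_a  Q(s,a) + c_puct * P(s,a) * sqrt(N(s)) / (1 + N(s,a)).
    c_puct = 1.5 matches the constant quoted in the Experiment Setup row."""
    sqrt_n = math.sqrt(max(1, sum(c.visits for c in node.children.values())))
    return max(node.children.values(),
               key=lambda c: c.q() + c_puct * c.prior * sqrt_n / (1 + c.visits))
```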
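
The Pseudocode row cites Algorithm 1, but the table does not reproduce it. Building on the `Node` and `puct_child` helpers above, here is a rough sketch of the idea from the abstract: a small, fast PV-NN f_S and a large, slow PV-NN f_L drive one search, with f_S receiving the larger simulation budget. The fixed interleaving schedule, the `mpv_mcts` signature, and `dummy_net` are assumptions for illustration; the paper's Algorithm 1 governs budget assignment and how the two networks share information, which this single-tree sketch glosses over.

```python
import random

def mpv_mcts(root, f_small, f_large, budget_small=800, budget_large=100):
    """Sketch of multiple policy value MCTS: two PV-NNs drive one tree.
    Every `stride`-th simulation is evaluated by the large network; the
    rest use the cheap small network. This fixed schedule is an assumed
    simplification, not the paper's budget-assignment strategy."""
    stride = (budget_small + budget_large) // budget_large
    for i in range(budget_small + budget_large):
        net = f_large if i % stride == 0 else f_small
        node, path = root, [root]
        while node.children:                  # selection by PUCT
            node = puct_child(node, c_puct=1.5)
            path.append(node)
        priors, value = net(node)             # expansion + evaluation by the chosen PV-NN
        for action, p in priors.items():
            node.children.setdefault(action, Node(p))
        for n in path:                        # backup (player sign alternation omitted)
            n.visits += 1
            n.value_sum += value

def dummy_net(node):
    """Stand-in for a trained PV-NN: uniform priors over three moves, random value."""
    return {a: 1.0 / 3 for a in range(3)}, random.uniform(-1.0, 1.0)

root = Node()
mpv_mcts(root, dummy_net, dummy_net)
print(root.visits)  # 900 = budget_small + budget_large
```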