Learning to Stop: Dynamic Simulation Monte-Carlo Tree Search
Authors: Li-Cheng Lan, Ti-Rong Wu, I-Chen Wu, Cho-Jui Hsieh
AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we propose to achieve this goal by predicting the uncertainty of the current search status and using the result to decide whether we should stop searching. With our algorithm, called Dynamic Simulation MCTS (DS-MCTS), we can speed up a NoGo agent trained by AlphaZero 2.5 times while maintaining a similar winning rate, which is critical for training and conducting experiments. Also, under the same average simulation count, our method can achieve a 61% winning rate against the original program. |
| Researcher Affiliation | Academia | Li-Cheng Lan1, Ti-Rong Wu2, I-Chen Wu2,3, Cho-Jui Hsieh1 1 Department of Computer Science, UCLA, Los Angeles, USA 2 Department of Computer Science, National Chiao-Tung University, Taiwan 3 Research Center for IT Innovation, Academia Sinica, Taiwan |
| Pseudocode | Yes | Algorithm 1: Dynamic Simulation MCTS |
| Open Source Code | No | The paper cites 'Lan. 2016. HaHaNoGo: An open source NoGo program. https://github.com/lclan1024/HaHaNoGo', which is used as an opponent or baseline, but there is no explicit statement or link provided for the code of the proposed DS-MCTS method. |
| Open Datasets | Yes | The NoGo agent we used has a 98% win rate against HaHaNoGo (Lan 2016)... For each game, instead of starting from the empty position, we randomly select an opening from a public 9x9 Go opening book (Coulom 2017) as the starting position. |
| Dataset Splits | No | The paper mentions 'trained by 20,000 self-play games' but does not specify the explicit percentages or counts for training, validation, and test splits needed for reproduction. |
| Hardware Specification | No | The paper states, 'The computing resource is partially supported by the national center for high-performance computing (NCHC),' but it does not provide specific details on CPU, GPU, or memory models used for the experiments. |
| Software Dependencies | No | The paper mentions 'ResNet' and refers to the 'AlphaZero' algorithm for the agent, but it does not provide specific software dependencies with version numbers (e.g., deep learning frameworks, Python version, specific libraries) required to replicate the experiment. |
| Experiment Setup | Yes | State-UN and MCTS-UN have ten blocks and 196 filters and are trained by 20,000 self-play games with Nmax = 1600. MCTS-UN is trained with n = 160, where n is acquired by Equation 7. Each experiment played at least 2,048 games; therefore, the standard deviation was less than 1.1%. The default opponent is PV-MCTS with opp sim = 1600 simulations on each move. The default Nmax for the DS-MCTS method is also 1600. The second checkpoint c[1] is selected by Equation 7, and for other checkpoints, we use c[i] = 2 * c[i-1]. |
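
The Pseudocode and Experiment Setup rows above describe Algorithm 1 (DS-MCTS) and its checkpoint schedule c[i] = 2 * c[i-1] with Nmax = 1600. The following is a minimal Python sketch of that stopping loop, under stated assumptions: the callables `run_simulation`, `search_features`, `uncertainty_net`, and `best_move`, and the 0.5 threshold, are hypothetical placeholders standing in for the authors' PV-MCTS implementation and trained MCTS-UN network; `c1 = 160` mirrors the n = 160 value obtained from Equation 7 in the paper's default setting.

```python
"""Minimal sketch of the DS-MCTS control loop (Algorithm 1 in the paper).

Assumption: the callables passed in and the 0.5 threshold are illustrative
placeholders, not the authors' actual PV-MCTS implementation or MCTS-UN.
"""
import random


def checkpoint_schedule(c1, n_max):
    """Checkpoints with c[i] = 2 * c[i-1], capped at n_max."""
    checkpoints, c = [], c1
    while c < n_max:
        checkpoints.append(c)
        c *= 2
    checkpoints.append(n_max)  # the final checkpoint is the full budget
    return checkpoints


def ds_mcts_move(root, run_simulation, search_features, uncertainty_net,
                 best_move, c1=160, n_max=1600, threshold=0.5):
    """Run PV-MCTS, but stop early once the predicted uncertainty of the
    current search status drops below the threshold."""
    simulations = 0
    for checkpoint in checkpoint_schedule(c1, n_max):
        while simulations < checkpoint:
            run_simulation(root)   # one PV-MCTS playout from the root
            simulations += 1
        # MCTS-UN reads features of the current tree (visit counts, values, ...)
        # and predicts whether more search is likely to change the chosen move.
        if uncertainty_net(search_features(root)) < threshold:
            break                  # confident enough: stop searching early
    return best_move(root), simulations


if __name__ == "__main__":
    # Toy stand-ins just to show the calling convention.
    move, n = ds_mcts_move(
        root={},
        run_simulation=lambda tree: None,
        search_features=lambda tree: None,
        uncertainty_net=lambda feats: random.random(),
        best_move=lambda tree: "pass",
    )
    print(f"stopped after {n} simulations, move = {move}")
```

With c1 = 160 and Nmax = 1600, the schedule is 160, 320, 640, 1280, 1600; stopping at an early checkpoint is what produces the simulation savings the paper reports while keeping the full budget available for uncertain positions.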
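As a quick sanity check on the reported statistics (our arithmetic, not the paper's): a win rate estimated from at least 2,048 independent games is a binomial proportion, and its standard error is largest at p = 0.5:

$$\sigma = \sqrt{\tfrac{p(1-p)}{n}} \le \sqrt{\tfrac{0.25}{2048}} \approx 1.1\%,$$

which is consistent with the "standard deviation was less than 1.1%" statement in the Experiment Setup row.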