Three-Head Neural Network Architecture for Monte Carlo Tree Search
Authors: Chao Gao, Martin Müller, Ryan Hayward
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present experimental results on 13×13 Hex, the largest board size that has been adopted in computer program competitions. |
| Researcher Affiliation | Academia | Chao Gao, Martin Müller, Ryan Hayward; University of Alberta; {cgao3, mmueller, hayward}@ualberta.ca |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The trained neural net models of MoHex-CNN and dataset are publicly available: https://drive.google.com/drive/folders/18MdnvMItU7O2sEJDlbmkZzUhZG7yDK9 |
| Open Datasets | Yes | Dataset: We use the publicly available training dataset of MoHex-CNN [Gao et al., 2017], generated from MoHex 2.0 self-play, containing about 10^6 distinct state-action-value examples. Each game is an alternating sequence of black and white moves, along with the game result. (Footnote: The trained neural net models of MoHex-CNN and dataset are publicly available: https://drive.google.com/drive/folders/18MdnvMItU7O2sEJDlbmkZzUhZG7yDK9) |
| Dataset Splits | No | As in [Gao et al., 2017], the dataset is partitioned into training and testing sets, where examples from the testing set do not appear in the training set. |
| Hardware Specification | Yes | We execute experiments on the same Intel i7-6700 CPU computer with a single GTX 1080 GPU and 32 GB RAM. |
| Software Dependencies | No | The neural nets are implemented with TensorFlow, trained by the Adam optimizer [Kingma and Ba, 2014] using the default learning rate with a mini-batch size of 128 for 100 epochs. |
| Experiment Setup | Yes | The neural nets are implemented with TensorFlow, trained by the Adam optimizer [Kingma and Ba, 2014] using the default learning rate with a mini-batch size of 128 for 100 epochs. For loss function (5), we set the L2 regularization constant c to 10^-5 and the value loss weight w to 0.01. (See the training sketch below this table.) |
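
The quoted setup pins down the optimizer, batch size, epoch count, and the two loss constants, but not the network trunk or the exact form of loss (5). Below is a minimal TensorFlow/Keras sketch of how such a three-head training configuration could be wired up, assuming the combined loss takes the form ℓ = ℓ_policy + w·(ℓ_value + ℓ_action-value) + c·‖θ‖₂². The input-plane count `PLANES`, the trunk depth and width, and the head activations are placeholder assumptions, not the paper's architecture; only the constants marked in the comments come from the quoted setup.

```python
import tensorflow as tf
from tensorflow import keras

BOARD = 13        # 13x13 Hex, per the quoted experiments
PLANES = 9        # placeholder input-feature count; not specified in the quotes
W_VALUE = 0.01    # value loss weight w from the quoted setup
C_L2 = 1e-5       # L2 regularization constant c from the quoted setup

reg = keras.regularizers.l2(C_L2)

# Placeholder convolutional trunk; the paper's actual stack is not quoted here.
inputs = keras.Input(shape=(BOARD, BOARD, PLANES))
x = inputs
for _ in range(3):
    x = keras.layers.Conv2D(64, 3, padding="same", activation="relu",
                            kernel_regularizer=reg)(x)
flat = keras.layers.Flatten()(x)

# Three heads sharing one trunk: move policy p, state value v,
# and per-move action values q.
p = keras.layers.Dense(BOARD * BOARD, activation="softmax",
                       kernel_regularizer=reg, name="policy")(flat)
v = keras.layers.Dense(1, activation="tanh",
                       kernel_regularizer=reg, name="value")(flat)
q = keras.layers.Dense(BOARD * BOARD, activation="tanh",
                       kernel_regularizer=reg, name="action_value")(flat)

model = keras.Model(inputs, [p, v, q])

# Adam at its default learning rate; both value losses down-weighted by w.
# The c * ||theta||^2 term enters through the kernel regularizers above.
model.compile(
    optimizer=keras.optimizers.Adam(),
    loss={"policy": "categorical_crossentropy",
          "value": "mse",
          "action_value": "mse"},
    loss_weights={"policy": 1.0, "value": W_VALUE, "action_value": W_VALUE},
)

# Training as quoted: mini-batch size 128 for 100 epochs, e.g.
# model.fit(states, {"policy": pi, "value": z, "action_value": qz},
#           batch_size=128, epochs=100)
```

Keying the losses and loss weights by head name is one idiomatic way to train all three heads jointly over a shared trunk; the hypothetical training targets `pi`, `z`, and `qz` in the commented `fit` call stand in for the state-action-value examples the dataset row describes.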