Better Computer Go Player with Neural Network and Long-term Prediction
Authors: Yuandong Tian, Yan Zhu
ICLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We extend this idea in our bot named darkforest, which relies on a DCNN designed for long-term predictions. Darkforest substantially improves the win rate of pattern-matching approaches against MCTS-based approaches, even with looser search budgets. Against human players, the newest version, darkfores2, achieves a stable 3d level on KGS Go Server as a ranked bot... In this paper, we show that DCNN-based move predictions indeed give a strong Go AI, if properly trained. In particular, we carefully design the training process and choose to predict the next k moves rather than only the immediate next move to enrich the gradient signal. Despite this giving a mere 2% boost in move-prediction accuracy, the win rate against open-source engines (e.g., Pachi and Fuego) in heavy search scenarios (e.g., 100k rollouts) is more than 6 times higher... For evaluation, our model competes with GnuGo, Pachi [Baudis & Gailly (2012)] and Fuego [Enzenberger et al. (2010)]. We use GnuGo 3.8 level 10, Pachi 11.99 (Genjo-devel) with the pattern files, and Fuego 1.1 throughout our experiments. (An illustrative sketch of the next-k-moves training objective appears after the table.) |
| Researcher Affiliation | Collaboration | Yuandong Tian Facebook AI Research Menlo Park, CA 94025 yuandong@fb.com Yan Zhu Rutgers University Facebook AI Research yz328@cs.rutgers.edu |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link to their own open-source code for the methodology described. |
| Open Datasets | Yes | We use the public KGS dataset (~170k games), which is used in Maddison et al. (2015). We also use the GoGoD dataset (~80k games), which is also used in Clark & Storkey (2015). We used the GoGoD 2015 summer version, purchased from http://www.gogod.co.uk. |
| Dataset Splits | No | The paper states 'We use all games before 2012 as the training set and 2013-2015 games as the test set. This leads to 144,748 games for training and 26,814 games for testing.' It does not specify a validation split. |
| Hardware Specification | Yes | We just use vanilla SGD on 4 NVidia K40m GPUs in a single machine to train the entire network (for some models we use 3 GPUs with 255 as the batch size). Our basic implementation of MCTS gives 16k rollouts per second (for 16 threads on a machine with Intel Xeon CPU E5-2680 v2 at 2.80GHz)... The distributed version, named darkfmcts3 in KGS Go Server, uses darkfores2 as the underlying DCNN model, runs 75,000 rollouts on 2048 threads and produces a move every 13 seconds with one Intel Xeon E5-2680 v2 at 2.80GHz and 44 NVidia K40m GPUs. |
| Software Dependencies | Yes | We use GnuGo 3.8 level 10, Pachi 11.99 (Genjo-devel) with the pattern files, and Fuego 1.1 throughout our experiments. |
| Experiment Setup | Yes | The batch size is 256. We use data augmentation with rotation at 90-degree intervals and horizontal/vertical flipping. ...The learning rate is initially 0.05 and then divided by 5 when convergence stalls. ...Tree policy: Moves are first sorted by DCNN confidences, and then picked in order until the accumulated probability exceeds 0.8, or the maximum number of top moves is reached. Then we use UCT [Browne et al. (2012)] to select moves for tree expansion. ...Noise uniformly distributed in [0, σ] is added to the win rate... (σ = 0.05 throughout the experiments). ...The distributed version, named darkfmcts3 in KGS Go Server... uses top-3 predictions in the first 140 moves and switches to top-5 afterwards... Dynamic komi is used only for high handicap games (H5). (Illustrative sketches of the data augmentation and the tree-policy pruning appear after the table.) |
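
The Research Type excerpt notes that darkforest is trained to predict the next k moves rather than only the immediate next move, which enriches the gradient signal. The sketch below shows one way such a multi-head objective can be written; the trunk, the number of input planes, and k = 3 are illustrative assumptions, not the paper's actual architecture.

```python
# Illustrative sketch (not the authors' code): a small DCNN trunk with k
# softmax heads, one per future move, so the loss sums cross-entropy over
# the next k moves instead of only the immediate next move.
import torch
import torch.nn as nn
import torch.nn.functional as F

BOARD = 19   # 19x19 Go board
K = 3        # number of future moves to predict (assumed value)
PLANES = 25  # number of input feature planes (assumed value)

class NextKMovePredictor(nn.Module):
    def __init__(self, planes=PLANES, k=K, width=64):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(planes, width, 5, padding=2), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
        )
        # one 1x1-conv head per future move, each producing 19x19 logits
        self.heads = nn.ModuleList([nn.Conv2d(width, 1, 1) for _ in range(k)])

    def forward(self, x):
        h = self.trunk(x)
        # list of (batch, 361) logit tensors, one per future move
        return [head(h).flatten(1) for head in self.heads]

def next_k_loss(logits_per_step, targets):
    """targets: (batch, K) long tensor of flattened board indices."""
    return sum(F.cross_entropy(logits, targets[:, i])
               for i, logits in enumerate(logits_per_step))

# toy usage
model = NextKMovePredictor()
x = torch.randn(4, PLANES, BOARD, BOARD)
y = torch.randint(0, BOARD * BOARD, (4, K))
loss = next_k_loss(model(x), y)
loss.backward()
```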
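
The Experiment Setup row quotes data augmentation with rotation at 90-degree intervals and horizontal/vertical flipping. Below is a minimal sketch, assuming a (planes, 19, 19) board encoding and a (row, col) move target; the helper name and plane layout are assumptions for illustration.

```python
# Sketch of 90-degree-rotation / flip augmentation for Go boards. The four
# rotations plus one flip already generate all eight board symmetries, so a
# separate vertical flip is not needed. The move target is transformed with
# the same symmetry as the feature planes.
import numpy as np

def augment(planes, move, rng=np.random):
    """planes: (C, 19, 19) array; move: (row, col). Returns transformed pair."""
    size = planes.shape[-1]
    r, c = move
    k = rng.randint(4)                      # number of 90-degree rotations
    planes = np.rot90(planes, k, axes=(1, 2))
    for _ in range(k):                      # rotate the move the same way
        r, c = size - 1 - c, r
    if rng.randint(2):                      # horizontal flip
        planes = planes[:, :, ::-1]
        c = size - 1 - c
    return np.ascontiguousarray(planes), (r, c)

# toy usage with an empty 25-plane board and a move at (3, 15)
boards = np.zeros((25, 19, 19), dtype=np.float32)
aug_boards, aug_move = augment(boards, (3, 15))
print(aug_boards.shape, aug_move)
```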
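
The same row describes the tree policy: candidate moves are sorted by DCNN confidence and picked in order until the accumulated probability exceeds 0.8 or a cap on top moves is reached, after which UCT selects among them, with uniform noise in [0, σ] added to the win rate (σ = 0.05). The sketch below illustrates that pruning and a noisy UCT score; the exploration constant, the move cap, and the data structures are assumptions, not the darkforest implementation.

```python
# Minimal sketch (assumed interfaces) of DCNN-based move pruning followed by
# a UCT-style selection with uniform noise added to the win rate.
import math
import random

def prune_moves(probs, threshold=0.8, max_moves=5):
    """probs: dict move -> DCNN probability. Keeps top moves until their
    cumulative probability exceeds the threshold or the cap is reached."""
    kept, cumulative = [], 0.0
    for move, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append(move)
        cumulative += p
        if cumulative > threshold or len(kept) >= max_moves:
            break
    return kept

def uct_select(children, total_visits, c=1.4, sigma=0.05):
    """children: list of (move, wins, visits). Returns the move maximizing
    noisy win rate plus the usual UCT exploration bonus."""
    def score(wins, visits):
        win_rate = wins / visits + random.uniform(0.0, sigma)
        return win_rate + c * math.sqrt(math.log(total_visits) / visits)
    return max(children, key=lambda ch: score(ch[1], ch[2]))[0]

# toy usage
probs = {"D4": 0.45, "Q16": 0.30, "C3": 0.10, "K10": 0.08, "A1": 0.07}
print(prune_moves(probs))                                   # ['D4', 'Q16', 'C3']
print(uct_select([("D4", 30, 60), ("Q16", 25, 40)], total_visits=100))
```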