Thinking Fast and Slow with Deep Learning and Tree Search
Authors: Thomas Anthony, Zheng Tian, David Barber
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that EXIT outperforms REINFORCE for training a neural network to play the board game Hex, and our final tree search agent, trained tabula rasa, defeats MOHEX 1.0, the most recent Olympiad Champion player to be publicly released. |
| Researcher Affiliation | Academia | Thomas Anthony¹, Zheng Tian¹, and David Barber¹,² (¹University College London, ²Alan Turing Institute); thomas.anthony.14@ucl.ac.uk |
| Pseudocode | Yes | Algorithm 1 Expert Iteration (see the loop sketch after the table). |
| Open Source Code | No | The paper does not provide an explicit statement about the release of its source code or a link to a code repository for the methodology described. |
| Open Datasets | No | The paper states, 'we create a set S_i of game states by self play of the apprentice π̂_{i−1}' and 'Based on our initial dataset of 100,000 MCTS moves', indicating that the dataset was generated by the authors through self-play rather than obtained from a publicly available source with access information. |
| Dataset Splits | No | The paper describes generating datasets through self-play and iterative training, but it does not specify explicit training, validation, and test dataset splits with percentages or sample counts. |
| Hardware Specification | Yes | This machine has an Intel Xeon E5-1620 and an Nvidia Titan X (Maxwell); our tree search takes 0.3 seconds for 10,000 iterations, while MOHEX takes 0.2 seconds for 10,000 iterations, with multithreading. |
| Software Dependencies | No | The paper mentions algorithms and optimizers (e.g., 'We use Adam [10] as our optimiser'), but it does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific library versions). |
| Experiment Setup | Yes | All our experiments are on a 9 × 9 board size. All MCTS agents use 10,000 simulations per move, unless stated otherwise. All use a uniform default policy. We also use RAVE. Full details are in the appendix. Tuning of hyperparameters found that w_a = 100 was a good choice for this parameter, which is close to the average number of simulations per action at the root when using 10,000 iterations in the MCTS. (The second sketch after the table illustrates the role of w_a.) |
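
The Pseudocode row refers to the paper's Algorithm 1, Expert Iteration: alternate between generating states by apprentice self-play, querying a tree-search expert at those states, and training the apprentice to imitate the expert's moves. Below is a minimal, self-contained Python sketch of that loop. The toy `Apprentice`, `self_play_states`, and `mcts_expert_move` stand-ins are our inventions so the loop runs end to end; they are not the paper's Hex environment, neural network, or MCTS expert.

```python
import random

# Hypothetical sketch of the Expert Iteration loop (Algorithm 1).
# Everything below the loop itself is a toy stand-in, not the authors' code.

ACTIONS = list(range(81))  # e.g. the cells of a 9 x 9 Hex board

class Apprentice:
    """Toy apprentice: a move-frequency table standing in for a network."""
    def __init__(self):
        self.counts = {}

    def fit(self, dataset):
        # Imitation-learning step: tally the expert's chosen moves.
        for _state, move in dataset:
            self.counts[move] = self.counts.get(move, 0) + 1

    def policy(self, state):
        # Uniform before any training, then proportional to imitation counts.
        total = sum(self.counts.values())
        if total == 0:
            return {a: 1.0 / len(ACTIONS) for a in ACTIONS}
        return {a: self.counts.get(a, 0) / total for a in ACTIONS}

def self_play_states(apprentice, n):
    """Stand-in for creating the set S_i of states by apprentice self-play."""
    return [random.randrange(10_000) for _ in range(n)]

def mcts_expert_move(state, apprentice):
    """Stand-in for the tree-search expert improving on the apprentice."""
    probs = apprentice.policy(state)
    return max(probs, key=probs.get)

def expert_iteration(n_iterations=3, n_states_per_iter=5):
    apprentice = Apprentice()
    dataset = []  # accumulated (state, expert_move) pairs
    for _ in range(n_iterations):
        states = self_play_states(apprentice, n_states_per_iter)  # step 1
        for s in states:
            dataset.append((s, mcts_expert_move(s, apprentice)))  # step 2
        apprentice.fit(dataset)                                   # step 3
    return apprentice

if __name__ == "__main__":
    expert_iteration()
```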
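
The Experiment Setup row mentions the weight w_a = 100, which scales the apprentice policy's contribution to the expert's tree policy. The sketch below shows one common reading of that scheme: a UCT score plus an apprentice-prior bonus w_a · π̂(a|s) / (n(s,a) + 1) that fades as an action accumulates visits. The exact formula, the +1 smoothing in the exploration term, and the name `c_b` are our assumptions, not quotes from the paper.

```python
import math

# Hedged sketch of how the apprentice prior can bias MCTS action selection
# via the weight w_a. This is our reading of the scheme, not the paper's
# exact formula; `c_b` is an assumed name for the exploration constant.

def tree_policy_score(q, n_sa, n_s, prior, w_a=100.0, c_b=1.0):
    """Score action a at state s during tree descent.

    q     : mean simulation value of (s, a)
    n_sa  : visit count n(s, a); n_sa + 1 avoids division by zero (a sketch choice)
    n_s   : visit count of the parent state s
    prior : apprentice policy probability pi_hat(a | s)
    """
    exploration = c_b * math.sqrt(math.log(max(n_s, 2)) / (n_sa + 1))
    prior_bonus = w_a * prior / (n_sa + 1)  # apprentice guidance, decays with visits
    return q + exploration + prior_bonus

# Example: an unvisited action with prior 0.3 starts with a bonus of
# w_a * 0.3 = 30, dominating selection early and shrinking as n(s, a) grows.
print(tree_policy_score(q=0.0, n_sa=0, n_s=100, prior=0.3))
```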