Thinking Fast and Slow with Deep Learning and Tree Search

Authors: Thomas Anthony, Zheng Tian, David Barber

NeurIPS 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that EXIT outperforms REINFORCE for training a neural network to play the board game Hex, and our final tree search agent, trained tabula rasa, defeats MOHEX 1.0, the most recent Olympiad Champion player to be publicly released.
Researcher Affiliation | Academia | Thomas Anthony¹, Zheng Tian¹, and David Barber¹,²; ¹University College London, ²Alan Turing Institute; thomas.anthony.14@ucl.ac.uk
Pseudocode | Yes | Algorithm 1: Expert Iteration (a Python sketch of this loop follows the table)
Open Source Code | No | The paper does not provide an explicit statement about the release of its source code or a link to a code repository for the methodology described.
Open Datasets | No | The paper states, 'we create a set S_i of game states by self play of the apprentice π̂_{i-1}' and 'Based on our initial dataset of 100,000 MCTS moves', indicating that the dataset was generated by the authors through self-play and not obtained from a publicly available source with access information.
Dataset Splits | No | The paper describes generating datasets through self-play and iterative training, but it does not specify explicit training, validation, and test dataset splits with percentages or sample counts.
Hardware Specification | Yes | This machine has an Intel Xeon E5-1620 and an NVIDIA Titan X (Maxwell); our tree search takes 0.3 seconds for 10,000 iterations, while MOHEX takes 0.2 seconds for 10,000 iterations, with multithreading.
Software Dependencies | No | The paper mentions algorithms and optimizers (e.g., 'We use Adam [10] as our optimiser'), but it does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific library versions).
Experiment Setup | Yes | All our experiments are on a 9 × 9 board size. All MCTS agents use 10,000 simulations per move, unless stated otherwise. All use a uniform default policy. We also use RAVE. Full details are in the appendix. Tuning of hyperparameters found that w_a = 100 was a good choice for this parameter, which is close to the average number of simulations per action at the root when using 10,000 iterations in the MCTS. (The weighting role of w_a is illustrated in the second sketch after this table.)
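
To make the reported pseudocode concrete, below is a minimal Python sketch of the Expert Iteration loop summarized by Algorithm 1, assuming the paper's three-step structure: self-play state generation with the current apprentice, expert move selection by apprentice-guided MCTS, and imitation learning on the expert's choices. The callables self_play_states, mcts_expert_move, and train_apprentice are hypothetical placeholders, not functions from the authors' code.

def expert_iteration(apprentice, self_play_states, mcts_expert_move, train_apprentice,
                     n_iterations=10, n_states=100_000, simulations=10_000):
    """Sketch of Expert Iteration (ExIt); the helper callables are hypothetical stand-ins."""
    for i in range(n_iterations):
        # Create a set S_i of game states by self-play of the current apprentice pi_hat_{i-1}.
        states = self_play_states(apprentice, n_states)
        # Expert improvement: label each state with a move chosen by apprentice-guided MCTS.
        dataset = [(s, mcts_expert_move(s, apprentice, simulations)) for s in states]
        # Imitation learning: train the apprentice network to predict the expert's moves.
        # (Data could instead be aggregated across iterations, as the paper also discusses.)
        apprentice = train_apprentice(apprentice, dataset)
    return apprentice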
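
For the w_a = 100 hyperparameter in the experiment setup, the sketch below illustrates how an apprentice-policy prior can be weighted against visit counts in UCT-style selection, assuming a rule of the form UCT(s, a) + w_a * π̂(a|s) / (n(s, a) + 1). The exploration constant c_b is illustrative and the RAVE term used in the experiments is omitted, so this is a sketch rather than the agents' exact tree policy.

import math

def selection_score(q, n_sa, n_s, prior, w_a=100.0, c_b=1.0):
    """Illustrative UCT-style score with an apprentice-policy bonus weighted by w_a.

    q      -- mean reward of simulations through (s, a)
    n_sa   -- visit count of action a at state s
    n_s    -- visit count of state s
    prior  -- apprentice policy probability pi_hat(a | s)
    """
    if n_sa == 0:
        return float("inf")  # simplification: unvisited actions are expanded first
    exploration = c_b * math.sqrt(math.log(n_s) / n_sa)
    # With w_a close to the number of simulations per root action, the prior term and the
    # count-based terms contribute on a comparable scale early in the search.
    return q + exploration + w_a * prior / (n_sa + 1)

With 10,000 iterations spread over the roughly 81 legal moves of a 9 × 9 Hex board, each root action receives on the order of 100 simulations, which matches the quoted observation that w_a = 100 is close to the average number of simulations per action at the root.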