Deep Reinforcement Learning for General Game Playing

Authors: Adrian Goldwaser, Michael Thielscher (pp. 1701–1708)

Venue: AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, experimental results of this agent are shown which clearly show that deep reinforcement learning within a GGP environment can perform noticeably better than a UCT benchmark agent in a number of games."
Researcher Affiliation | Academia | Adrian Goldwaser, Michael Thielscher; Department of Computer Science, University of New South Wales; adrian.goldwaser@gmail.com, mit@unsw.edu.au
Pseudocode | Yes | Algorithm 1: Network initialisation and Algorithm 2: High-level training loop (a sketch of such a loop appears after the table).
Open Source Code | No | The paper does not provide any concrete access information (e.g., a specific repository link, an explicit statement of code release, or mention of code in supplementary materials) for the methodology described.
Open Datasets | Yes | "All hyperparameters were tuned on Connect-4 with a 6x8 board, then evaluated on the following games from past GGP competitions (Genesereth and Björnsson 2013): Connect-4 (6x7 board) ... Breakthrough (6x6 board) ... Babel ... Pacman 3p (6x6 board)"
Dataset Splits | No | The paper describes tuning hyperparameters on Connect-4 and performing self-play, but it does not specify explicit training, validation, or test dataset splits (percentages or sample counts) for any given dataset.
Hardware Specification | Yes | "Tests were run on an Intel Core i5 running at 2.9 GHz and used a GeForce GTX 780Ti graphics card for neural network operations."
Software Dependencies | No | The paper describes the implementation details of the neural network and algorithms, but it does not provide specific software dependency names with version numbers (e.g., programming languages, libraries, or frameworks).
Experiment Setup | Yes | "The choice of training on 10 mini-batches (of size 128) was made so that it trains on around 5% of the data in the replay buffer (of max size 20,000) at each stage... This parameter was varied in the range r ∈ {0, 0.5, 1}... Both agents had a time limit of 2 seconds per move and each had their first 2 moves randomised." (see the sketches below)
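
For orientation, here is a minimal sketch of what a high-level self-play training loop of the kind named in the Pseudocode row (Algorithm 2) typically looks like, using the mini-batch, batch-size, and replay-buffer numbers quoted in the Experiment Setup row. The function names (`play_game`, `train_on_batch`) and the loop structure are illustrative assumptions, not the paper's actual implementation.

```python
import random
from collections import deque

def training_loop(network, play_game, num_iterations, games_per_iteration,
                  buffer_size=20_000, batches_per_stage=10, batch_size=128):
    """Illustrative high-level self-play training loop (not the paper's code).

    `play_game(network)` is assumed to return a list of
    (state, policy_target, value_target) examples from one self-play game.
    `network.train_on_batch(batch)` is assumed to run one gradient step.
    """
    buffer = deque(maxlen=buffer_size)  # replay buffer; oldest examples evicted
    for _ in range(num_iterations):
        # Phase 1: generate fresh training data by self-play with the
        # current network.
        for _ in range(games_per_iteration):
            buffer.extend(play_game(network))
        # Phase 2: train on 10 mini-batches of 128 examples each.
        # 10 * 128 = 1,280 examples, roughly 5-6% of a full 20,000-example
        # buffer, matching the "around 5%" rationale quoted above.
        for _ in range(batches_per_stage):
            batch = random.sample(buffer, min(batch_size, len(buffer)))
            network.train_on_batch(batch)
    return network
```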
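
Similarly, a hedged sketch of the evaluation protocol quoted in the Experiment Setup row (2-second move limit, first two moves of each agent randomised), assuming a two-player alternating-move game and a hypothetical `game`/`agent` interface that is not from the paper:

```python
import random
import time

def play_evaluation_game(agents, game, move_time_limit=2.0,
                         random_moves_per_agent=2):
    """Illustrative evaluation match (not the paper's code), assuming two
    players that alternate moves. Each agent's first two moves are random;
    all later moves are chosen within a 2-second deadline."""
    state = game.initial_state()
    ply = 0
    while not game.is_terminal(state):
        player = game.player_to_move(state)   # index into `agents`
        legal = game.legal_moves(state)
        if ply < random_moves_per_agent * len(agents):
            move = random.choice(legal)       # randomised opening moves
        else:
            deadline = time.monotonic() + move_time_limit
            move = agents[player].choose_move(state, deadline)
        state = game.apply_move(state, move)
        ply += 1
    return game.outcome(state)                # e.g., scores per player
```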