Deep Reinforcement Learning for General Game Playing

Authors: Adrian Goldwaser, Michael Thielscher (pp. 1701–1708)

Venue: AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, experimental results of this agent are shown which clearly show that deep reinforcement learning within a GGP environment can perform noticeably better than a UCT benchmark agent in a number of games."
Researcher Affiliation | Academia | Adrian Goldwaser, Michael Thielscher; Department of Computer Science, University of New South Wales; adrian.goldwaser@gmail.com, mit@unsw.edu.au
Pseudocode | Yes | Algorithm 1: Network initialisation and Algorithm 2: High-level training loop (a sketch of such a loop appears after the table).
Open Source Code | No | The paper does not provide any concrete access information (e.g., a specific repository link, an explicit statement of code release, or mention of code in supplementary materials) for the methodology described.
Open Datasets | Yes | "All hyperparameters were tuned on Connect-4 with a 6x8 board, then evaluated on the following games from past GGP competitions (Genesereth and Björnsson 2013): Connect-4 (6x7 board) ... Breakthrough (6x6 board) ... Babel ... Pacman 3p (6x6 board)"
Dataset Splits | No | The paper describes tuning hyperparameters on Connect-4 and performing self-play, but it does not specify explicit training, validation, or test dataset splits (percentages or sample counts) for any given dataset.
Hardware Specification | Yes | "Tests were run on an Intel Core i5 running at 2.9 GHz and used a GeForce GTX 780Ti graphics card for neural network operations."
Software Dependencies | No | The paper describes the implementation details of the neural network and algorithms, but it does not provide specific software dependency names with version numbers (e.g., programming languages, libraries, or frameworks).
Experiment Setup | Yes | "The choice of training on 10 mini-batches (of size 128) was made so that it trains on around 5% of the data in the replay buffer (of max size 20,000) at each stage... This parameter was varied in the range r ∈ {0, 0.5, 1}... Both agents had a time limit of 2 seconds per move and each had their first 2 moves randomised." (see the sketches below)
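
For orientation, here is a minimal sketch of what a high-level self-play training loop of the kind named in the Pseudocode row (Algorithm 2) typically looks like, using the mini-batch, batch-size, and replay-buffer numbers quoted in the Experiment Setup row. The function names (`play_game`, `train_on_batch`) and the loop structure are illustrative assumptions, not the paper's actual implementation.

```python
import random
from collections import deque

def training_loop(network, play_game, num_iterations, games_per_iteration,
                  buffer_size=20_000, batches_per_stage=10, batch_size=128):
    """Illustrative high-level self-play training loop (not the paper's code).

    `play_game(network)` is assumed to return a list of
    (state, policy_target, value_target) examples from one self-play game.
    `network.train_on_batch(batch)` is assumed to run one gradient step.
    """
    buffer = deque(maxlen=buffer_size)  # replay buffer; oldest examples evicted
    for _ in range(num_iterations):
        # Phase 1: generate fresh training data by self-play with the
        # current network.
        for _ in range(games_per_iteration):
            buffer.extend(play_game(network))
        # Phase 2: train on 10 mini-batches of 128 examples each.
        # 10 * 128 = 1,280 examples, roughly 5-6% of a full 20,000-example
        # buffer, matching the "around 5%" rationale quoted above.
        for _ in range(batches_per_stage):
            batch = random.sample(buffer, min(batch_size, len(buffer)))
            network.train_on_batch(batch)
    return network
```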
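
Similarly, a hedged sketch of the evaluation protocol quoted in the Experiment Setup row (2-second move limit, first two moves of each agent randomised), assuming a two-player alternating-move game and a hypothetical `game`/`agent` interface that is not from the paper:

```python
import random
import time

def play_evaluation_game(agents, game, move_time_limit=2.0,
                         random_moves_per_agent=2):
    """Illustrative evaluation match (not the paper's code), assuming two
    players that alternate moves. Each agent's first two moves are random;
    all later moves are chosen within a 2-second deadline."""
    state = game.initial_state()
    ply = 0
    while not game.is_terminal(state):
        player = game.player_to_move(state)   # index into `agents`
        legal = game.legal_moves(state)
        if ply < random_moves_per_agent * len(agents):
            move = random.choice(legal)       # randomised opening moves
        else:
            deadline = time.monotonic() + move_time_limit
            move = agents[player].choose_move(state, deadline)
        state = game.apply_move(state, move)
        ply += 1
    return game.outcome(state)                # e.g., scores per player
```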