Deep Reinforcement Learning for General Game Playing
Authors: Adrian Goldwaser, Michael Thielscher
AAAI 2020, pp. 1701-1708
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, experimental results of this agent are shown which clearly show that deep reinforcement learning within a GGP environment can perform noticeably better than a UCT benchmark agent in a number of games. |
| Researcher Affiliation | Academia | Adrian Goldwaser, Michael Thielscher Department of Computer Science, University of New South Wales adrian.goldwaser@gmail.com, mit@unsw.edu.au |
| Pseudocode | Yes | Algorithm 1: Network initialisation and Algorithm 2: High-level training loop |
| Open Source Code | No | The paper does not provide any concrete access information (e.g., specific repository link, explicit statement of code release, or mention of code in supplementary materials) for the methodology described. |
| Open Datasets | Yes | All hyperparameters were tuned on Connect-4 with a 6x8 board, then evaluated on the following games from past GGP competitions (Genesereth and Björnsson 2013): Connect-4 (6x7 board) ... Breakthrough (6x6 board) ... Babel ... Pacman 3p (6x6 board) |
| Dataset Splits | No | The paper describes tuning hyperparameters on Connect-4 and performing self-play, but it does not specify explicit training, validation, or test dataset splits (percentages or sample counts) for any given dataset. |
| Hardware Specification | Yes | Tests were run on an Intel Core i5 running at 2.9GHz and used a GeForce GTX 780Ti graphics card for neural network operations. |
| Software Dependencies | No | The paper describes the implementation details of the neural network and algorithms, but it does not provide specific software dependency names with version numbers (e.g., programming languages, libraries, or frameworks). |
| Experiment Setup | Yes | The choice of training on 10 mini-batches (of size 128) was made to make it so that it will train on around 5% of the data in the replay buffer (of max size 20,000) at each stage... This parameter was varied in the range r ∈ {0, 0.5, 1}... Both agents had a time limit of 2 seconds per move and each had their first 2 moves randomised. |
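
The Pseudocode row above notes that the paper presents Algorithm 1 (network initialisation) and Algorithm 2 (high-level training loop). As a rough illustration of what a generic policy/value network for a GGP game might look like, the sketch below builds a small fully connected network with one policy head per role and a per-role value head. PyTorch, the layer sizes, and the names `GGPNet`, `state_size`, and `action_sizes` are all assumptions made for illustration; the paper does not state its framework or exact architecture in the excerpts above.

```python
import torch
import torch.nn as nn


class GGPNet(nn.Module):
    """Illustrative policy/value network for a GGP game (not the paper's exact model).

    state_size:   number of boolean state propositions, fed in as a flat vector
    action_sizes: one entry per role, giving that role's number of move slots
    """

    def __init__(self, state_size: int, action_sizes: list[int], hidden: int = 128):
        super().__init__()
        # Shared fully connected body over the flattened game state.
        self.body = nn.Sequential(
            nn.Linear(state_size, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One policy head per role, plus a scalar value output per role.
        self.policy_heads = nn.ModuleList(nn.Linear(hidden, n) for n in action_sizes)
        self.value_head = nn.Linear(hidden, len(action_sizes))

    def forward(self, state: torch.Tensor):
        h = self.body(state)
        policies = [torch.log_softmax(head(h), dim=-1) for head in self.policy_heads]
        values = torch.tanh(self.value_head(h))  # expected outcome per role in [-1, 1]
        return policies, values


# Example: a two-player game with 60 state propositions and 7 move slots per role.
net = GGPNet(state_size=60, action_sizes=[7, 7])
```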
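
The Experiment Setup row quotes the key training-loop numbers: a replay buffer capped at 20,000 positions and 10 mini-batches of 128 positions per stage (around 5% of a full buffer, per the paper). The following is a minimal sketch of how such a loop (the paper's Algorithm 2) could be organised under those assumptions; `self_play_game` and `train_on_batch` are placeholder interfaces, not the paper's actual API.

```python
import random
from collections import deque

# Hyperparameters as reported in the paper's experiment setup.
REPLAY_BUFFER_SIZE = 20_000   # maximum positions kept in the replay buffer
MINI_BATCHES_PER_STAGE = 10   # mini-batches trained on per stage
BATCH_SIZE = 128              # positions per mini-batch (~5% of a full buffer per stage)


def training_loop(net, n_stages, self_play_game, train_on_batch):
    """Hedged sketch of a high-level self-play training loop.

    `self_play_game(net)` is assumed to return a list of
    (state, policy_target, value_target) tuples from one self-play game, and
    `train_on_batch(net, batch)` is assumed to perform one gradient step.
    """
    replay_buffer = deque(maxlen=REPLAY_BUFFER_SIZE)

    for stage in range(n_stages):
        # 1. Generate fresh self-play data with the current network.
        replay_buffer.extend(self_play_game(net))

        # 2. Train on a small random sample of the replay buffer.
        for _ in range(MINI_BATCHES_PER_STAGE):
            if len(replay_buffer) < BATCH_SIZE:
                break  # not enough data collected yet
            batch = random.sample(list(replay_buffer), BATCH_SIZE)
            train_on_batch(net, batch)

    return net
```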