Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Deep Reinforcement Learning for General Game Playing
Authors: Adrian Goldwaser, Michael Thielscher (pp. 1701-1708)
AAAI 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, experimental results of this agent are shown which clearly show that deep reinforcement learning within a GGP environment can perform noticeably better than a UCT benchmark agent in a number of games. |
| Researcher Affiliation | Academia | Adrian Goldwaser, Michael Thielscher Department of Computer Science, University of New South Wales EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Network initialisation and Algorithm 2: High-level training loop |
| Open Source Code | No | The paper does not provide any concrete access information (e.g., specific repository link, explicit statement of code release, or mention of code in supplementary materials) for the methodology described. |
| Open Datasets | Yes | All hyperparameters were tuned on Connect-4 with a 6x8 board, then evaluated on the following games from past GGP competitions (Genesereth and Björnsson 2013): Connect-4 (6x7 board) ... Breakthrough (6x6 board) ... Babel ... Pacman 3p (6x6 board) |
| Dataset Splits | No | The paper describes tuning hyperparameters on Connect-4 and performing self-play, but it does not specify explicit training, validation, or test dataset splits (percentages or sample counts) for any given dataset. |
| Hardware Specification | Yes | Tests were run on an Intel Core i5 running at 2.9GHz and used a GeForce GTX 780Ti graphics card for neural network operations. |
| Software Dependencies | No | The paper describes the implementation details of the neural network and algorithms, but it does not provide specific software dependency names with version numbers (e.g., programming languages, libraries, or frameworks). |
| Experiment Setup | Yes | The choice of training on 10 mini-batches (of size 128) was made to make it so that it will train on around 5% of the data in the replay buffer (of max size 20,000) at each stage... This parameter was varied in the range r ∈ {0, 0.5, 1}... Both agents had a time limit of 2 seconds per move and each had their first 2 moves randomised. |
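
The replay-buffer figures quoted under Experiment Setup can be checked with a short calculation. The sketch below is illustrative only: the function name `replay_fraction_per_stage` is hypothetical, and it assumes the buffer is full and that each sampled position counts once, neither of which the quoted text states.

```python
def replay_fraction_per_stage(num_batches: int, batch_size: int, buffer_size: int) -> float:
    """Fraction of a full replay buffer drawn during one training stage.

    Assumption (not from the paper): the buffer is at its maximum size and
    every sampled position is counted once.
    """
    return num_batches * batch_size / buffer_size


# Values quoted in the Experiment Setup row: 10 mini-batches of size 128,
# replay buffer with a maximum size of 20,000.
fraction = replay_fraction_per_stage(num_batches=10, batch_size=128, buffer_size=20_000)
print(f"{fraction:.1%}")  # prints "6.4%"
```

Under these assumptions, 1,280 of 20,000 positions are sampled per stage, which is in line with the "around 5%" stated in the quoted passage.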