Accelerating and Improving AlphaZero Using Population Based Training
Authors: Ti-Rong Wu, Ting-Han Wei, I-Chen Wu | pp. 1046-1053
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments for 9x9 Go, the PBT method is able to achieve a higher win rate for 9x9 Go than the baselines, each with its own hyperparameter configuration and trained individually. For 19x19 Go, with PBT, we are able to obtain improvements in playing strength. |
| Researcher Affiliation | Academia | 1Department of Computer Science, National Chiao Tung University, Taiwan 2Pervasive Artificial Intelligence Research (PAIR) Labs, Taiwan 3Department of Computing Science, University of Alberta, Edmonton, Canada |
| Pseudocode | No | The paper describes the methods in prose but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions other open-source projects (ELF Open Go, Leela Zero, Open Spiel) but does not provide a link or statement about releasing its own source code for the described methodology. |
| Open Datasets | No | The paper describes generating data via self-play ('In each iteration, 10,000 games are generated via self-play') rather than using a pre-existing, publicly available dataset with concrete access information. |
| Dataset Splits | Yes | In the evaluation phase, the new checkpoint is evaluated against the current network. If the checkpoint is superior to the current network, namely, if it surpasses the current network by a win rate of 55% and above, it replaces the current network. ... Therefore, during evaluation, we use a round-robin tournament for these 16 agents, where each agent plays 6 games against every other agent, with alternating roles as Black and White for fair comparisons. |
| Hardware Specification | No | The paper mentions hardware used by other research groups (e.g., DeepMind, Facebook AI Research, '2000 GPUs', '5000 TPUs') but does not specify the hardware used for its own experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions) required to replicate the experiment. |
| Experiment Setup | Yes | For the 9x9 Go experiments, the network architecture consists of 3 residual blocks with 64 filters. In our experiment, for each baseline... we run a total of 200 iterations, where each iteration contains a self-play phase with 5000 games and an optimization phase. ... Table 1: The hyperparameters for the 8 baselines, all of which are based on AlphaZero training, and the initial values for the PBT version. ... In each iteration, 10,000 games are generated via self-play. |
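The evaluation scheme quoted in the Dataset Splits row combines two mechanisms: AlphaGo-style gating (a new checkpoint replaces the current network only if it wins at least 55% of evaluation games) and a round-robin tournament among the 16 population agents, with 6 games per pairing and alternating colors. A minimal sketch of both, assuming hypothetical names (`gate_checkpoint`, `round_robin`, `play_game`) that are not from the paper's code:

```python
import itertools
import random

WIN_RATE_GATE = 0.55   # promotion threshold quoted in the paper
GAMES_PER_PAIR = 6     # round-robin games per pair of agents


def gate_checkpoint(checkpoint_wins: int, total_games: int) -> bool:
    """Promote the new checkpoint only if its win rate against the
    current network reaches the 55% threshold."""
    return total_games > 0 and checkpoint_wins / total_games >= WIN_RATE_GATE


def round_robin(num_agents: int, play_game) -> list[int]:
    """Return total wins per agent after an all-pairs tournament.

    `play_game(first, second)` returns the index of the winner; the
    player order alternates each game so every agent plays both Black
    and White against every opponent.
    """
    wins = [0] * num_agents
    for a, b in itertools.combinations(range(num_agents), 2):
        for game in range(GAMES_PER_PAIR):
            first, second = (a, b) if game % 2 == 0 else (b, a)
            winner = play_game(first, second)
            wins[winner] += 1
    return wins


# Usage with a random "game" as a stand-in for actual self-play:
random.seed(0)
wins = round_robin(16, lambda x, y: random.choice((x, y)))
# 16 agents -> 120 pairings -> 720 games in total
assert sum(wins) == 16 * 15 // 2 * GAMES_PER_PAIR
```

In a real PBT run, the win totals from the round-robin would then rank the population so that poorly performing agents can copy the weights and perturbed hyperparameters of stronger ones; the sketch above covers only the evaluation bookkeeping the table describes.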