Accelerating and Improving AlphaZero Using Population Based Training

Authors: Ti-Rong Wu, Ting-Han Wei, I-Chen Wu (pp. 1046-1053)

AAAI 2020

Reproducibility assessment (variable, result, and supporting response):
Research Type: Experimental. "In our experiments for 9x9 Go, the PBT method is able to achieve a higher win rate for 9x9 Go than the baselines, each with its own hyperparameter configuration and trained individually. For 19x19 Go, with PBT, we are able to obtain improvements in playing strength."
Researcher Affiliation: Academia. (1) Department of Computer Science, National Chiao Tung University, Taiwan; (2) Pervasive Artificial Intelligence Research (PAIR) Labs, Taiwan; (3) Department of Computing Science, University of Alberta, Edmonton, Canada.
Pseudocode: No. The paper describes its methods in prose but includes no explicit pseudocode or algorithm blocks.
Open Source Code: No. The paper mentions other open-source projects (ELF OpenGo, Leela Zero, OpenSpiel) but provides no link to, or statement about releasing, its own source code for the described methodology.
Open Datasets: No. The paper generates training data via self-play ("In each iteration, 10,000 games are generated via self-play") rather than using a pre-existing, publicly available dataset with concrete access information.
Dataset Splits: Yes. "In the evaluation phase, the new checkpoint is evaluated against the current network. If the checkpoint is superior to the current network, namely, if it surpasses the current network by a win rate of 55% and above, it replaces the current network. ... Therefore, during evaluation, we use a round-robin tournament for these 16 agents, where each agent plays 6 games against every other agent, with alternating roles as Black and White for fair comparisons."
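The gating and tournament logic quoted above can be sketched as follows. This is a minimal illustration under our own naming; the function names and structure are not taken from the paper, only the 55% threshold, the 16 agents, the 6 games per pairing, and the alternating colors.

```python
# Sketch of the quoted evaluation scheme (names are our own):
# a candidate checkpoint replaces the current network only if it
# wins at least 55% of its evaluation games, and the 16 PBT agents
# play a round-robin with 6 games per pairing, alternating colors.
from itertools import combinations

WIN_RATE_GATE = 0.55  # threshold quoted in the report


def should_replace(checkpoint_wins: int, total_games: int) -> bool:
    """Gate a candidate checkpoint against the current network."""
    return total_games > 0 and checkpoint_wins / total_games >= WIN_RATE_GATE


def round_robin_pairings(num_agents: int = 16, games_per_pair: int = 6):
    """Yield (black, white) agent index pairs; colors alternate per game."""
    for a, b in combinations(range(num_agents), 2):
        for g in range(games_per_pair):
            yield (a, b) if g % 2 == 0 else (b, a)
```

With 16 agents and 6 games per pairing, this schedule produces 16 * 15 / 2 * 6 = 720 evaluation games per tournament.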
Hardware Specification: No. The paper mentions hardware used by other research groups (e.g., DeepMind, Facebook AI Research; "2000 GPUs", "5000 TPUs") but does not specify the hardware used for its own experiments.
Software Dependencies: No. The paper does not list software dependencies with version numbers (e.g., Python, PyTorch, or CUDA versions) needed to replicate the experiments.
Experiment Setup: Yes. "For the 9x9 Go experiments, the network architecture consists of 3 residual blocks with 64 filters." "In our experiment, for each baseline... we run a total of 200 iterations, where each iteration contains a self-play phase with 5000 games and an optimization phase." "Table 1: The hyperparameters for the 8 baselines, all of which are based on AlphaZero training, and the initial values for the PBT version." "In each iteration, 10,000 games are generated via self-play."
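To gather the reported setup in one place, here is an illustrative configuration snippet. The key names are our own shorthand, not the paper's; only the values are those quoted above.

```python
# Illustrative summary of the quoted 9x9 Go experiment setup.
# Key names are our own shorthand; values come from the paper's text.
baseline_setup_9x9 = {
    "residual_blocks": 3,             # network depth
    "filters_per_block": 64,          # network width
    "iterations": 200,                # self-play + optimization cycles
    "selfplay_games_per_iter": 5000,  # per individually trained baseline
}

pbt_setup = {
    "population_size": 16,             # agents in the round-robin evaluation
    "selfplay_games_per_iter": 10000,  # generated per iteration under PBT
}
```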