ELF OpenGo: an analysis and open reimplementation of AlphaZero

Authors: Yuandong Tian, Jerry Ma, Qucheng Gong, Shubho Sengupta, Zhuoyuan Chen, James Pinkerton, Larry Zitnick

ICML 2019

Reproducibility assessment (variable: result, followed by the supporting LLM response):
Research Type: Experimental. "We apply ELF OpenGo to conduct extensive ablation studies, and to identify and analyze numerous interesting phenomena in both the model training and in the gameplay inference procedures."
Researcher Affiliation: Industry. "Facebook AI Research, Menlo Park, California, USA."
Pseudocode: No. The paper describes algorithms but does not provide structured pseudocode or algorithm blocks.
Open Source Code: Yes. "Our code, models, selfplay datasets, and auxiliary data are publicly available." Resources available at https://facebook.ai/developers/tools/elf-opengo.
Open Datasets: Yes. "a comprehensive training trajectory dataset featuring 20 million selfplay games over 1.5 million training minibatches, and auxiliary data." Auxiliary data comprises a test suite for difficult ladder game scenarios, comparative selfplay datasets, and performance validation match logs (both vs. humans and vs. other Go AIs). Resources available at https://facebook.ai/developers/tools/elf-opengo.
Dataset Splits: No. The paper describes a model evaluation process ("evaluator receives proposed new models. It plays out 400 AI vs. AI games...") but does not specify a distinct validation split with percentages or sample counts for reproduction, separate from training or testing.
Hardware Specification: Yes. "Both our training and inference use NVIDIA Tesla V100 GPUs with 16 GB of memory. Instead of 5,000 selfplay TPUs and 64 training TPUs, we use 2,000 selfplay GPUs and 8 training GPUs."
Software Dependencies: No. The paper does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, CUDA x.x).
Experiment Setup: Yes. "Since AZ's replay buffer size is unspecified in Silver et al. (2018), we use the AGZ setting of 500,000 games. We use the AGZ selfplay rollout setting of 1,600 per move. Finally, we use a c_puct constant of 1.5 and a virtual loss constant of 1.0; ... Our main training run constructs a 256-filter, 20-block model (starting from random initialization). First, we run our ELF OpenGo training system for 500,000 minibatches at learning rate 10^-2. Subsequently, we stop and restart the training system twice (at learning rates 10^-3 and 10^-4), each time for an additional 500,000 training minibatches."
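The MCTS settings quoted above (1,600 rollouts per move, a c_puct constant of 1.5, a virtual loss constant of 1.0) plug into the standard PUCT child-selection rule used by the AlphaZero family. A minimal sketch of that rule follows; the function and variable names are chosen here for illustration and are not taken from the ELF OpenGo codebase:

```python
import math

C_PUCT = 1.5        # exploration constant quoted in the paper
VIRTUAL_LOSS = 1.0  # virtual loss constant quoted in the paper

def puct_score(child_q, child_n, child_prior, parent_n, pending=0):
    """PUCT score of one child; `pending` counts in-flight visits (virtual loss).

    Virtual loss treats each pending visit as a loss, temporarily lowering Q so
    that concurrent search threads spread out over different children.
    """
    n = child_n + pending
    q = (child_q * child_n - VIRTUAL_LOSS * pending) / n if n > 0 else 0.0
    u = C_PUCT * child_prior * math.sqrt(parent_n) / (1 + n)
    return q + u

def select_child(children, parent_n):
    """children: list of (q, n, prior, pending) tuples; returns argmax index."""
    return max(
        range(len(children)),
        key=lambda i: puct_score(*children[i][:3], parent_n, children[i][3]),
    )
```

With these constants, an unvisited child with a large prior can outscore a well-visited child with a moderate Q value, which is what drives exploration early in each of the 1,600 rollouts per move.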