ELF OpenGo: an analysis and open reimplementation of AlphaZero

Authors: Yuandong Tian, Jerry Ma, Qucheng Gong, Shubho Sengupta, Zhuoyuan Chen, James Pinkerton, Larry Zitnick

ICML 2019

Reproducibility assessment (variable: result, followed by the supporting LLM response):
Research Type: Experimental. "We apply ELF OpenGo to conduct extensive ablation studies, and to identify and analyze numerous interesting phenomena in both the model training and in the gameplay inference procedures."
Researcher Affiliation: Industry. "Facebook AI Research, Menlo Park, California, USA."
Pseudocode: No. The paper describes algorithms but does not provide structured pseudocode or algorithm blocks.
Open Source Code: Yes. "Our code, models, selfplay datasets, and auxiliary data are publicly available." Resources available at https://facebook.ai/developers/tools/elf-opengo.
Open Datasets: Yes. "a comprehensive training trajectory dataset featuring 20 million selfplay games over 1.5 million training minibatches, and auxiliary data." Auxiliary data comprises a test suite for difficult ladder game scenarios, comparative selfplay datasets, and performance validation match logs (both vs. humans and vs. other Go AIs). Resources available at https://facebook.ai/developers/tools/elf-opengo.
Dataset Splits: No. The paper describes a model evaluation process ("evaluator receives proposed new models. It plays out 400 AI vs. AI games...") but does not specify a distinct validation split with percentages or sample counts for reproduction, separate from training or testing.
Hardware Specification: Yes. "Both our training and inference use NVIDIA Tesla V100 GPUs with 16 GB of memory. Instead of 5,000 selfplay TPUs and 64 training TPUs, we use 2,000 selfplay GPUs and 8 training GPUs."
Software Dependencies: No. The paper does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, CUDA x.x).
Experiment Setup: Yes. "Since AZ's replay buffer size is unspecified in Silver et al. (2018), we use the AGZ setting of 500,000 games. We use the AGZ selfplay rollout setting of 1,600 per move. Finally, we use a c_puct constant of 1.5 and a virtual loss constant of 1.0; ... Our main training run constructs a 256-filter, 20-block model (starting from random initialization). First, we run our ELF OpenGo training system for 500,000 minibatches at learning rate 10^-2. Subsequently, we stop and restart the training system twice (at learning rates 10^-3 and 10^-4), each time for an additional 500,000 training minibatches."
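The MCTS settings quoted above (1,600 rollouts per move, a c_puct constant of 1.5, a virtual loss constant of 1.0) plug into the standard PUCT child-selection rule used by the AlphaZero family. A minimal sketch of that rule follows; the function and variable names are chosen here for illustration and are not taken from the ELF OpenGo codebase:

```python
import math

C_PUCT = 1.5        # exploration constant quoted in the paper
VIRTUAL_LOSS = 1.0  # virtual loss constant quoted in the paper

def puct_score(child_q, child_n, child_prior, parent_n, pending=0):
    """PUCT score of one child; `pending` counts in-flight visits (virtual loss).

    Virtual loss treats each pending visit as a loss, temporarily lowering Q so
    that concurrent search threads spread out over different children.
    """
    n = child_n + pending
    q = (child_q * child_n - VIRTUAL_LOSS * pending) / n if n > 0 else 0.0
    u = C_PUCT * child_prior * math.sqrt(parent_n) / (1 + n)
    return q + u

def select_child(children, parent_n):
    """children: list of (q, n, prior, pending) tuples; returns argmax index."""
    return max(
        range(len(children)),
        key=lambda i: puct_score(*children[i][:3], parent_n, children[i][3]),
    )
```

With these constants, an unvisited child with a large prior can outscore a well-visited child with a moderate Q value, which is what drives exploration early in each of the 1,600 rollouts per move.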