SpeedyZero: Mastering Atari with Limited Data and Time

Authors: Yixuan Mei, Jiaxuan Gao, Weirui Ye, Shaohuai Liu, Yang Gao, Yi Wu

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We develop SpeedyZero, a distributed RL system built upon a state-of-the-art model-based RL method, EfficientZero, with a dedicated system design for fast distributed computation. We evaluate SpeedyZero on the Atari 100k benchmark (Kaiser et al., 2019); SpeedyZero achieves human-level performance with only 35 minutes of training and 300k samples. Compared with EfficientZero, which requires 8.5 hours of training, SpeedyZero retains a comparable sample efficiency while achieving a 14.5x speedup in wall-clock time (a sanity-check computation of this speedup follows the table).
Researcher Affiliation | Academia | Yixuan Mei (1,2), Jiaxuan Gao (1,2), Weirui Ye (1), Shaohuai Liu (1), Yang Gao (1,2), Yi Wu (1,2); (1) Institute for Interdisciplinary Information Sciences, Tsinghua University; (2) Shanghai Qi Zhi Institute
Pseudocode | No | The paper describes algorithms and presents mathematical formulas (e.g., for Clipped LARS) but does not include any pseudocode or clearly labeled algorithm blocks (a hedged sketch of a clipped LARS-style update follows the table).
Open Source Code | No | The paper does not provide an explicit statement about releasing open-source code or a link to a code repository for the described methodology.
Open Datasets | Yes | We evaluate SpeedyZero on the Atari 100k benchmark (Kaiser et al., 2019); SpeedyZero achieves human-level performance with only 35 minutes of training and 300k samples. The benchmark contains 26 Atari games that are deemed solvable with a limited amount of samples.
Dataset Splits | No | The paper uses the Atari 100k benchmark but does not explicitly provide details about training, validation, and test splits (e.g., percentages, sample counts, or specific split files) used in the experiments.
Hardware Specification | Yes | For the 35-minute experiments, the trainer node and the data node are both machines with 8 A100 80GB GPUs (with NVSwitch), 128 CPU cores, and 1TB of RAM. There are 9 reanalysis nodes, each of which contains 4 A100 80GB GPUs (with NVSwitch), 64 CPU cores, and 512GB of RAM. For the 50-minute experiments, the trainer node and the data node both contain 8 A100 80GB GPUs (without NVSwitch), 128 CPU cores, and 512GB of RAM, and the 15 reanalysis nodes each contain 1 NVIDIA RTX 3090 GPU, 128 CPU cores, and 512GB of RAM.
Software Dependencies | No | The paper mentions using Distributed Data Parallel (DDP) provided by PyTorch (Li et al., 2020) but does not specify a version number for PyTorch or any other software dependency.
Experiment Setup | Yes | For the main results in Sec. 5.2 and the ablation study in Sec. 5.3, the trainer node is configured with 8 DDP trainers, and each DDP trainer receives batches of size 256, for a total batch size of 2048. The model held by each reanalyze worker is updated every 25 training steps; the models of the priority refreshers and actors are updated every 10 training steps. The total number of training steps is 15k (a hedged trainer-loop sketch using these numbers follows the table). Additionally, Table 6 lists "Common hyper-parameters of SpeedyZero", including optimizer, max gradient norm, priority exponent, evaluation episodes, and various coefficients; Tables 8 and 9 list hyperparameters for the large-batch-size experiments and PPO, respectively; and Table 11 provides "Key system configuration of SpeedyZero" with detailed numbers of actors, refreshers, buffer capacities, queue capacities, etc.
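
As a quick sanity check of the wall-clock speedup quoted in the Research Type row (the timings are from the paper; the snippet below is just the arithmetic):

```python
# EfficientZero reportedly needs 8.5 hours of training; SpeedyZero needs 35 minutes.
efficientzero_minutes = 8.5 * 60   # 510 minutes
speedyzero_minutes = 35
speedup = efficientzero_minutes / speedyzero_minutes
print(f"{speedup:.1f}x")           # ~14.6x, consistent with the reported 14.5x speedup
```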
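
The Pseudocode row notes that the paper presents formulas for Clipped LARS without an algorithm block. For orientation only, below is a minimal PyTorch sketch of a LARS-style parameter update with a clipped layer-wise trust ratio; the trust-ratio formula, the clipping threshold (`max_ratio`), and the default hyper-parameters are illustrative assumptions and may differ from the paper's exact Clipped LARS.

```python
import torch

def clipped_lars_step(params, lr=1.0, trust_coef=0.001,
                      weight_decay=1e-4, max_ratio=1.0, eps=1e-9):
    """One LARS-style update with a clipped layer-wise trust ratio.

    Illustrative sketch only: the exact Clipped LARS formulation and
    constants in the SpeedyZero paper may differ.
    """
    with torch.no_grad():
        for p in params:
            if p.grad is None:
                continue
            g = p.grad + weight_decay * p        # L2-regularized gradient
            w_norm, g_norm = p.norm(), g.norm()
            # Layer-wise trust ratio; fall back to 1 for zero-norm layers,
            # then clip so no layer receives an overly large effective rate.
            ratio = trust_coef * w_norm / (g_norm + eps)
            if w_norm == 0 or g_norm == 0:
                ratio = torch.ones_like(ratio)
            ratio = torch.clamp(ratio, max=max_ratio)
            p.add_(g, alpha=-lr * ratio.item())
```

A typical (assumed) usage would be to call `clipped_lars_step(model.parameters(), lr=...)` after `loss.backward()` in place of a built-in optimizer step.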
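
The Software Dependencies and Experiment Setup rows together describe the trainer side: PyTorch Distributed Data Parallel with 8 DDP trainers, per-trainer batch size 256 (total 2048), 15k training steps, reanalyze workers refreshed every 25 steps, and actors/priority refreshers refreshed every 10 steps. The sketch below is not the authors' code: the dummy network, random batches, optimizer, and weight-push stubs are assumptions; only the batch sizes, step count, and sync intervals come from the paper.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

PER_TRAINER_BATCH = 256     # batch size per DDP trainer (paper)
TOTAL_TRAIN_STEPS = 15_000  # total training steps (paper)
ACTOR_SYNC_EVERY = 10       # actors / priority refreshers updated every 10 steps (paper)
REANALYZE_SYNC_EVERY = 25   # reanalyze workers updated every 25 steps (paper)

def main():
    # Launch with `torchrun --nproc_per_node=8 trainer.py` to get 8 DDP trainers.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Dummy network standing in for SpeedyZero's actual model.
    model = DDP(torch.nn.Linear(512, 601).cuda(local_rank), device_ids=[local_rank])
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

    for step in range(TOTAL_TRAIN_STEPS):
        # Random batch standing in for samples from the data node's replay buffer.
        obs = torch.randn(PER_TRAINER_BATCH, 512, device=local_rank)
        loss = model(obs).pow(2).mean()          # placeholder loss
        opt.zero_grad()
        loss.backward()
        opt.step()
        if step % ACTOR_SYNC_EVERY == 0:
            pass  # push latest weights to actors / priority refreshers
        if step % REANALYZE_SYNC_EVERY == 0:
            pass  # push latest weights to reanalyze workers

if __name__ == "__main__":
    main()
```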