Accelerating Reinforcement Learning through GPU Atari Emulation

Authors: Steven Dalton, Iuri Frosio

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | CuLE generates up to 155M frames per hour on a single GPU, a finding previously achieved only through a cluster of CPUs. Beyond highlighting the differences between CPU and GPU emulators in the context of reinforcement learning, we show how to leverage the high throughput of CuLE by effective batching of the training data, and show accelerated convergence for A2C+V-trace.
Researcher Affiliation | Industry | Steven Dalton, Iuri Frosio, NVIDIA, USA, {sdalton,ifrosio}@nvidia.com
Pseudocode | No | The paper describes the architecture and functionality of CuLE in text, but does not provide structured pseudocode or algorithm blocks.
Open Source Code | Yes | CuLE is available at https://github.com/NVlabs/cule.
Open Datasets | Yes | We focus our attention on the inference path and move from the traditional CPU implementation of the Atari Learning Environment (ALE), a set of Atari 2600 games that emerged as an excellent DRL benchmark [3, 11]. We show that significant performance bottlenecks stem from CPU-based environment emulation... OpenAI Gym [15] (a minimal environment-setup sketch follows the table).
Dataset Splits | No | The paper mentions using Atari games and OpenAI Gym environments but does not specify explicit training, validation, or test dataset splits with percentages or sample counts.
Hardware Specification | Yes | Table 2: Systems used for experiments. System I: 12-core Core i7-5930K @3.50GHz, Titan V; System II: 6-core Core i7-8086K @5GHz, Tesla V100; System III: 20-core Core E5-2698 v4 @2.20GHz ×2, Tesla V100 ×8, NVLink.
Software Dependencies | No | The paper mentions 'Our PyTorch [16] implementation of A2C' but does not specify the version of PyTorch or any other software dependencies with version numbers.
Experiment Setup | Yes | in our experiments we use a vanilla A2C [15], with N-step bootstrapping, and N = 5 as the baseline; This configuration takes, on average, 21.2 minutes (and 5.5M training frames) to reach a score of 18 for Pong and 16.6 minutes (4.0M training frames) for a score of 1,500 on Ms-Pacman; Furthermore, as only the most recent data in a batch are generated with the current policy, we use V-trace [6] for off-policy correction. Extending the batch size in the temporal dimension (N-steps bootstrapping, N = 20). (A hedged V-trace sketch also follows the table.)
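
Environment-setup sketch (not from the paper or the report above): a minimal example of how the OpenAI Gym [15] Atari benchmark cited under Open Datasets is conventionally instantiated. The environment id "PongNoFrameskip-v4" and the classic (pre-0.26) Gym step signature are assumptions for illustration.

```python
# Minimal sketch (not taken from the paper): driving an Atari environment
# through OpenAI Gym. Assumes the Atari extras (atari-py / ale-py) are installed
# and the classic Gym API where step() returns a 4-tuple.
import gym

env = gym.make("PongNoFrameskip-v4")
obs = env.reset()
for _ in range(100):
    action = env.action_space.sample()          # random policy, just to step the emulator
    obs, reward, done, info = env.step(action)  # 4-tuple in classic Gym releases
    if done:
        obs = env.reset()
env.close()
```

CuLE replaces this CPU-side emulation loop with batched GPU emulation of thousands of such environments; its own API differs and is documented in the repository linked under Open Source Code.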
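
Experiment-setup sketch (not the authors' code): the quoted setup combines N-step bootstrapping (N = 5 baseline, extended to N = 20 in the temporal dimension) with V-trace [6] off-policy correction. Below is a minimal PyTorch sketch of the V-trace value targets defined in Espeholt et al. [6]; the function name, the [T, B] rollout layout, and the clipping defaults are assumptions for illustration.

```python
# Minimal V-trace target computation (sketch, not the paper's implementation).
# rewards, values, log_rhos, discounts: tensors of shape [T, B], where T is the
# rollout length (e.g. N = 5 or N = 20 steps) and B is the number of environments.
# log_rhos = log pi(a_t|x_t) - log mu(a_t|x_t) (current vs. behaviour policy);
# discounts = gamma * (1 - done) folds episode terminations into the discount.
import torch

def vtrace_targets(rewards, values, bootstrap_value, log_rhos, discounts,
                   clip_rho=1.0, clip_c=1.0):
    rhos = torch.exp(log_rhos).clamp(max=clip_rho)   # truncated importance weights rho_t
    cs = torch.exp(log_rhos).clamp(max=clip_c)       # "trace-cutting" coefficients c_t
    values_tp1 = torch.cat([values[1:], bootstrap_value.unsqueeze(0)], dim=0)
    deltas = rhos * (rewards + discounts * values_tp1 - values)  # rho_t * TD errors

    # Backward recursion: v_s - V(x_s) = delta_s + discount_s * c_s * (v_{s+1} - V(x_{s+1}))
    acc = torch.zeros_like(bootstrap_value)
    diffs = []
    for t in reversed(range(rewards.shape[0])):
        acc = deltas[t] + discounts[t] * cs[t] * acc
        diffs.append(acc)
    diffs = torch.stack(list(reversed(diffs)), dim=0)
    return diffs + values                            # V-trace targets v_s, shape [T, B]
```

With fully on-policy data the importance ratios are all 1 and these targets reduce to standard N-step bootstrapped returns; the correction matters when the batch is extended to N = 20 and the older transitions were generated by a slightly stale policy, which is the regime the quoted setup describes.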