Accelerating Reinforcement Learning through GPU Atari Emulation
Authors: Steven Dalton, Iuri Frosio
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | CuLE generates up to 155M frames per hour on a single GPU, a finding previously achieved only through a cluster of CPUs. Beyond highlighting the differences between CPU and GPU emulators in the context of reinforcement learning, we show how to leverage the high throughput of CuLE by effective batching of the training data, and show accelerated convergence for A2C+V-trace. |
| Researcher Affiliation | Industry | Steven Dalton, Iuri Frosio, NVIDIA, USA {sdalton,ifrosio}@nvidia.com |
| Pseudocode | No | The paper describes the architecture and functionality of CuLE in text, but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | CuLE is available at https://github.com/NVlabs/cule. |
| Open Datasets | Yes | We focus our attention on the inference path and move from the traditional CPU implementation of the Atari Learning Environment (ALE), a set of Atari 2600 games that emerged as an excellent DRL benchmark [3, 11]. We show that significant performance bottlenecks stem from CPU-based environment emulation... OpenAI Gym [15] |
| Dataset Splits | No | The paper mentions using Atari games and OpenAI Gym environments but does not specify explicit training, validation, or test dataset splits with percentages or sample counts. |
| Hardware Specification | Yes | Table 2: Systems used for experiments. System I: 12-core Core i7-5930K @3.50GHz, Titan V; System II: 6-core Core i7-8086K @5GHz, Tesla V100; System III: 20-core E5-2698 v4 @2.20GHz ×2, Tesla V100 ×8, NVLink |
| Software Dependencies | No | The paper mentions 'Our PyTorch [16] implementation of A2C' but does not specify the version of PyTorch or any other software dependencies with version numbers. |
| Experiment Setup | Yes | in our experiments we use a vanilla A2C [15], with N-step bootstrapping, and N = 5 as the baseline; This configuration takes, on average, 21.2 minutes (and 5.5M training frames) to reach a score of 18 for Pong and 16.6 minutes (4.0M training frames) for a score of 1,500 on Ms-Pacman; Furthermore, as only the most recent data in a batch are generated with the current policy, we use V-trace [6] for off-policy correction. Extending the batch size in the temporal dimension (N-steps bootstrapping, N = 20) |
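
The Experiment Setup row cites N-step bootstrapping (N = 5 as the baseline, extended to N = 20) combined with V-trace [6] to correct for the fact that only the most recent data in a batch come from the current policy. As a point of reference only, below is a minimal sketch of how V-trace targets over an N-step rollout can be computed, following the general V-trace definition from the IMPALA paper rather than the authors' CuLE/A2C code; the function name, tensor shapes, default hyperparameters, and the omission of episode-termination masking are all assumptions made for illustration.

```python
# Hedged sketch: N-step bootstrapped value targets with V-trace off-policy
# correction (per the IMPALA definition). Not the authors' implementation;
# shapes [N, B] (N rollout steps, B parallel environments) are assumed, and
# episode-termination (done) masking is omitted for brevity.
import torch

def vtrace_targets(rewards, values, bootstrap_value, log_rhos,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """rewards, values, log_rhos: [N, B]; bootstrap_value: [B] (value at step N).
    log_rhos are log importance weights log(pi(a|s) / mu(a|s))."""
    rhos = torch.exp(log_rhos)
    clipped_rhos = torch.clamp(rhos, max=rho_bar)   # rho_t = min(rho_bar, pi/mu)
    clipped_cs = torch.clamp(rhos, max=c_bar)       # c_t   = min(c_bar,  pi/mu)

    # V(s_{t+1}) for each step: shift values by one and append the bootstrap value.
    values_t_plus_1 = torch.cat([values[1:], bootstrap_value.unsqueeze(0)], dim=0)
    # delta_t = rho_t * (r_t + gamma * V(s_{t+1}) - V(s_t))
    deltas = clipped_rhos * (rewards + gamma * values_t_plus_1 - values)

    # Backward recursion: (v_t - V_t) = delta_t + gamma * c_t * (v_{t+1} - V_{t+1}),
    # with v_N = V(s_N) so the accumulator starts at zero.
    acc = torch.zeros_like(bootstrap_value)
    vs_minus_v = []
    for t in reversed(range(rewards.shape[0])):
        acc = deltas[t] + gamma * clipped_cs[t] * acc
        vs_minus_v.append(acc)
    vs_minus_v = torch.stack(list(reversed(vs_minus_v)), dim=0)
    return vs_minus_v + values  # V-trace targets v_s, shape [N, B]
```

With rho_bar = c_bar = 1 and fully on-policy data (log_rhos = 0), the recursion telescopes into plain N-step bootstrapped returns, which corresponds to the vanilla on-policy A2C baseline with N = 5 described in the row above.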