High-Throughput Synchronous Deep RL

Authors: Iou-Jen Liu, Raymond Yeh, Alexander Schwing

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach on Atari games and the Google Research Football environment. Compared to synchronous baselines, HTS-RL is 2-6× faster. Compared to state-of-the-art asynchronous methods, HTS-RL has competitive throughput and consistently achieves higher average episode rewards.
Researcher Affiliation | Academia | Iou-Jen Liu, Raymond A. Yeh, Alexander G. Schwing; University of Illinois at Urbana-Champaign; {iliu3, yeh17, aschwing}@illinois.edu
Pseudocode | No | The paper includes diagrams of system structures (Figure 1) and timelines (Figure 2) but no structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/IouJenLiu/HTS-RL.
Open Datasets | Yes | We evaluate the proposed approach on a subset of the Atari games [2, 3] and all 11 academy scenarios of the recently introduced Google Research Football (GFootball) [15] environment.
Dataset Splits | No | The paper mentions evaluating using 'average over the last 100 evaluation episodes' but does not specify training, validation, or test dataset splits with percentages or counts in the main text.
Hardware Specification | No | The paper mentions using 'a single machine with 4 GPUs' but does not specify the GPU models, CPU models, or other detailed hardware specifications used for the experiments.
Software Dependencies | No | The paper mentions using the 'TorchBeast implementation' for IMPALA and the 'PyTorch implementation of Kostrikov' for A2C and PPO, but it does not provide version numbers for these software components or any other libraries.
Experiment Setup | Yes | Following Kostrikov [14], all methods are trained for 20M environment steps in the Atari environment. For GFootball, following Kurach et al. [15], we use 5M steps. ... Please see the supplementary material for details on hyperparameter settings and model architectures.
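
The evaluation metric quoted under Dataset Splits ('average over the last 100 evaluation episodes') and the step budgets quoted under Experiment Setup can be sketched as follows. This is a minimal illustration, not the authors' code: the names TRAIN_STEPS and LastKAverage are hypothetical, and only the numbers 100, 20M, and 5M come from the table above.

```python
from collections import deque

# Step budgets quoted in the table above; the dictionary keys are illustrative.
TRAIN_STEPS = {"atari": 20_000_000, "gfootball": 5_000_000}


class LastKAverage:
    """Mean over the most recent k episode rewards (k = 100 in the quoted metric)."""

    def __init__(self, k=100):
        # A bounded deque drops the oldest reward once k entries are stored.
        self.rewards = deque(maxlen=k)

    def add(self, episode_reward):
        self.rewards.append(float(episode_reward))

    def value(self):
        if not self.rewards:
            raise ValueError("no episodes recorded yet")
        return sum(self.rewards) / len(self.rewards)
```

A caller would invoke add() once per finished evaluation episode and read value() at the end of training. Before 100 episodes have been recorded, value() averages over however many are available, which is an assumption of this sketch rather than something the table specifies.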