High-Throughput Synchronous Deep RL
Authors: Iou-Jen Liu, Raymond Yeh, Alexander Schwing
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on Atari games and the Google Research Football environment. Compared to synchronous baselines, HTS-RL is 2-6× faster. Compared to state-of-the-art asynchronous methods, HTS-RL has competitive throughput and consistently achieves higher average episode rewards. |
| Researcher Affiliation | Academia | Iou-Jen Liu, Raymond A. Yeh, Alexander G. Schwing University of Illinois at Urbana-Champaign {iliu3, yeh17, aschwing}@illinois.edu |
| Pseudocode | No | The paper includes diagrams of system structures (Figure 1) and timelines (Figure 2) but no structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/IouJenLiu/HTS-RL. |
| Open Datasets | Yes | We evaluate the proposed approach on a subset of the Atari games [2, 3] and all 11 academy scenarios of the recently introduced Google Research Football (GFootball) [15] environment. |
| Dataset Splits | No | The paper mentions evaluating using 'average over the last 100 evaluation episodes' but does not specify training, validation, or test dataset splits with percentages or counts in the main text. |
| Hardware Specification | No | The paper mentions using 'a single machine with 4 GPUs' but does not specify the exact GPU models, CPU models, or other detailed hardware specifications used for the experiments. |
| Software Dependencies | No | The paper mentions using the 'TorchBeast implementation' for IMPALA and the 'PyTorch implementation of Kostrikov' for A2C and PPO, but it does not provide version numbers for these software components or for any other libraries. |
| Experiment Setup | Yes | Following Kostrikov [14], all methods are trained for 20M environment steps in the Atari environment. For GFootball, following Kurach et al. [15], we use 5M steps. ... Please see the supplementary material for details on hyperparameter settings and model architectures. |