SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference
Authors: Lasse Espeholt, Raphaël Marinier, Piotr Stanczyk, Ke Wang, Marcin Michalski
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | SEED adopts two state-of-the-art distributed algorithms, IMPALA/V-trace (policy gradients) and R2D2 (Q-learning), and is evaluated on Atari-57, DeepMind Lab and Google Research Football (a minimal V-trace sketch follows the table). |
| Researcher Affiliation | Industry | Brain Team Google Research {lespeholt, raphaelm, stanczyk, kewa, michalski}@google.com |
| Pseudocode | No | The paper includes architectural diagrams but no explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The implementation along with experiments is open-sourced so results can be reproduced and novel ideas tried out. GitHub: http://github.com/google-research/seed_rl. |
| Open Datasets | Yes | We evaluate SEED on a number of environments: DeepMind Lab (Beattie et al., 2016), Google Research Football (Kurach et al., 2019) and the Arcade Learning Environment (Bellemare et al., 2013). |
| Dataset Splits | No | The paper describes hyperparameter tuning and evaluation procedures, but it does not specify a conventional training/validation/test split over a static dataset; such splits are generally not applicable to reinforcement learning environments. |
| Hardware Specification | Yes | For evaluating performance, we compare IMPALA using an Nvidia P100 against SEED with multiple accelerator setups. |
| Software Dependencies | Yes | For optimal performance, we use TPUs (cloud.google.com/tpu/) and TensorFlow 2 (Abadi et al., 2015) to simplify the implementation. |
| Experiment Setup | Yes | The same 24 sets of hyperparameters and the same model (the ResNet from IMPALA) were used for both agents. More details can be found in Appendix A.1.2. |
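
The Research Type row notes that SEED builds on IMPALA/V-trace. As a point of reference for the off-policy correction involved, below is a minimal NumPy sketch of the V-trace value targets from Espeholt et al. (2018); the function name, argument layout, and clipping defaults are illustrative assumptions, not code from the SEED RL repository.

```python
# Hypothetical, minimal sketch of V-trace value targets (Espeholt et al., 2018).
# Names and defaults are illustrative; this is not taken from the SEED RL codebase.
import numpy as np

def vtrace_targets(log_rhos, discounts, rewards, values, bootstrap_value,
                   clip_rho=1.0, clip_c=1.0):
    """Compute V-trace targets v_s for a single trajectory of length T.

    log_rhos:        [T] log importance ratios log(pi(a_t|x_t) / mu(a_t|x_t)).
    discounts:       [T] per-step discounts (0 at episode boundaries).
    rewards:         [T] rewards r_t.
    values:          [T] value estimates V(x_t) from the learner.
    bootstrap_value: scalar value estimate used to bootstrap after the last step.
    """
    rhos = np.exp(log_rhos)
    clipped_rhos = np.minimum(clip_rho, rhos)  # rho_t, clipped at rho_bar
    clipped_cs = np.minimum(clip_c, rhos)      # c_t, clipped at c_bar

    values_tp1 = np.concatenate([values[1:], [bootstrap_value]])
    # Temporal-difference terms delta_t = rho_t * (r_t + gamma_t * V(x_{t+1}) - V(x_t)).
    deltas = clipped_rhos * (rewards + discounts * values_tp1 - values)

    # Backward recursion:
    # v_s - V(x_s) = delta_s + gamma_s * c_s * (v_{s+1} - V(x_{s+1})).
    acc = 0.0
    vs_minus_v = np.zeros_like(values)
    for t in reversed(range(len(rewards))):
        acc = deltas[t] + discounts[t] * clipped_cs[t] * acc
        vs_minus_v[t] = acc
    return values + vs_minus_v
```

The backward recursion is equivalent to the discounted summation given in the IMPALA paper and lets the targets be computed in a single pass over the trajectory.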