SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference

Authors: Lasse Espeholt, Raphaël Marinier, Piotr Stanczyk, Ke Wang, Marcin Michalski

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | SEED adopts two state-of-the-art distributed algorithms, IMPALA/V-trace (policy gradients) and R2D2 (Q-learning), and is evaluated on Atari-57, DeepMind Lab and Google Research Football.
Researcher Affiliation | Industry | Brain Team, Google Research. {lespeholt, raphaelm, stanczyk, kewa, michalski}@google.com
Pseudocode | No | The paper includes architectural diagrams but no explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The implementation, along with experiments, is open-sourced so results can be reproduced and novel ideas tried out. GitHub: http://github.com/google-research/seed_rl.
Open Datasets | Yes | We evaluate SEED on a number of environments: DeepMind Lab (Beattie et al., 2016), Google Research Football (Kurach et al., 2019) and the Arcade Learning Environment (Bellemare et al., 2013).
Dataset Splits | No | The paper describes hyperparameter tuning and evaluation procedures, but it does not specify a conventional training/validation/test split; this is typical for reinforcement learning, where data is generated by interacting with environments rather than drawn from a static dataset.
Hardware Specification | Yes | For evaluating performance, we compare IMPALA running on an Nvidia P100 with SEED running on multiple accelerator setups.
Software Dependencies | Yes | For optimal performance, we use TPUs (cloud.google.com/tpu/) and TensorFlow 2 (Abadi et al., 2015) to simplify the implementation.
Experiment Setup | Yes | The same 24 hyperparameter sets and the same model (the ResNet from IMPALA) were used for both agents. More details can be found in Appendix A.1.2.