SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference

Authors: Lasse Espeholt, Raphaël Marinier, Piotr Stanczyk, Ke Wang, Marcin Michalski

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | SEED adopts two state-of-the-art distributed algorithms, IMPALA/V-trace (policy gradients) and R2D2 (Q-learning), and is evaluated on Atari-57, DeepMind Lab and Google Research Football.
Researcher Affiliation | Industry | Brain Team, Google Research. {lespeholt, raphaelm, stanczyk, kewa, michalski}@google.com
Pseudocode | No | The paper includes architectural diagrams but no explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The implementation, along with experiments, is open-sourced so results can be reproduced and novel ideas tried out. GitHub: http://github.com/google-research/seed_rl.
Open Datasets | Yes | We evaluate SEED on a number of environments: DeepMind Lab (Beattie et al., 2016), Google Research Football (Kurach et al., 2019) and the Arcade Learning Environment (Bellemare et al., 2013).
Dataset Splits | No | The paper describes hyperparameter tuning and evaluation procedures, but it does not specify a conventional training/validation/test split; this is typical for reinforcement learning, where data is generated by interacting with environments rather than drawn from a static dataset.
Hardware Specification | Yes | For evaluating performance, we compare IMPALA running on an Nvidia P100 with SEED running on multiple accelerator setups.
Software Dependencies | Yes | For optimal performance, we use TPUs (cloud.google.com/tpu/) and TensorFlow 2 (Abadi et al., 2015) to simplify the implementation.
Experiment Setup | Yes | The same 24 hyperparameter sets and the same model (the ResNet from IMPALA) were used for both agents. More details can be found in Appendix A.1.2.